Hi Lars,
Thanks for the info. Our data is dense, and we are not using compression.
We saw a blog post on HBase architecture at
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
It looks like 'hbase org.apache.hadoop.hbase.io.hfile.HFile' can provide
more detailed information about each HFile, including values such as
'totalBytes=84055'. That totalBytes value is smaller than the size
reported by "hadoop fs -dus" (84447 in the example). We are still trying
to understand what these values really mean.
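To put the two numbers side by side, here is a small sketch that only does the arithmetic on the values quoted above. The variable names are made up for illustration. Our guess, and it is only a guess, is that the gap comes from the parts of the HFile that are not data blocks (for example the block index, file info and trailer).

```python
# Values quoted in this message; variable names are hypothetical.
hfile_total_bytes = 84055   # 'totalBytes' printed by the HFile tool
hdfs_dus_bytes = 84447      # size reported by "hadoop fs -dus"

# Bytes on disk that 'totalBytes' does not account for.
gap = hdfs_dus_bytes - hfile_total_bytes
gap_pct = 100.0 * gap / hdfs_dus_bytes

print(f"gap: {gap} bytes ({gap_pct:.2f}% of the file on HDFS)")
```

If that guess is right, the gap should stay small relative to the file size as the files grow, but we have not verified this.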
On 8/22/11 10:46 PM, lars hofhansl wrote:
Hi Jianwu,
Are you using compression?
Is your data sparse or dense? (I.e. for a typical row key, do all or most columns in your
"schema" have values, or only a few)?
With HBase you need to keep in mind that each value is tagged with (rowkey,
column family name, column qualifier, timestamp).
That allows it to store data in a sparse way, but also means that each value
comes with a lot of baggage.
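As a rough illustration of that baggage, here is a minimal sketch that estimates the uncompressed size of a single cell. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value). The function name and the sample row, family, qualifier and value are made up for the example, and block, index and file-level overhead are ignored.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Estimate the serialized size of one cell, assuming the classic KeyValue layout."""
    key_len = (
        2 + len(row)        # 2-byte row length + row key
        + 1 + len(family)   # 1-byte family length + column family name
        + len(qualifier)    # column qualifier (no length prefix)
        + 8                 # timestamp
        + 1                 # key type (Put, Delete, ...)
    )
    # 4-byte key length + 4-byte value length + key + value
    return 4 + 4 + key_len + len(value)


# Hypothetical sample cell: a 4-byte value under a short row key.
row, family, qualifier, value = b"row-000001", b"cf", b"temperature", b"21.5"
total = keyvalue_size(row, family, qualifier, value)
overhead = total - len(value)
print(f"{len(value)} bytes of value stored in {total} bytes ({overhead} bytes of overhead)")
```

With short values the key dominates, which is why short row keys, family names and qualifiers are usually recommended.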
I've heard somewhere that a 3 TB Oracle database expanded to 28 TB in HBase without
compression and to about 5 TB with GZ compression.
That is just an anecdote, though, and probably stems from the fact that each
column in Oracle was transferred to HBase, even empty (null) ones.
-- Lars
________________________________
From: Jianwu Wang<[email protected]>
To: [email protected]
Sent: Monday, August 22, 2011 5:36 PM
Subject: how to get precious data size in hbase?
Hi there,
We have some data stored in HBase on HDFS. We know that the following
command reports the on-disk size of each HBase table: hadoop fs -dus
/hbase/tableName.
For MySQL, we can get the exact data size of each table using the SQL queries
shown at
http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can
also get the on-disk file size with a command like: du -s /path/to/datafile. Yet the
data size returned by the SQL query is considerably smaller than the on-disk size
reported by du -s. We think the hadoop command above also reports the on-disk file
size, not the size of the data in the database. So we are wondering whether there is
a way, like the MySQL query, to get the data size in HBase from the HBase shell.
Thanks a lot!
--
Best wishes
Sincerely yours
Jianwu Wang
[email protected]