Hi Jianwu,

Are you using compression?
Is your data sparse or dense? (I.e., for a typical row key, do all or most
columns in your "schema" have values, or only a few?)


With HBase you need to keep in mind that each value is tagged with its full 
coordinates (row key, column family name, column qualifier, timestamp).
That allows it to store data in a sparse way, but also means that each value 
comes with a lot of baggage.
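To get a feel for that baggage, here is a rough back-of-the-envelope sketch. The byte counts below are illustrative assumptions (a fixed per-cell framing overhead of about 20 bytes for the length fields, timestamp, and type byte, plus the repeated key parts), not exact numbers for any particular HBase version:

```shell
# Hypothetical sizes, in bytes, for one cell (all values are assumptions):
ROW=16        # row key length
FAMILY=2      # column family name length (keep family names short!)
QUALIFIER=8   # column qualifier length
VALUE=8       # the actual payload
FIXED=20      # approx. fixed KeyValue framing (lengths, timestamp, type)

CELL=$((FIXED + ROW + FAMILY + QUALIFIER + VALUE))
echo "stored bytes per cell: $CELL (payload: $VALUE)"
```

With these made-up numbers, an 8-byte payload costs 54 bytes on disk before compression, because the row key, family, and qualifier are repeated in every single cell. That repetition is also why compression tends to work so well on HFiles.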


I've heard somewhere that a 3T Oracle database expanded to 28T in HBase without 
compression and to about 5T with GZ compression.
That is just an anecdote, though, and probably stems from the fact that each 
column in Oracle was transferred to HBase, even empty (null) ones.
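If you are not compressing yet, compression is set per column family. A sketch from the hbase shell (table and family names here are placeholders; on older releases the table must be disabled before altering):

```
hbase> disable 'tableName'
hbase> alter 'tableName', {NAME => 'cf', COMPRESSION => 'GZ'}
hbase> enable 'tableName'
hbase> major_compact 'tableName'
```

The major compaction rewrites the existing store files, so the change applies to data that is already on disk, not just new writes.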


-- Lars



________________________________
From: Jianwu Wang <[email protected]>
To: [email protected]
Sent: Monday, August 22, 2011 5:36 PM
Subject: how to get precise data size in hbase?

Hi there,

    We have some data saved in HBase on HDFS. We know the following command 
gives the file size of each HBase table: hadoop fs -dus /hbase/tableName.

    For MySQL, we can get the exact data size for each table using the SQL 
queries described at 
http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can 
also get the on-disk file size using a command like: du -s /path/to/datafile. 
Yet the data size obtained from the SQL query is quite a bit smaller than the 
on-disk size reported by du -s. We think the above hadoop command likewise 
reports the on-disk file size, not the logical data size in the database. So we 
are wondering whether there is a way, like the MySQL queries, to run something 
in the HBase shell to get the data size in HBase.  Thanks a lot!

-- 
Best wishes

Sincerely yours

Jianwu Wang
[email protected]
