Hi Jianwu,
Are you using compression? Is your data sparse or dense? (I.e., for a typical row key, do all or most columns in your "schema" have values, or only a few?)

With HBase you need to keep in mind that each value is tagged with (row key, column family name, column qualifier, timestamp). That allows HBase to store data sparsely, but it also means that each value carries a lot of baggage. I've heard of a 3 TB Oracle database that expanded to 28 TB in HBase without compression, and to about 5 TB with GZ compression. That is just an anecdote, though, and probably stems from the fact that every Oracle column was transferred to HBase, even empty (null) ones.

-- Lars

________________________________
From: Jianwu Wang <[email protected]>
To: [email protected]
Sent: Monday, August 22, 2011 5:36 PM
Subject: how to get precise data size in hbase?

Hi there,

We have some data saved in HBase on HDFS. We know the following command gives the file size of each HBase table: hadoop fs -dus /hbase/tableName. For MySQL, we can get the exact data size for each table using the SQL queries shown at http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can also get the on-disk file size with a command like: du -s /path/to/datafile. Yet the data size obtained via the SQL query is quite a bit smaller than the on-disk size reported by du -s. We think the above hadoop command likewise reports on-disk file size, not the logical data size in the database. So we are wondering whether there is a way, like a MySQL query, to get the logical data size from the HBase shell.

Thanks a lot!

--
Best wishes

Sincerely yours

Jianwu Wang
[email protected]
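[Editor's note] To make the per-value "baggage" Lars describes concrete, here is a rough sketch (my own illustration, not code from this thread) that estimates the serialized size of a single HBase KeyValue from its components, following the KeyValue layout: key length (4 bytes) + value length (4 bytes) + key + value, where the key is row length (2) + row + family length (1) + family + qualifier + timestamp (8) + key type (1). This is pre-compression and ignores HFile block/index/checksum overhead:

```python
# Rough estimate of an HBase KeyValue's serialized size (pre-compression).
# Layout: keylen(4) + vallen(4) + key + value, where
# key = rowlen(2) + row + famlen(1) + family + qualifier + ts(8) + keytype(1).
# Illustrative sketch only; real HFiles add further per-block overhead.

def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + len(value)

# A 4-byte value under a 16-byte row key, family "d", qualifier "price"
# (all names are made up for the example):
print(keyvalue_size(b"0123456789abcdef", b"d", b"price", b"9.99"))  # → 46
```

So a 4-byte payload costs 46 bytes here, which is why wide, mostly-null relational rows blow up when every column is copied into HBase verbatim, and why short family/qualifier names and compression help so much.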

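[Editor's note] For the MySQL side of the comparison, the linked page relies on the standard information_schema.TABLES view; a typical query of that kind (my paraphrase of the general technique, not copied from the page, with "your_database" as a placeholder) looks like:

```sql
-- Logical size (data + indexes) per table, in MB; "your_database" is a placeholder.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_database';
```

Note this reports MySQL's logical accounting, which is exactly why it comes in below du -s: on-disk files include free pages, metadata, and allocation slack that the logical counters do not.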