10 GB -> 100 GB sounds about right. Of course it depends on the relative size of the keys and the values.
HBase needs to store the entire coordinates (rowkey, column identifier, timestamp) for each KeyValue (i.e. each column), whereas the TSV file only stores the values. You can try Snappy or LZO compression if (CPU) performance is the primary consideration, or GZ if disk/IO is more important. Also, 0.94+ comes with key prefix compression, which will help a lot in many cases (a quick sketch of enabling both follows below the quoted message).

-- Lars

________________________________
From: mete <[email protected]>
To: [email protected]
Sent: Saturday, April 7, 2012 1:21 PM
Subject: hbase table size

Hello folks,
I am trying to import a CSV file that is around 10 GB into HBase. After the import, I check the size of the folder with the hadoop fs -du command, and it is a little above 100 gigabytes in size. I did not configure any compression or anything. I have tried both sequential import using the API and creating an HFile and loading it into HBase, but the size is nearly the same. Does this sound normal?

Kind Regards,
Mete
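
For illustration, here is a minimal sketch (assuming an HBase 0.94-era Java client, a running cluster, and hbase-site.xml on the classpath; the table name "csv_import" and family "d" are just placeholders) of the per-cell coordinate overhead Lars describes and of creating a table with Snappy compression plus prefix data block encoding:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class TableSizeExample {
  public static void main(String[] args) throws Exception {
    // 1) Per-cell overhead: every KeyValue carries the full coordinates
    //    (rowkey, family, qualifier, timestamp, type) next to the value,
    //    so a small value is easily dwarfed by its key.
    byte[] row = Bytes.toBytes("row-0000000001");
    byte[] family = Bytes.toBytes("d");
    byte[] qualifier = Bytes.toBytes("col1");
    byte[] value = Bytes.toBytes("42");
    KeyValue kv = new KeyValue(row, family, qualifier,
        System.currentTimeMillis(), value);
    System.out.println("value bytes:    " + value.length);
    System.out.println("KeyValue bytes: " + kv.getLength());

    // 2) Create a table whose column family uses Snappy compression and
    //    (0.94+) PREFIX data block encoding; names are illustrative only.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HColumnDescriptor cf = new HColumnDescriptor("d");
    cf.setCompressionType(Compression.Algorithm.SNAPPY); // or LZO / GZ
    cf.setDataBlockEncoding(DataBlockEncoding.PREFIX);   // key prefix encoding
    HTableDescriptor desc = new HTableDescriptor("csv_import");
    desc.addFamily(cf);
    admin.createTable(desc);
    admin.close();
  }
}

The same settings can also be applied when creating the table from the HBase shell with the COMPRESSION and DATA_BLOCK_ENCODING attributes; either way, the native Snappy/LZO libraries must be installed on the region servers for those codecs to work.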
