10 GB -> 100 GB sounds about right. Of course, it depends on the relative
size of the keys and the values.

HBase needs to store the entire coordinates (row key, column family, column
qualifier, timestamp) for each KeyValue (i.e. each cell), whereas the CSV
file only stores the values.


You can try Snappy or LZO compression if CPU overhead is the primary
consideration, or GZ if saving disk/IO is more important.
Also, 0.94+ comes with key prefix compression (data block encoding), which
will help a lot in many cases.
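
If it helps, here is a minimal sketch of setting both up through the Java
admin API; the table name "mytable" and family "d" are placeholders, and the
exact package for Compression has moved between releases (shown here as in
0.94):

// Sketch: create a table with Snappy compression and prefix encoding enabled.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateCompressedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HColumnDescriptor cf = new HColumnDescriptor("d");
    cf.setCompressionType(Compression.Algorithm.SNAPPY);  // or LZO / GZ
    cf.setDataBlockEncoding(DataBlockEncoding.PREFIX);    // key prefix compression, 0.94+

    HTableDescriptor table = new HTableDescriptor("mytable");
    table.addFamily(cf);
    admin.createTable(table);
    admin.close();
  }
}

The same COMPRESSION and DATA_BLOCK_ENCODING attributes can also be set from
the HBase shell when creating or altering a table.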


-- Lars



________________________________
 From: mete <[email protected]>
To: [email protected] 
Sent: Saturday, April 7, 2012 1:21 PM
Subject: hbase table size
 
Hello folks,

I am trying to import a CSV file that is around 10 GB into HBase. After the
import, I check the size of the folder with the hadoop fs -du command, and
it is a little above 100 gigabytes in size.
I did not configure any compression or anything. I have tried both
sequential import using the API and creating an HFile and bulk loading it
into HBase, but the size is nearly the same. Does this sound normal?

Kind regards,
Mete
