Hello and thanks for the insight. I think I misused it a little bit: I was extracting the CSV columns and storing each one in a separate HBase column, which I did not need at all, since they are indexed against the row key anyway. For the next run I will try to store the entire line as a single column and enable compression on the family as well.
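Roughly what I have in mind on the put side, just as a sketch (table, family, and qualifier names below are placeholders, written against the 0.92-era client API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleColumnImport {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "csv");     // placeholder table name
            byte[] family = Bytes.toBytes("d");         // one-letter family
            byte[] qualifier = Bytes.toBytes("l");      // one-letter qualifier

            // Row key taken from the CSV line; the whole line goes into one cell
            String line = "key1,foo,bar,baz";
            String rowKey = line.substring(0, line.indexOf(','));

            Put put = new Put(Bytes.toBytes(rowKey));
            // one cell per line instead of one cell per CSV column
            put.add(family, qualifier, Bytes.toBytes(line));
            table.put(put);
            table.close();
        }
    }

That should cut the per-cell key overhead down to one cell per line instead of one per CSV column.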
Cheers

On Mon, Apr 9, 2012 at 1:04 PM, Ioan Eugen Stan <[email protected]> wrote:
> 2012/4/7 mete <[email protected]>:
> > Hello folks,
> >
> > i am trying to import a CSV file that is around 10 gb into HBASE. After the
> > import, i check the size of the folder with the hadoop fs -du command, and
> > it is a little above 100 gigabytes in size.
> > I did not configure any compression or anything. I have both tried with
> > sequential import using the api and creating a Hfile and mounting into
> > hbase but the size is nearly the same. Does this sound like normal?
> >
> > Kind Regards.
> > Mete
>
> Hi Mete,
>
> Start with compression. It's the easiest solution. Also try to
> make your column family of size 1, e.g. "C" or "D", and also make your
> qualifiers as small as possible. This will also save some
> space.
>
> Regards,
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com/
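P.S. For completeness, roughly how I plan to create the table with the compressed, one-byte family. This is only a sketch against the 0.92/0.94-era API; table and family names are placeholders, and I am using GZ here since SNAPPY requires the native libraries to be installed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class CreateCompressedTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("csv");   // placeholder table name
            HColumnDescriptor family = new HColumnDescriptor("d"); // one-byte family name
            // GZ works out of the box; SNAPPY/LZO need native libraries on every region server
            family.setCompressionType(Compression.Algorithm.GZ);
            desc.addFamily(family);

            admin.createTable(desc);
            admin.close();
        }
    }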
