I already use LZO compression in HBase. Or do you mean compressing the Java object itself? Do you know of an implementation?
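For the "compress the bit set" idea, one self-contained way to do it (a sketch only, using plain java.util.zip rather than any particular library; the class and method names here are illustrative, not from the thread) is to GZIP the BitSet's byte array before handing it to HBase. A sparse Bloom-filter BitSet compresses very well:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Illustrative sketch: compress a serialized BitSet with GZIP before
 * storing it, and restore it on read. Note that 1,000,000,000 bits
 * serialize to ~125,000,000 bytes (~120 MB), matching the thread.
 */
public class BitSetCompressor {

    public static byte[] compress(BitSet bits) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(bits.toByteArray());
        }
        return bos.toByteArray();
    }

    public static BitSet decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return BitSet.valueOf(bos.toByteArray());
        }
    }

    public static void main(String[] args) throws IOException {
        // A sparse bit set: 10 million bits, every 1000th bit set.
        BitSet bits = new BitSet(10_000_000);
        for (int i = 0; i < 10_000_000; i += 1000) {
            bits.set(i);
        }
        byte[] raw = bits.toByteArray();
        byte[] packed = compress(bits);
        System.out.println("raw=" + raw.length
                + " bytes, compressed=" + packed.length + " bytes");
        System.out.println("roundtrip ok: " + bits.equals(decompress(packed)));
    }
}
```

The compressed byte array is what would go into the Put; the read side decompresses before calling BitSet.valueOf.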
kind regards

2013/11/2 Asaf Mesika <[email protected]>

> I would try to compress this bit set.
>
> On Nov 2, 2013, at 2:43 PM, John <[email protected]> wrote:
>
> > Hi,
> >
> > thanks for your answer! I increased the "Map Task Maximum Heap Size" to
> > 2 GB and it seems to work. The OutOfMemoryError is gone. But the HBase
> > region servers are now crashing all the time :-/ I try to store the
> > bitvector (120 MB in size) for some rows. This seems to be very memory
> > intensive; the usedHeapMB increases very fast (up to 2 GB). I'm not sure
> > if it is the reading or the writing task which causes this, but I think
> > it's the writing task. Any idea how to minimize the memory usage? My
> > mapper looks like this:
> >
> > public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
> >
> >     private void storeBitvectorToHBase(byte[] name, BitSet bitvector,
> >             Context context) throws IOException, InterruptedException {
> >         Put row = new Put(name);
> >         row.setWriteToWAL(false);
> >         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
> >         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
> >         context.write(key, row);
> >     }
> > }
> >
> > kind regards
> >
> > 2013/11/1 Jean-Marc Spaggiari <[email protected]>
> >
> > > Hi John,
> > >
> > > You might be better off asking this on the CDH mailing list, since
> > > it's more related to Cloudera Manager than to HBase.
> > >
> > > In the meantime, can you try to update the "Map Task Maximum Heap
> > > Size" parameter too?
> > >
> > > JM
> > >
> > > 2013/11/1 John <[email protected]>
> > >
> > > > Hi,
> > > >
> > > > I have a problem with the memory. My use case is the following: I've
> > > > created a MapReduce job and iterate in it over every row. If a row
> > > > has more than, for example, 10k columns I create a Bloom filter (a
> > > > BitSet) for this row and store it in the HBase structure. This
> > > > worked fine so far.
> > > >
> > > > BUT, now I try to store a BitSet with 1000000000 elements = ~120 MB
> > > > in size. In every map() function there exist 2 BitSets. If I try to
> > > > execute the MR job I get this error:
> > > > http://pastebin.com/DxFYNuBG
> > > >
> > > > Obviously, the TaskTracker does not have enough memory. I tried to
> > > > adjust the configuration for the memory, but I'm not sure which is
> > > > the right one. I tried to change the "MapReduce Child Java Maximum
> > > > Heap Size" value from 1 GB to 2 GB, but still got the same error.
> > > >
> > > > Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
> > > > Cloudera Manager.
> > > >
> > > > kind regards
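For reference, Cloudera Manager's "Map Task Maximum Heap Size" ultimately controls the -Xmx of the map-task JVM. Outside Cloudera Manager, the equivalent MRv1-era property would be set in mapred-site.xml roughly like this (a sketch only; the exact property name varies between Hadoop/CDH versions, so verify it against your CDH 4.4 documentation):

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2g</value>
</property>

Note that raising the heap only hides the cost of materializing two ~120 MB BitSets plus their serialized copy per map() call; compressing or shrinking the bit set attacks the actual problem.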
