So, if I have a lot of puts per row, say 100 times the memory threshold, will 100 different store files (at least...) be written to the same region? Will this trigger a major compaction for the region during/after the bulk load? Is the trigger #storeFiles > hbase.hstore.compactionThreshold?
On Wed, Nov 6, 2013 at 1:01 PM, rajeshbabu chintaguntla <[email protected]> wrote:

> When we execute context.write(null,null), we will close the current
> writer (which opened a storefile), and on the next write request we will
> create a new writer for another storefile.
> If a row key has puts of size more than the threshold, then they will be
> written to multiple store files. So the same rowkey's data will be
> distributed across multiple storefiles.
> In the outer while loop we will continue the reduce from the point at
> which we flushed or rolled. We will not omit any data.
>
> ________________________________________
> From: Amit Sela [[email protected]]
> Sent: Wednesday, November 06, 2013 3:54 PM
> To: [email protected]
> Subject: PutSortReducer memory threshold
>
> Looking at the code of PutSortReducer I see that if my key has puts with
> a size bigger than the memory threshold, the iteration stops and all puts
> up to the threshold point will be written to context.
> If the iterator has more puts, context.write(null,null) is executed.
> Does this tell the bulk load tool to re-execute the reduce from that point
> in some way (if so, how?), or is the rest of the data just omitted?
>
> Thanks,
>
> Amit.
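For reference, a rough paraphrase of the reduce loop being discussed, written from the behavior described above rather than copied verbatim from the HBase source. The threshold value is hardcoded here as a placeholder (the real reducer reads it from the job configuration), and the Put/KeyValue accessors follow the 0.94-era API, so treat this as a sketch only:

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class PutSortReducerSketch
    extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> {

  @Override
  protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts, Context context)
      throws IOException, InterruptedException {
    // RAM budget per pass; placeholder value, the real reducer takes it from the config.
    long threshold = 1L << 30;
    Iterator<Put> iter = puts.iterator();
    while (iter.hasNext()) {
      TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
      long curSize = 0;
      // Buffer and sort KeyValues until the row is exhausted or the RAM threshold is hit.
      while (iter.hasNext() && curSize < threshold) {
        Put p = iter.next();
        for (List<KeyValue> kvs : p.getFamilyMap().values()) {  // 0.94-era accessor
          for (KeyValue kv : kvs) {
            sorted.add(kv);
            curSize += kv.getLength();
          }
        }
      }
      // Flush the sorted batch to the current writer.
      for (KeyValue kv : sorted) {
        context.write(row, kv);
      }
      // More puts remain for this row: signal the output format to roll to a new
      // store file, then loop and continue from where we stopped. Nothing is dropped.
      if (iter.hasNext()) {
        context.write(null, null);
      }
    }
  }
}

The key point is the trailing context.write(null, null): it makes the writer roll to a fresh store file, and the outer while loop then keeps draining the same row's iterator, which is why a very large row ends up spread across multiple store files rather than being truncated.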
