When we execute context.write(null, null), we close the current writer (which 
had a StoreFile open), and on the next write request a new writer is created 
for another StoreFile.
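
To illustrate the pattern (a minimal sketch with hypothetical names, not the 
actual HBase source): the bulk load's record writer treats a (null, null) 
pair as a roll signal, something like this:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the roll-on-null pattern: write(null, null) closes the
    // open writers so the next real write opens a fresh StoreFile.
    class RollingRecordWriter<K, V> {
        // One open writer per column family; a real implementation
        // holds HFile writers here.
        private final Map<String, AutoCloseable> writers = new HashMap<>();

        public void write(K row, V value) throws Exception {
            if (row == null && value == null) {
                rollWriters();   // close everything currently open
                return;          // nothing is appended for this call
            }
            // Normal path: find or lazily create the writer for this
            // value's column family and append; if the writers were
            // just rolled, this opens a new StoreFile.
        }

        private void rollWriters() throws Exception {
            for (AutoCloseable w : writers.values()) {
                w.close();
            }
            writers.clear();
        }
    }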
If a row key has Puts whose total size exceeds the threshold, they will be 
written to multiple store files, so the same row key's data will be 
distributed across multiple StoreFiles.
In the outer while loop we continue the reduce from the point at which we 
flushed or rolled, so we do not omit any data.
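
For reference, the loop structure in PutSortReducer.reduce() looks roughly 
like this (a simplified sketch; exact details vary by HBase version):

    // Simplified sketch of PutSortReducer.reduce(): the outer loop resumes
    // the iterator exactly where the inner loop stopped, so no Puts are lost.
    protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts,
        Context context) throws IOException, InterruptedException {
      long threshold = context.getConfiguration()
          .getLong("putsortreducer.row.threshold", 2L * (1 << 30));
      Iterator<Put> iter = puts.iterator();
      while (iter.hasNext()) {                        // outer loop
        TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
        long curSize = 0;
        // inner loop: buffer KeyValues until the RAM threshold is hit
        while (iter.hasNext() && curSize < threshold) {
          Put p = iter.next();
          for (List<KeyValue> kvs : p.getFamilyMap().values()) {
            for (KeyValue kv : kvs) {
              sorted.add(kv);
              curSize += kv.getLength();
            }
          }
        }
        for (KeyValue kv : sorted) {
          context.write(row, kv);                     // emit this sorted batch
        }
        if (iter.hasNext()) {
          // More Puts remain for this row: roll the writers so the next
          // batch lands in a new StoreFile, then continue the outer loop.
          context.write(null, null);
        }
      }
    }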

________________________________________
From: Amit Sela [[email protected]]
Sent: Wednesday, November 06, 2013 3:54 PM
To: [email protected]
Subject: PutSortReducer memory threshold

Looking at the code of PutSortReducer, I see that if my key has Puts with
a size bigger than the memory threshold, the iteration stops and all Puts
up to the threshold point are written to the context.
If the iterator has more Puts, context.write(null, null) is executed.
Does this tell the bulk load tool to re-execute the reduce from that point
in some way (if so, how?), or is the rest of the data just omitted?

Thanks,

Amit.
