Since they are in the same batch, they can end up with the same timestamp, and one will hide the other. When the Puts are not batched, there are always a few milliseconds between them, so it ends up OK. So for your case it seems you need to use your own HTable without the write buffer, since the buffer is enforced in TOF (TableOutputFormat). That means the overall throughput will be lower.
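
To make the timestamp collision concrete, here is a minimal toy model in plain Java. It is not HBase code: the class name, the Cell type and both helper methods are hypothetical, and it only mimics the idea that a version of a cell is identified by its timestamp. The second helper shows one possible alternative that is not suggested above, namely giving each write its own timestamp. In the real client I believe that corresponds to the four-argument Put.add(family, qualifier, ts, value), but treat that as an assumption to check against your HBase version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: this is NOT HBase code. It mimics one idea, that a cell
// version is identified by its timestamp, so two writes to the same
// row/column carrying the same timestamp leave a single visible version.
public class VersionCollisionSketch {

    // Versions of a single cell (one row, one family:qualifier), keyed by timestamp.
    static final class Cell {
        private final TreeMap<Long, Integer> versions = new TreeMap<>();

        void put(long timestamp, int value) {
            // In this toy the later write replaces the earlier one. Which Put
            // survives in real HBase is not something this sketch models.
            versions.put(timestamp, value);
        }

        int versionCount() {
            return versions.size();
        }

        // Values ordered from newest to oldest timestamp.
        List<Integer> valuesNewestFirst() {
            return new ArrayList<>(versions.descendingMap().values());
        }
    }

    // Batched case: every value in the batch is stamped with the same time.
    static Cell writeBatchWithSharedTimestamp(long now, int... values) {
        Cell cell = new Cell();
        for (int v : values) {
            cell.put(now, v);
        }
        return cell;
    }

    // Assumed workaround: give each value its own, strictly increasing timestamp.
    static Cell writeWithDistinctTimestamps(long base, int... values) {
        Cell cell = new Cell();
        long ts = base;
        for (int v : values) {
            cell.put(ts++, v);
        }
        return cell;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Cell batched = writeBatchWithSharedTimestamp(now, 1, 2);
        Cell distinct = writeWithDistinctTimestamps(now, 1, 2);
        System.out.println("shared timestamp:    " + batched.versionCount() + " version(s)");
        System.out.println("distinct timestamps: " + distinct.versionCount() + " version(s)");
    }
}
```

With a shared timestamp the two writes collapse into one version, while distinct timestamps keep both. Whether explicit timestamps are acceptable in the real job depends on the application, and the column family still has to be configured to retain enough versions.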
J-D

On Tue, Jul 27, 2010 at 9:57 AM, Karthik Kambatla <[email protected]> wrote:
> Hi Jean
>
> I looked at the TableMapReduceUtil code and I implemented my own version of
> TableOutputFormat to find and isolate the problem.
>
> In TableOutputFormat, table.setAutoFlush(false) is called so the writes can
> be batch-written. In our case, there are multiple puts on the same row in
> the batch and only a few of them are getting committed. I removed that line
> in MyOutputFormat, and most of the commits go through.
>
> What is the expected behavior in the following case?
>
> ArrayList<Put> puts = new ArrayList<Put>();
>
> Put p1 = new Put(Bytes.toBytes(0));
> p1.add(family, column, Bytes.toBytes(1));
> puts.add(p1);
>
> Put p2 = new Put(Bytes.toBytes(0));
> p2.add(family, column, Bytes.toBytes(2));
> puts.add(p2);
>
> table.put(puts);
>
> Thanks
> Karthik
>
> On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> TableOutputFormat is really just a wrapper around a HTable, see for
>> yourself
>> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
>>
>> So there must be something else about the way you use it, or the way
>> you use HTable directly. Showing bits of your code could be helpful.
>>
>> J-D
>>
>> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
>> <[email protected]> wrote:
>> > Hi
>> >
>> > I am experiencing a few problems with TableMapReduceUtil, wherein only
>> > some of the puts from the reduce are written to the output table. If I
>> > explicitly write to the table from within reduce without using
>> > TableMapReduceUtil, all the puts are written to the table.
>> >
>> > In our application, multiple puts could be on the same row. In case two
>> > puts are on the same key, our application requires both puts to be
>> > committed as two different versions.
>> >
>> > Am I missing something here? Is there a cleaner way to approach this
>> > issue?
>> >
>> > Thanks for the help.
>> >
>> > Karthik
