When I use my own TableOutputFormat with setAutoFlush set to true, we still observe losing one or two commits. There seems to be some other issue here too.

Thanks
Karthik

On Tue, Jul 27, 2010 at 10:18 AM, Jean-Daniel Cryans <[email protected]> wrote:

> Since they are in the same batch, they could end up on the same
> timestamp and one will hide the other. When not batched, there's
> always a few milliseconds between the two Puts so it ends up ok. So
> for your case it seems like you need to use your own HTable without
> the write buffer since it's enforced in TOF, that means that the
> overall throughput will be lower.
>
> J-D
>
> On Tue, Jul 27, 2010 at 9:57 AM, Karthik Kambatla
> <[email protected]> wrote:
> > Hi Jean
> >
> > I looked at the TableMapReduceUtil code and I implemented my own version of
> > TableOutputFormat to find and isolate the problem.
> >
> > In TableOutputFormat, table.setAutoFlush(false) is called so the writes can
> > be batch-written. In our case, there are multiple puts on the same row in
> > the batch and only a few of them are getting committed. I removed that line in
> > MyOutputFormat, and most of the commits go through.
> >
> > What is the expected behavior in the following case?
> >
> > ArrayList<Put> puts = new ArrayList<Put>();
> >
> > Put p1 = new Put(Bytes.toBytes(0));
> > p1.add(family, column, Bytes.toBytes(1));
> > puts.add(p1);
> >
> > Put p2 = new Put(Bytes.toBytes(0));
> > p2.add(family, column, Bytes.toBytes(2));
> > puts.add(p2);
> >
> > table.put(puts);
> >
> > Thanks
> > Karthik
> >
> > On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> TableOutputFormat is really just a wrapper around a HTable, see for
> >> yourself
> >> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
> >>
> >> So there must be something else about the way you use it, or the way
> >> you use HTable directly. Showing bits of your code could be helpful.
> >>
> >> J-D
> >>
> >> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
> >> <[email protected]> wrote:
> >> > Hi
> >> >
> >> > I am experiencing a few problems with TableMapReduceUtil, wherein only some
> >> > of the puts from the reduce are written to the output table. If I explicitly
> >> > write to the table from within reduce without using TableMapReduceUtil, all
> >> > the puts are written to the table.
> >> >
> >> > In our application, multiple puts could be on the same row. In case two puts
> >> > are on the same key, our application requires both puts to be committed as
> >> > two different versions.
> >> >
> >> > Am I missing something here? Is there a cleaner way to approach this issue?
> >> >
> >> > Thanks for the help.
> >> >
> >> > Karthik
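
J-D's explanation in the thread, that two Puts in the same batch can end up on the same timestamp so that one hides the other, can be illustrated with a small toy model. This is only a sketch and not HBase code: the class VersionedCellSketch and its methods are hypothetical names, and it assumes a single cell whose versions are keyed purely by timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model of one cell (row + family + column) whose versions are keyed
// by timestamp. Hypothetical class for illustration; it does not use HBase.
class VersionedCellSketch {
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, Integer> versions =
            new TreeMap<>(Comparator.reverseOrder());

    // A write with a timestamp that already exists replaces that version.
    void put(long timestamp, int value) {
        versions.put(timestamp, value);
    }

    // All stored versions, newest first.
    List<Integer> readAllVersions() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        // Two writes stamped with the same time: only one survives.
        VersionedCellSketch batched = new VersionedCellSketch();
        batched.put(1000L, 1);
        batched.put(1000L, 2);
        System.out.println(batched.readAllVersions()); // prints [2]

        // Two writes with distinct timestamps: both versions are kept.
        VersionedCellSketch spaced = new VersionedCellSketch();
        spaced.put(1000L, 1);
        spaced.put(1001L, 2);
        System.out.println(spaced.readAllVersions()); // prints [2, 1]
    }
}
```

If this model matches what is happening, giving each Put on the same row its own explicit timestamp may be worth trying in place of disabling the write buffer. In HBase that would presumably go through the Put.add(family, qualifier, ts, value) overload, and the column family would also need to retain enough versions; both points are assumptions to check against the HBase version in use.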
