Hi Jean,

I looked at the TableMapReduceUtil code and implemented my own version of TableOutputFormat to isolate the problem.
In TableOutputFormat, table.setAutoFlush(false) is called so that writes are buffered and batch-written. In our case, there are multiple puts on the same row in the batch, and only a few of them are getting committed. When I removed that line in MyOutputFormat, most of the commits went through. What is the expected behavior in the following case?

    ArrayList<Put> puts = new ArrayList<Put>();

    Put p1 = new Put(Bytes.toBytes(0));
    p1.add(family, column, Bytes.toBytes(1));
    puts.add(p1);

    Put p2 = new Put(Bytes.toBytes(0));
    p2.add(family, column, Bytes.toBytes(2));
    puts.add(p2);

    table.put(puts);

Thanks
Karthik

On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans <[email protected]> wrote:
> TableOutputFormat is really just a wrapper around an HTable, see for
> yourself:
> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
>
> So there must be something else about the way you use it, or the way
> you use HTable directly. Showing bits of your code could be helpful.
>
> J-D
>
> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
> <[email protected]> wrote:
> > Hi
> >
> > I am experiencing a few problems with TableMapReduceUtil, wherein only
> > some of the puts from the reduce are written to the output table. If I
> > explicitly write to the table from within reduce without using
> > TableMapReduceUtil, all the puts are written to the table.
> >
> > In our application, multiple puts could be on the same row. In case two
> > puts are on the same key, our application requires both puts to be
> > committed as two different versions.
> >
> > Am I missing something here? Is there a cleaner way to approach this
> > issue?
> >
> > Thanks for the help.
> >
> > Karthik
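P.S. One workaround I am experimenting with, on the assumption that two puts to the same row/column only survive as two versions if their timestamps differ: assign an explicit, distinct timestamp to each Put before buffering them. This is only a sketch, not what our application currently does; the table name "mytable" and the family/qualifier values are placeholders.

    import java.util.ArrayList;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Placeholders standing in for our actual table, family, and qualifier.
    HTable table = new HTable("mytable");
    byte[] family = Bytes.toBytes("f");
    byte[] column = Bytes.toBytes("c");

    // Assumption: if the server assigns timestamps at flush time, two buffered
    // puts on the same cell can land on the same millisecond and collapse into
    // one version, so we set the timestamps ourselves.
    ArrayList<Put> puts = new ArrayList<Put>();
    long ts = System.currentTimeMillis();

    Put p1 = new Put(Bytes.toBytes(0));
    p1.add(family, column, ts, Bytes.toBytes(1));      // first version, at ts
    puts.add(p1);

    Put p2 = new Put(Bytes.toBytes(0));
    p2.add(family, column, ts + 1, Bytes.toBytes(2));  // second version, at ts + 1
    puts.add(p2);

    table.put(puts);
    table.flushCommits();  // push anything still sitting in the write buffer

With this, a Get with setMaxVersions() on row 0 should return both values as separate versions, if my assumption about timestamp collisions is right.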
