If two reducers write the same cell at the same time, both writes could get the same timestamp.
J-D

On Tue, Jul 27, 2010 at 3:34 PM, Karthik Kambatla <[email protected]> wrote:
> When I use my own TableOutputFormat with setAutoFlush set to true, we
> still observe losing one or two commits. There seems to be some other issue
> here too.
>
> Thanks
> Karthik
>
> On Tue, Jul 27, 2010 at 10:18 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Since they are in the same batch, they could end up on the same
>> timestamp and one will hide the other. When not batched, there's
>> always a few milliseconds between the two Puts so it ends up ok. So
>> for your case it seems like you need to use your own HTable without
>> the write buffer since it's enforced in TOF; that means that the
>> overall throughput will be lower.
>>
>> J-D
>>
>> On Tue, Jul 27, 2010 at 9:57 AM, Karthik Kambatla
>> <[email protected]> wrote:
>> > Hi Jean
>> >
>> > I looked at the TableMapReduceUtil code and I implemented my own version of
>> > TableOutputFormat to find and isolate the problem.
>> >
>> > In TableOutputFormat, table.setAutoFlush(false) is called so the writes can
>> > be batch-written. In our case, there are multiple puts on the same row in
>> > the batch and only a few of them are getting committed. I removed that line in
>> > MyOutputFormat, and most of the commits go through.
>> >
>> > What is the expected behavior in the following case?
>> >
>> > ArrayList<Put> puts = new ArrayList<Put>();
>> >
>> > Put p1 = new Put(Bytes.toBytes(0));
>> > p1.add(family, column, Bytes.toBytes(1));
>> > puts.add(p1);
>> >
>> > Put p2 = new Put(Bytes.toBytes(0));
>> > p2.add(family, column, Bytes.toBytes(2));
>> > puts.add(p2);
>> >
>> > table.put(puts);
>> >
>> > Thanks
>> > Karthik
>> >
>> > On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >> TableOutputFormat is really just a wrapper around a HTable, see for
>> >> yourself
>> >> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
>> >>
>> >> So there must be something else about the way you use it, or the way
>> >> you use HTable directly. Showing bits of your code could be helpful.
>> >>
>> >> J-D
>> >>
>> >> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
>> >> <[email protected]> wrote:
>> >> > Hi
>> >> >
>> >> > I am experiencing a few problems with TableMapReduceUtil, where in only some
>> >> > of the puts from the reduce are written to the output table. If I explicitly
>> >> > write to the table from within reduce without using TableMapReduceUtil, all
>> >> > the puts are written to the table.
>> >> >
>> >> > In our application, multiple puts could be on the same row. In case two puts
>> >> > are on the same key, our application requires both puts to be committed as
>> >> > two different versions.
>> >> >
>> >> > Am I missing something here? Is there a cleaner way to approach this issue?
>> >> >
>> >> > Thanks for the help.
>> >> >
>> >> > Karthik
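
To illustrate the timestamp collision discussed in this thread, here is a minimal, self-contained sketch. It is not HBase code: `Cell` and `MonotonicClock` are hypothetical names made up for the example, and the only assumption is that versions of a cell are keyed by timestamp, so a second write with the same timestamp hides the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class VersionCollisionSketch {
    // Toy model of a single cell whose versions are keyed by timestamp.
    // This only mimics "same timestamp overwrites"; it is not the HBase API.
    static final class Cell {
        private final TreeMap<Long, Integer> versions = new TreeMap<>();

        void put(long timestamp, int value) {
            versions.put(timestamp, value); // an equal timestamp replaces the older value
        }

        int versionCount() {
            return versions.size();
        }

        List<Integer> values() {
            return new ArrayList<>(versions.values()); // oldest version first
        }
    }

    // Hypothetical helper: returns strictly increasing timestamps,
    // even when asked several times within the same millisecond.
    static final class MonotonicClock {
        private long last = Long.MIN_VALUE;

        long next(long now) {
            last = Math.max(now, last + 1);
            return last;
        }
    }

    public static void main(String[] args) {
        long now = 1280250000000L; // fixed "current time" so the run is repeatable

        // Two writes in one batch that end up with the same timestamp.
        Cell batched = new Cell();
        batched.put(now, 1);
        batched.put(now, 2);
        System.out.println("same timestamp: " + batched.values());

        // The same two writes, each given its own timestamp.
        MonotonicClock clock = new MonotonicClock();
        Cell spaced = new Cell();
        spaced.put(clock.next(now), 1);
        spaced.put(clock.next(now), 2);
        System.out.println("distinct timestamps: " + spaced.values());
    }
}
```

In this model the first cell keeps one version and the second keeps both. In a real job the equivalent would presumably be to give the two Puts explicit, distinct timestamps; check the Put API of your HBase version before relying on that.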
