If two reducers write the same cell at the same time, both Puts could
get the same timestamp, and one would then hide the other.
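
To illustrate the collision, here is a minimal sketch in plain Java. It is
a toy model, not the HBase API: Cell and MonotonicClock are hypothetical
names, and the assumption is only that a cell keeps one value per
timestamp. The clock keeps timestamps distinct within a single writer; it
does nothing for two separate reducers, which would still need some other
way to avoid picking the same timestamp.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimestampCollisionDemo {

    /** Toy model of one cell: at most one value per timestamp, newest first. */
    static final class Cell {
        private final NavigableMap<Long, String> versions =
                new TreeMap<>(Comparator.reverseOrder());

        /** A write with an existing timestamp replaces the earlier value. */
        void put(long timestamp, String value) {
            versions.put(timestamp, value);
        }

        int versionCount() {
            return versions.size();
        }

        String latest() {
            return versions.firstEntry().getValue();
        }
    }

    /** Hypothetical helper: strictly increasing timestamps for one writer. */
    static final class MonotonicClock {
        private long last = Long.MIN_VALUE;

        synchronized long next(long nowMillis) {
            // Bump by one when called again within the same millisecond.
            last = Math.max(nowMillis, last + 1);
            return last;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Two writes in the same batch, stamped with the same millisecond.
        Cell collided = new Cell();
        collided.put(now, "1");
        collided.put(now, "2");
        System.out.println("same timestamp: " + collided.versionCount() + " version(s)");

        // Same two writes, each given its own explicit timestamp.
        MonotonicClock clock = new MonotonicClock();
        Cell distinct = new Cell();
        distinct.put(clock.next(now), "1");
        distinct.put(clock.next(now), "2");
        System.out.println("distinct timestamps: " + distinct.versionCount() + " version(s)");
    }
}
```

In this model the first cell ends up with a single version and the second
keeps both, which matches the symptom of only some of the puts surviving.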

J-D

On Tue, Jul 27, 2010 at 3:34 PM, Karthik Kambatla
<[email protected]> wrote:
> When I use my own TableOutputFormat with setAutoFlush set to true, we
> still observe one or two commits being lost. There seems to be some other
> issue here too.
>
> Thanks
> Karthik
>
> On Tue, Jul 27, 2010 at 10:18 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Since they are in the same batch, they could end up with the same
>> timestamp, and one will hide the other. When not batched, there's
>> always a few milliseconds between the two Puts, so it ends up OK. So
>> in your case it seems you need to use your own HTable without the
>> write buffer, since TOF enforces it. That means the overall
>> throughput will be lower.
>>
>> J-D
>>
>> On Tue, Jul 27, 2010 at 9:57 AM, Karthik Kambatla
>> <[email protected]> wrote:
>> > Hi Jean
>> >
>> > I looked at the TableMapReduceUtil code and I implemented my own version
>> > of TableOutputFormat to find and isolate the problem.
>> >
>> > In TableOutputFormat, table.setAutoFlush(false) is called so the writes
>> > can be batch-written. In our case, there are multiple puts on the same
>> > row in the batch and only a few of them are getting committed. I removed
>> > that line in MyOutputFormat, and most of the commits go through.
>> >
>> > What is the expected behavior in the following case?
>> >
>> > ArrayList<Put> puts = new ArrayList<Put>();
>> >
>> > Put p1 = new Put(Bytes.toBytes(0));
>> > p1.add(family, column, Bytes.toBytes(1));
>> > puts.add(p1);
>> >
>> > Put p2 = new Put(Bytes.toBytes(0));
>> > p2.add(family, column, Bytes.toBytes(2));
>> > puts.add(p2);
>> >
>> > table.put(puts);
>> >
>> > Thanks
>> > Karthik
>> >
>> >
>> >
>> > On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans
>> > <[email protected]> wrote:
>> >
>> >> TableOutputFormat is really just a wrapper around an HTable; see for
>> >> yourself:
>> >> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
>> >>
>> >> So there must be something else about the way you use it, or the way
>> >> you use HTable directly. Showing bits of your code could be helpful.
>> >>
>> >> J-D
>> >>
>> >> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
>> >> <[email protected]> wrote:
>> >> > Hi
>> >> >
>> >> > I am experiencing a few problems with TableMapReduceUtil, where only
>> >> > some of the puts from the reduce are written to the output table. If I
>> >> > explicitly write to the table from within reduce without using
>> >> > TableMapReduceUtil, all the puts are written to the table.
>> >> >
>> >> > In our application, multiple puts could be on the same row. In case
>> >> > two puts are on the same key, our application requires both puts to be
>> >> > committed as two different versions.
>> >> >
>> >> > Am I missing something here? Is there a cleaner way to approach this
>> >> > issue?
>> >> >
>> >> > Thanks for the help.
>> >> >
>> >> > Karthik
>> >> >
>> >>
>> >
>>
>
