Re: TableMapReduceUtil parallel puts to same row

Karthik Kambatla Tue, 27 Jul 2010 16:15:27 -0700

Oh! I was under the impression that the timestamps depend on when the region
server hosting that cell receives it.


Also, I have tried locking the row before committing the put, but I get an
UnknownRowLockException when trying to unlock. What does lockRow() return
when it fails to acquire a lock on the row?

Thanks
Karthik

On Tue, Jul 27, 2010 at 3:37 PM, Jean-Daniel Cryans <[email protected]> wrote:

> If 2 reducers output the same cell at the same time, it could have the
> same timestamp.
>
> J-D
>
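To make the "same timestamp" point concrete, here is a toy model (not HBase code) of a single cell whose versions are keyed only by timestamp. The rule that a second write at an existing timestamp replaces the first is an assumption of this model, chosen to mirror "one will hide the other"; it says nothing about which write HBase itself keeps. VersionedCell and SameTimestampDemo are hypothetical names.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (NOT HBase code): the versions of one cell (row + family + column),
// keyed only by timestamp and ordered newest first.
class VersionedCell {
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Collections.<Long>reverseOrder());

    // Modeling assumption: a write at an existing timestamp replaces the old
    // value, so only one of the two writes stays visible.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    int versionCount() {
        return versions.size();
    }

    // Newest version, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}

class SameTimestampDemo {
    public static void main(String[] args) {
        long ts = 1280269647000L; // hypothetical millisecond timestamp

        VersionedCell sameTs = new VersionedCell();
        sameTs.put(ts, "from reducer A");
        sameTs.put(ts, "from reducer B");              // same millisecond
        System.out.println(sameTs.versionCount());     // 1

        VersionedCell distinctTs = new VersionedCell();
        distinctTs.put(ts, "from reducer A");
        distinctTs.put(ts + 3, "from reducer B");      // a few ms later
        System.out.println(distinctTs.versionCount()); // 2
    }
}
```

In this model, two writes only survive as two versions when their timestamps differ.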
> On Tue, Jul 27, 2010 at 3:34 PM, Karthik Kambatla
> <[email protected]> wrote:
> > When I use my own TableOutputFormat with setAutoFlush set to true, we
> > still observe losing one or two commits. There seems to be some other
> > issue here too.
> >
> > Thanks
> > Karthik
> >
> > On Tue, Jul 27, 2010 at 10:18 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Since they are in the same batch, they could end up with the same
> >> timestamp, and one will hide the other. When not batched, there are
> >> always a few milliseconds between the two Puts, so it ends up OK. So
> >> for your case it seems you need to use your own HTable without the
> >> write buffer, since the buffer is enforced in TOF (TableOutputFormat).
> >> That means the overall throughput will be lower.
> >>
> >> J-D
> >>
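The batched vs. unbatched difference described above can be sketched as a small simulation. It rests on simplifying assumptions that are not HBase's actual implementation: the "server" stamps each value with its current millisecond, a whole batch is applied within one millisecond, and separate RPCs are 3 ms apart. ToyServer and BatchingDemo are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model (NOT HBase code): the server stamps each incoming value with its
// current time in milliseconds; the cell keeps one value per timestamp.
class ToyServer {
    private long nowMillis = 1280247507000L; // hypothetical start time
    private final TreeMap<Long, Integer> cell = new TreeMap<Long, Integer>();

    // Simulates wall-clock time passing between RPCs.
    void advance(long millis) {
        nowMillis += millis;
    }

    // One RPC. Modeling assumption: the whole batch is applied within a single
    // millisecond, so every value in it gets the same timestamp and each later
    // value replaces the earlier one.
    void applyBatch(List<Integer> values) {
        for (int v : values) {
            cell.put(nowMillis, v);
        }
    }

    int versions() {
        return cell.size();
    }
}

class BatchingDemo {
    // One RPC per put, with a few milliseconds between RPCs.
    static int unbatched(List<Integer> values) {
        ToyServer server = new ToyServer();
        for (int v : values) {
            server.applyBatch(Collections.singletonList(v));
            server.advance(3);
        }
        return server.versions();
    }

    // All puts sent together in one RPC, as a flushed write buffer would be.
    static int batched(List<Integer> values) {
        ToyServer server = new ToyServer();
        server.applyBatch(values);
        return server.versions();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2);
        System.out.println("unbatched versions: " + unbatched(values)); // 2
        System.out.println("batched versions:   " + batched(values));   // 1
    }
}
```

The trade-off J-D mentions shows up here too: the unbatched path needs one RPC per put, which is where the lower throughput comes from.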
> >> On Tue, Jul 27, 2010 at 9:57 AM, Karthik Kambatla
> >> <[email protected]> wrote:
> >> > Hi Jean
> >> >
> >> > I looked at the TableMapReduceUtil code and implemented my own version
> >> > of TableOutputFormat to find and isolate the problem.
> >> >
> >> > In TableOutputFormat, table.setAutoFlush(false) is called so that the
> >> > writes can be batch-written. In our case, there are multiple puts on the
> >> > same row in the batch, and only a few of them are getting committed. I
> >> > removed that line in MyOutputFormat, and most of the commits go through.
> >> >
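The autoFlush behavior discussed here can be illustrated with a toy client-side write buffer. This is a sketch, not the real HBase client: ToyTable, its put/flushCommits methods, and AutoFlushDemo are hypothetical stand-ins that only mirror the idea behind HTable.setAutoFlush(...) and flushCommits(). With autoFlush on, each put becomes its own RPC; with it off, puts accumulate and go out together. The real client also flushes when the buffer reaches its configured size, which this model omits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer (NOT the real HBase client). It only
// mirrors the idea behind HTable.setAutoFlush(...) and flushCommits().
class ToyTable {
    private final boolean autoFlush;
    private final List<String> buffer = new ArrayList<String>();
    // Each inner list is what one RPC to the server would carry.
    private final List<List<String>> rpcs = new ArrayList<List<String>>();

    ToyTable(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    void put(String value) {
        buffer.add(value);
        if (autoFlush) {
            flushCommits(); // send right away: one RPC per put
        }
    }

    // Sends everything buffered so far as a single batch.
    void flushCommits() {
        if (!buffer.isEmpty()) {
            rpcs.add(new ArrayList<String>(buffer));
            buffer.clear();
        }
    }

    List<List<String>> sentRpcs() {
        return rpcs;
    }
}

class AutoFlushDemo {
    public static void main(String[] args) {
        ToyTable immediate = new ToyTable(true);
        immediate.put("p1");
        immediate.put("p2");
        System.out.println(immediate.sentRpcs()); // [[p1], [p2]]

        ToyTable buffered = new ToyTable(false);
        buffered.put("p1");
        buffered.put("p2");
        buffered.flushCommits();
        System.out.println(buffered.sentRpcs());  // [[p1, p2]]
    }
}
```

In the buffered case, p1 and p2 reach the server in the same batch, which is the situation where they can share a timestamp.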
> >> > What is the expected behavior in the following case?
> >> >
> >> > // family and column are byte[] names; table is an HTable
> >> > ArrayList<Put> puts = new ArrayList<Put>();
> >> >
> >> > // Two Puts to the same row and column, neither with an explicit timestamp
> >> > Put p1 = new Put(Bytes.toBytes(0));
> >> > p1.add(family, column, Bytes.toBytes(1));
> >> > puts.add(p1);
> >> >
> >> > Put p2 = new Put(Bytes.toBytes(0));
> >> > p2.add(family, column, Bytes.toBytes(2));
> >> > puts.add(p2);
> >> >
> >> > table.put(puts);
> >> >
> >> > Thanks
> >> > Karthik
> >> >
> >> >
> >> >
> >> > On Tue, Jul 27, 2010 at 9:25 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> TableOutputFormat is really just a wrapper around an HTable; see for
> >> >> yourself:
> >> >>
> >> >> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
> >> >>
> >> >> So there must be something else about the way you use it, or the way
> >> >> you use HTable directly. Showing bits of your code could be helpful.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Mon, Jul 26, 2010 at 11:17 PM, Karthik Kambatla
> >> >> <[email protected]> wrote:
> >> >> > Hi
> >> >> >
> >> >> > I am experiencing a few problems with TableMapReduceUtil, where only
> >> >> > some of the puts from the reduce are written to the output table. If
> >> >> > I explicitly write to the table from within reduce without using
> >> >> > TableMapReduceUtil, all the puts are written to the table.
> >> >> >
> >> >> > In our application, multiple puts could be on the same row. If two
> >> >> > puts are on the same key, our application requires both puts to be
> >> >> > committed as two different versions.
> >> >> >
> >> >> > Am I missing something here? Is there a cleaner way to approach this
> >> >> > issue?
> >> >> >
> >> >> > Thanks for the help.
> >> >> >
> >> >> > Karthik
> >> >> >
> >> >>
> >> >
> >>
> >
>
