RE: question on Filtering and checkAndPut()

Jonathan Gray Tue, 01 Jun 2010 10:13:23 -0700

Do you expect a very high percentage to be duplicates or just some?

An alternate approach is to just perform the insertions.  Writes are faster 
than reads, so sometimes it's best to just insert.  This will create an 
additional version, but if you aren't relying on versions it will have little 
impact.


If a majority of the values will be duplicates, then maybe consider something 
different.  Just remember that requiring a read before each write is going to 
significantly slow everything down.
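
To make the difference between a blind insert and the checkAndPut semantics
quoted below concrete, here is a minimal in-memory sketch.  It is a model
only, not the HBase client API: the class CheckAndPutSketch, its method
signatures, the single-string cell key, and the convention that a null
expected value means "cell absent" are all assumptions made for illustration,
and versions are not modeled at all.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of check-then-put semantics; NOT the HBase API.
final class CheckAndPutSketch {
    // One value per "row/family:qualifier" key; versions are ignored here.
    private final Map<String, byte[]> cells = new HashMap<>();

    // Blind write: always succeeds and replaces whatever was stored.
    synchronized void put(String cell, byte[] value) {
        cells.put(cell, value.clone());
    }

    // Returns a copy of the stored value, or null if the cell is absent.
    synchronized byte[] get(String cell) {
        byte[] current = cells.get(cell);
        return current == null ? null : current.clone();
    }

    // Compare and write under one lock, so nothing can slip in between.
    // Assumption of this model: expected == null means "cell must be absent".
    synchronized boolean checkAndPut(String cell, byte[] expected, byte[] newValue) {
        byte[] current = cells.get(cell);
        if (!Arrays.equals(current, expected)) {
            return false; // someone changed the cell since we read it
        }
        cells.put(cell, newValue.clone());
        return true;
    }
}
```

In this model, a caller that read a balance earlier, did some processing, and
now wants to write the result would pass the value it originally read as the
expected value.  If checkAndPut returns false, the data changed in the
meantime, and the caller would typically re-read and retry rather than
overwrite.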

> -----Original Message-----
> From: Raghava Mutharaju [mailto:[email protected]]
> Sent: Tuesday, June 01, 2010 9:49 AM
> To: [email protected]
> Subject: Re: question on Filtering and checkAndPut()
> 
> Thank you JG.
> 
> >>> is checking if the values there are the same as the ones you are
> >>> trying to insert?
>         Yes, that is right. I am doing this because there could be
> duplicate values generated. In the current iteration of MR, I could
> generate a value that was already present in that row/qualifier
> combination (it is sufficient if the value is in any column).
> 
> Regards,
> Raghava.
> 
> On Tue, Jun 1, 2010 at 12:32 PM, Jonathan Gray <[email protected]>
> wrote:
> 
> > And for checkAndPut, from the javadoc:
> >
> > "Atomically checks if a row/family/qualifier value match the
> > expectedValue. If it does, it adds the put."
> >
> > This can be used a number of ways.  It sounds like what you're
> > describing is checking if the values there are the same as the ones
> > you are trying to insert?  This wouldn't make much sense, why would
> > you re-insert the same value?  You specify a row, family, qualifier,
> > and value.  You also specify a Put.
> >
> > checkAndPut is an example of an atomic operation.  I may want to only
> > insert certain data if the value I expect is there at the time I am
> > inserting.  Think about updating account balances, state transitions,
> > data processing, etc.  You may read some data at an earlier point in
> > time, do some processing, and then insert.  When you do the insert,
> > you only want it to happen if something else hasn't gone in during
> > your process time and modified the data that was there.
> >
> > JG
> >
> > > -----Original Message-----
> > > From: Raghava Mutharaju [mailto:[email protected]]
> > > Sent: Tuesday, June 01, 2010 1:47 AM
> > > To: [email protected]
> > > Cc: [email protected]
> > > Subject: question on Filtering and checkAndPut()
> > >
> > > Hi all,
> > >
> > >      Is the following type of value filter possible? Within a row,
> > > irrespective of the columns (qualifiers), the presence of a value
> > > should be checked. If that value is present, then the row along
> > > with all the columns should be fetched.
> > >
> > > SingleColumnValueFilter requires that we specify the name of the
> > > qualifier, but here I would like to check the value across all the
> > > qualifiers of the row. ValueFilter can be used, but it does not
> > > return all the columns if there is a match - it only returns the
> > > matched column along with the row. So I want something which is a
> > > mix of both. Is this possible?
> > >
> > > Can someone please explain the functionality of the checkAndPut()
> > > method in HTable? I couldn't get it from the api doc. When I came
> > > across this method, my guess was that it would check for duplicate
> > > values -- for the given (row, family, qualifier) combination,
> > > whether the given value is the same as the value mentioned in the
> > > put (for the same combination).
> > >
> > > Thank you.
> > >
> > > Regards,
> > > Raghava.
> >
