To implement a set, you would need to do a check. For your application, can you explain more specifically what the behavior is when you attempt to insert a duplicate into the set?
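
A minimal sketch of the check-then-insert pattern discussed in this thread, assuming a row stands for one set and each qualifier holds one element. `RowSetSketch` and `addIfAbsent` are hypothetical names for an in-memory stand-in, not the HBase client API; against a real table the check would be a read of the row before every write.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for one row used as a set (hypothetical, not the HBase API). */
final class RowSetSketch {
    // qualifier -> value; each qualifier holds one element of the set
    private final Map<String, String> columns = new HashMap<>();

    /**
     * Check-then-insert: look for the value under any qualifier and
     * write it under the given qualifier only if it is not already present.
     * Returns true if the value was added, false if it was a duplicate.
     */
    synchronized boolean addIfAbsent(String qualifier, String value) {
        if (columns.containsValue(value)) {
            return false; // duplicate found by the read, so the write is skipped
        }
        columns.put(qualifier, value);
        return true;
    }

    synchronized int size() {
        return columns.size();
    }
}
```

Note that every call pays for the lookup before the write, which is the read-before-write cost raised further down the thread.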
> -----Original Message-----
> From: Raghava Mutharaju [mailto:[email protected]]
> Sent: Tuesday, June 01, 2010 10:49 AM
> To: [email protected]
> Subject: Re: question on Filtering and checkAndPut()
>
> Hi JG,
>
> There cannot be duplicate insertions because in my case, a row represents
> a set and the qualifier values represent each element of the set. So
> whenever I insert a value, I have to check whether the value already exists.
> A new value goes under a new qualifier. Do you think this is an appropriate
> schema design?
>
> Regards,
> Raghava.
>
> On Tue, Jun 1, 2010 at 1:12 PM, Jonathan Gray <[email protected]> wrote:
>
> > Do you expect a very high percentage to be duplicates or just some?
> >
> > An alternate approach is to just perform the insertions. Writes are faster
> > than reads, so sometimes it's best to just insert. This will create an
> > additional version but if you aren't relying on versions then it will have
> > little impact.
> >
> > If a majority of stuff will be duplicate, then maybe consider something
> > different. Just remember that requiring reads before each write is going to
> > significantly slow everything down.
> >
> > > -----Original Message-----
> > > From: Raghava Mutharaju [mailto:[email protected]]
> > > Sent: Tuesday, June 01, 2010 9:49 AM
> > > To: [email protected]
> > > Subject: Re: question on Filtering and checkAndPut()
> > >
> > > Thank you JG.
> > >
> > > >>> is checking if the values there are the same as the ones you are trying
> > > to insert?
> > > Yes, that is right. I am doing this because there could be duplicate
> > > values generated. In the current iteration of MR, I could generate a value
> > > which was already present in that row/qualifier combination (it is
> > > sufficient if the value is in any column).
> > >
> > > Regards,
> > > Raghava.
> > >
> > > On Tue, Jun 1, 2010 at 12:32 PM, Jonathan Gray <[email protected]> wrote:
> > >
> > > > And for checkAndPut, from the javadoc:
> > > >
> > > > "Atomically checks if a row/family/qualifier value match the expectedValue.
> > > > If it does, it adds the put."
> > > >
> > > > This can be used in a number of ways. It sounds like what you're describing
> > > > is checking if the values there are the same as the ones you are trying to
> > > > insert? This wouldn't make much sense; why would you re-insert the same
> > > > value? You specify a row, family, qualifier, and value. You also specify a
> > > > Put.
> > > >
> > > > checkAndPut is an example of an atomic operation. I may want to only
> > > > insert certain data if the value I expect is there at the time I am
> > > > inserting. Think about updating account balances, state transitions, data
> > > > processing, etc. You may read some data at an earlier point in time, do
> > > > some processing, and then insert. When you do the insert, you only want it
> > > > to happen if something else hasn't gone in during your process time and
> > > > modified the data that was there.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: Raghava Mutharaju [mailto:[email protected]]
> > > > > Sent: Tuesday, June 01, 2010 1:47 AM
> > > > > To: [email protected]
> > > > > Cc: [email protected]
> > > > > Subject: question on Filtering and checkAndPut()
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Can the following type of value filter be possible -- Within a row,
> > > > > irrespective of the columns (qualifiers), the presence of a value should be
> > > > > checked. If that value is present then the row along with all the columns
> > > > > should be fetched.
> > > > >
> > > > > SingleColumnValueFilter requires that we specify the name of the qualifier
> > > > > but here I would like to check the value across all the qualifiers of the
> > > > > row. ValueFilter can be used but it does not return all the columns if there
> > > > > is a match - it only returns the matched column along with the row. So I
> > > > > want something which is a mix of both. Is this possible?
> > > > >
> > > > > Can someone please explain the functionality of the checkAndPut() method in
> > > > > HTable? I couldn't get it from the api doc. When I came across this method,
> > > > > my guess was that it would check for duplicate values -- for the given (row,
> > > > > family, qualifier) combination whether the given value is the same as the value
> > > > > mentioned in put (for the same combination).
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Regards,
> > > > > Raghava.
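
The compare-then-write behaviour described in the thread for checkAndPut (apply the write only if the cell still holds the value you expect) can be modelled in plain Java. This is only a sketch under stated assumptions: `CheckAndPutSketch` and its string cell keys are hypothetical, it uses `ConcurrentMap.replace` as the atomic step, and it is not the HBase client API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory model of compare-then-write semantics (hypothetical, not the HBase API). */
final class CheckAndPutSketch {
    // Key is a made-up "row/family:qualifier" string; value is the cell content.
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    void put(String cell, String value) {
        cells.put(cell, value);
    }

    String get(String cell) {
        return cells.get(cell);
    }

    /**
     * Writes newValue only if the cell currently holds expected.
     * The comparison and the write happen as one atomic step, so a
     * concurrent change between "read" and "write" makes this return false.
     */
    boolean checkAndPut(String cell, String expected, String newValue) {
        return cells.replace(cell, expected, newValue);
    }
}
```

For the account-balance case mentioned above: read a balance of "100", compute "80", then call `checkAndPut(cell, "100", "80")`. If another writer changed the balance in the meantime, the call returns false and the caller can re-read and retry.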
