The goal is to compute all the elements of the set. Let there be a set, S =
{a,b,c}. I took S to be the row id and each element to be a qualifier value.
So each element goes under a new column within the 'S' row. My plan was to
use something like the AllColumnValueFilter on 'S' (obtained by using
Row/Prefix Filter), and if the result is empty, insert the newly computed value
(element) or else ignore the value. This computation of elements happens in
several iterations and it stops only when no further value can be inserted
into the set (i.e. all the values which can be computed have been computed).
That is why duplicates cannot be present.

Regards,
Raghava.

On Tue, Jun 1, 2010 at 1:53 PM, Jonathan Gray <[email protected]> wrote:

> To implement a set, you would need to do a check.
>
> For your application, can you explain more specifically what the behavior
> is when you attempt to insert a duplicate into the set?
>
> > -----Original Message-----
> > From: Raghava Mutharaju [mailto:[email protected]]
> > Sent: Tuesday, June 01, 2010 10:49 AM
> > To: [email protected]
> > Subject: Re: question on Filtering and checkAndPut()
> >
> > Hi JG,
> >
> > There cannot be duplicate insertions because in my case, a row
> > represents a set and the qualifier values represent each element of
> > the set. So whenever I insert a value, I have to check whether the
> > value already exists. A new value goes under a new qualifier. Do you
> > think this is an appropriate schema design?
> >
> > Regards,
> > Raghava.
> >
> > On Tue, Jun 1, 2010 at 1:12 PM, Jonathan Gray <[email protected]> wrote:
> >
> > > Do you expect a very high percentage to be duplicates or just some?
> > >
> > > An alternate approach is to just perform the insertions. Writes are
> > > faster than reads, so sometimes it's best to just insert. This will
> > > create an additional version, but if you aren't relying on versions
> > > then it will have little impact.
> > >
> > > If a majority of stuff will be duplicate, then maybe consider
> > > something different. Just remember that requiring reads before each
> > > write is going to significantly slow everything down.
> > >
> > > > -----Original Message-----
> > > > From: Raghava Mutharaju [mailto:[email protected]]
> > > > Sent: Tuesday, June 01, 2010 9:49 AM
> > > > To: [email protected]
> > > > Subject: Re: question on Filtering and checkAndPut()
> > > >
> > > > Thank you JG.
> > > >
> > > > >>> is checking if the values there are the same as the ones you
> > > > >>> are trying to insert?
> > > >
> > > > Yes, that is right. I am doing this because there could be
> > > > duplicate values generated. In the current iteration of MR, I
> > > > could generate a value which was already present in that
> > > > row/qualifier combination (it is sufficient if the value is in
> > > > any column).
> > > >
> > > > Regards,
> > > > Raghava.
> > > >
> > > > On Tue, Jun 1, 2010 at 12:32 PM, Jonathan Gray <[email protected]> wrote:
> > > >
> > > > > And for checkAndPut, from the javadoc:
> > > > >
> > > > > "Atomically checks if a row/family/qualifier value match the
> > > > > expectedValue. If it does, it adds the put."
> > > > >
> > > > > This can be used in a number of ways. It sounds like what you're
> > > > > describing is checking if the values there are the same as the
> > > > > ones you are trying to insert? This wouldn't make much sense;
> > > > > why would you re-insert the same value? You specify a row,
> > > > > family, qualifier, and value. You also specify a Put.
> > > > >
> > > > > checkAndPut is an example of an atomic operation. I may want to
> > > > > only insert certain data if the value I expect is there at the
> > > > > time I am inserting. Think about updating account balances,
> > > > > state transitions, data processing, etc. You may read some data
> > > > > at an earlier point in time, do some processing, and then
> > > > > insert. When you do the insert, you only want it to happen if
> > > > > something else hasn't gone in during your process time and
> > > > > modified the data that was there.
> > > > >
> > > > > JG
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Raghava Mutharaju [mailto:[email protected]]
> > > > > > Sent: Tuesday, June 01, 2010 1:47 AM
> > > > > > To: [email protected]
> > > > > > Cc: [email protected]
> > > > > > Subject: question on Filtering and checkAndPut()
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Can the following type of value filter be possible -- within
> > > > > > a row, irrespective of the columns (qualifiers), the presence
> > > > > > of a value should be checked. If that value is present then
> > > > > > the row along with all the columns should be fetched.
> > > > > >
> > > > > > SingleColumnValueFilter requires that we specify the name of
> > > > > > the qualifier, but here I would like to check the value across
> > > > > > all the qualifiers of the row. ValueFilter can be used but it
> > > > > > does not return all the columns if there is a match - it only
> > > > > > returns the matched column along with the row. So I want
> > > > > > something which is a mix of both. Is this possible?
> > > > > >
> > > > > > Can someone please explain the functionality of the
> > > > > > checkAndPut() method in HTable? I couldn't get it from the api
> > > > > > doc. When I came across this method, my guess was that it
> > > > > > would check for duplicate values -- for the given (row,
> > > > > > family, qualifier) combination whether the given value is the
> > > > > > same as the value mentioned in the put (for the same
> > > > > > combination).
> > > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > Regards,
> > > > > > Raghava.
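
For reference, here is a minimal in-memory sketch of the check-then-insert
loop described at the top of this message. It is not HBase client code: the
class SetRowSketch, the generated qualifier names ("q0", "q1", ...) and the
derive function are all hypothetical stand-ins, and a plain map plays the role
of the columns of row 'S'. It only illustrates the logic: look for the value
in any column, insert it under a new qualifier if it is missing, and stop
iterating once a full pass inserts nothing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// In-memory stand-in for the single row 'S' (hypothetical, not the HBase API).
class SetRowSketch {
    // qualifier -> value; each value is one element of the set
    private final Map<String, String> columns = new LinkedHashMap<>();

    // Read before write: store the element under a fresh qualifier only if
    // no existing column already holds it. Returns true if it was inserted.
    boolean insertIfAbsent(String element) {
        if (columns.containsValue(element)) {
            return false; // duplicate, ignore
        }
        columns.put("q" + columns.size(), element);
        return true;
    }

    // Snapshot of the current elements, safe to iterate while inserting.
    Set<String> elements() {
        return new TreeSet<>(columns.values());
    }

    // Runs iterations until one full pass inserts no new element.
    // 'derive' is an assumed rule mapping an element to newly computed ones.
    static int computeToFixedPoint(SetRowSketch row,
                                   Function<String, List<String>> derive) {
        int iterations = 0;
        boolean inserted = true;
        while (inserted) {
            inserted = false;
            iterations++;
            for (String element : row.elements()) {
                for (String computed : derive.apply(element)) {
                    if (row.insertIfAbsent(computed)) {
                        inserted = true;
                    }
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Toy rules, purely illustrative: a yields b, b yields c, c yields a.
        Map<String, List<String>> rules = Map.of(
                "a", List.of("b"), "b", List.of("c"), "c", List.of("a"));
        SetRowSketch row = new SetRowSketch();
        row.insertIfAbsent("a");
        int n = computeToFixedPoint(row, e -> rules.getOrDefault(e, List.of()));
        System.out.println(n + " iterations, elements = " + row.elements());
    }
}
```

Every insert here costs a read of the whole row first, which is the overhead
JG warns about above; against a real table the check and the write would also
need to be made atomic if several tasks can write to the same row.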
