Re: Any better way to ensure unicity ?

Vincent Barat Tue, 13 Jul 2010 00:32:54 -0700


On 12/07/10 16:56, Mridul Muralidharan wrote:

I am not sure what you mean here exactly.
Will a sid row have multiple (different) values for the other fields?

Yes.

But if you want to pick any one row for a given sid, then I think what you have below might be good enough (you can omit the last line though).

OK. Thanks. The last line is used to retrieve the exact same data structure and naming as the original table. This way, I can optionally perform this treatment without modifying my code. If you know a better way...
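
One possible way to drop the extra projection, as a minimal sketch only: put an AS clause on the FLATTEN inside the nested FOREACH, so the output fields get their original names directly (otherwise they typically come out prefixed, as in first::sid). The relation names by_sid and deduped, the LOAD path, and the fields start_time and user_id are assumptions for illustration; only sid comes from the actual table.

```pig
-- Hypothetical schema: only sid is real, the other fields are placeholders.
sessions = LOAD 'sessions' AS (sid:chararray, start_time:long, user_id:chararray);

-- One group per sid; each group carries a bag named 'sessions'.
by_sid = GROUP sessions BY sid;

-- Keep one (arbitrary) row per sid and restore the original field names
-- in the same statement, instead of in a separate FOREACH.
deduped = FOREACH by_sid {
    first = LIMIT sessions 1;
    GENERATE FLATTEN(first) AS (sid, start_time, user_id);
};
```

Since LIMIT without an ORDER picks an arbitrary row, this only fits the case where any one row per sid is acceptable.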


Cheers,


Regards,
Mridul



On Monday 12 July 2010 06:53 PM, Vincent Barat wrote:

   Hello everybody,

I have a simple table containing sessions. Each session has a
unique key (the sid, which is actually a UUID).
But a session can be present several times in my input table.

I want to ensure that I only have 1 record for each sid (because I
perform subsequent JOIN based on this sid).

Currently I use the following script, but I wonder if there is
something more efficient:

sessions = GROUP sessions BY sid;
sessions = FOREACH sessions { first = LIMIT sessions 1; GENERATE FLATTEN(first); };
sessions = FOREACH sessions GENERATE sid, .. and all the fields I
have in the session table...

Do you see any optimization I can do, especially on the FLATTEN /
GENERATE part ?

Thank you very much for your help.
