Hello everybody,

I have a simple table containing sessions. Each session has a unique key (the sid, which is actually a UUID).
But the same session can appear several times in my input table.

I want to ensure that I have only one record for each sid, because I perform subsequent JOINs on this sid.

Currently I use the following script, but I wonder if there is something more efficient:

sessions = GROUP sessions BY sid;
sessions = FOREACH sessions {
    first = LIMIT sessions 1;
    GENERATE FLATTEN(first);
};
sessions = FOREACH sessions GENERATE sid, .. and all the fields I have in the session table...
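
To make the semantics of the GROUP / nested LIMIT 1 / FLATTEN pattern concrete, here is a small Python model of it (not Pig code): records are grouped by sid and exactly one record per group is kept. The function name `dedup_by_key` and the `user` field are hypothetical. One difference to keep in mind: this model keeps the first record seen for each sid, whereas Pig bags are unordered, so the nested LIMIT 1 returns an arbitrary record from each group.

```python
from typing import Dict, Hashable, Iterable, List


def dedup_by_key(records: Iterable[dict], key: str = "sid") -> List[dict]:
    """Keep exactly one record per value of `key`.

    Models GROUP ... BY sid followed by a nested LIMIT 1 and FLATTEN:
    group the records, then emit a single record from each group.
    Here the first record seen for a key wins (dicts keep insertion order);
    Pig makes no such ordering promise.
    """
    kept: Dict[Hashable, dict] = {}
    for record in records:
        # setdefault stores the record only if this key has not been seen yet
        kept.setdefault(record[key], record)
    return list(kept.values())


if __name__ == "__main__":
    # Hypothetical input: sid "a1" appears twice
    sessions = [
        {"sid": "a1", "user": "alice"},
        {"sid": "b2", "user": "bob"},
        {"sid": "a1", "user": "alice"},
    ]
    print(dedup_by_key(sessions))
```

If the duplicate rows are fully identical, as in this example, keeping "the first" or "any" record gives the same result. The choice only matters when duplicates with the same sid differ in other fields.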

Do you see any optimization I can make, especially in the FLATTEN / GENERATE part?

Thank you very much for your help.
