Hello everybody,
I have a simple table containing sessions. Each session has a
unique key (the sid, which is actually a UUID).
However, a session can appear several times in my input table.
I want to ensure that I have only one record for each sid (because I
perform subsequent JOINs on this sid).
Currently I use the following script, but I wonder if there is
something more efficient:
sessions = GROUP sessions BY sid;
sessions = FOREACH sessions {
    first = LIMIT sessions 1;
    GENERATE FLATTEN(first);
};
sessions = FOREACH sessions GENERATE sid, .. and all the fields I have in the session table...
Do you see any optimization I could make, especially in the FLATTEN /
GENERATE part?
Thank you very much for your help.