At an intermediate point in my processing, I have these tuples:

DUMP X;
(A,1L,1L)
(A,2L,2L)
(A,3L,6L)
(A,5L,1L)

The middle element of these tuples can be any integer from 1 to 5, and the 
third element can be any positive integer. (For example, the third tuple means 
"I saw 6 distinct words starting with the letter A that occurred 3 times 
each.") The math I need to do next requires knowing that 0 words occurred 4 
times, so I need to group these four tuples into one record that lets me ask 
"what is the value that goes with 1, ... what is the value that goes with 5".

I could stream these through a script and do what I want, but I'm new to Pig 
and I'd like to explore what can be done strictly within Pig.

Maybe I could gather these into a tuple, but with a 0 at the position for 4:

(A,1L,2L,6L,0L,1L)

or else somehow generate a map from this:

(A, 1L#1L, 2L#2L, 3L#6L, 5L#1L)

which would also alert me to the absence of 4L. Can I do either of these things?
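
For reference, the script fallback I mentioned would look roughly like the 
sketch below. It is Python rather than Pig, and it only pins down the output I 
am after: one record per key, with a 0 in every position that had no input 
tuple. The names densify and MAX_COUNT are made up for illustration, and the 
range 1 to 5 is hard-coded from the description above.

```python
from collections import defaultdict

# Assumption: the middle element only takes values 1..5, as described above.
MAX_COUNT = 5


def densify(rows, max_count=MAX_COUNT):
    """Group (key, occurrences, distinct_words) rows by key.

    Returns a dict mapping each key to a tuple of length max_count, where
    position i-1 holds the value seen for occurrences == i, or 0 if no
    input row had that occurrence count.
    """
    by_key = defaultdict(dict)
    for key, occurrences, distinct_words in rows:
        by_key[key][occurrences] = distinct_words
    return {
        key: tuple(counts.get(i, 0) for i in range(1, max_count + 1))
        for key, counts in by_key.items()
    }


# The four tuples from the DUMP above, with the L suffixes dropped.
rows = [("A", 1, 1), ("A", 2, 2), ("A", 3, 6), ("A", 5, 1)]
dense = densify(rows)
print(dense["A"])  # (1, 2, 6, 0, 1)
```

The intermediate dict inside densify is effectively the map form, and the 
returned tuple is the padded form, so either shape would do for my purposes.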

Thanks,
Greg Langmead
Research Scientist
Language Weaver, Inc.
