At an intermediate point in my processing, I have these tuples:

DUMP X;
(A,1L,1L)
(A,2L,2L)
(A,3L,6L)
(A,5L,1L)

The middle element of these tuples can be any integer from 1 to 5, and the third element can be any positive integer. (For example, the third tuple means "I saw 6 distinct words starting with the letter A that occurred 3 times each.")

My problem is that the math I need to do next requires knowing that 0 words occurred 4 times. So I need to group these four tuples into one record that lets me ask "what is the value that goes with 1, ... what is the value that goes with 5".

I could stream these through a script and do what I want, but I'm new to Pig and I'd like to explore what can be done strictly within Pig. Maybe I could gather these into a single tuple, with a 0 at the position for 4:

($-NT,1L,2L,6L,0L,1L)

or else somehow generate a map from this:

($NT, 1L#1L, 2L#2L, 3L#6L, 5L#1L)

which would also alert me to the absence of 4L. Can I do either of these things?

Thanks,
Greg Langmead
Research Scientist
Language Weaver, Inc.
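
For concreteness, the script-based fallback mentioned above could look roughly like the Python sketch below. It is not a Pig solution, only an illustration of the target record: one tuple per key with a 0 filled in for any missing position. The function name `densify`, the constant `MAX_TIMES`, and the use of plain Python integers in place of Pig longs are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: the middle element of each tuple ranges over 1..5.
MAX_TIMES = 5


def densify(rows, max_times=MAX_TIMES):
    """Group (key, times, count) tuples by key and return one dense
    tuple per key, with 0 at every position that has no input tuple."""
    by_key = defaultdict(dict)
    for key, times, count in rows:
        by_key[key][times] = count
    return {
        key: tuple(seen.get(t, 0) for t in range(1, max_times + 1))
        for key, seen in by_key.items()
    }


# The four tuples from the DUMP output, with Pig longs written as ints.
rows = [("A", 1, 1), ("A", 2, 2), ("A", 3, 6), ("A", 5, 1)]
print(densify(rows))  # {'A': (1, 2, 6, 0, 1)}
```

The intermediate dictionary built per key corresponds to the map form of the record, and the dense tuple corresponds to the form with a 0 at position 4.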
