Correction: my example of a combined tuple should have A rather than $-NT or $NT, and the same goes for the map:
(A, 1L, 2L, 6L, 0L, 1L)
(A, 1L#1L, 2L#2L, 3L#6L, 5L#1L)

On May 5, 2010, at 5:06 PM, Greg Langmead wrote:

> At an intermediate point in my processing, I have these tuples:
>
> DUMP X;
> (A,1L,1L)
> (A,2L,2L)
> (A,3L,6L)
> (A,5L,1L)
>
> The middle element of these tuples can have any integer value from 1-5, and
> the third element can have any positive integer value. (These data points
> mean, for example for the third tuple, "I saw 6 distinct words that started
> with the letter A that occurred 3 times each.") My problem is that to do the
> math I need to do next, I need to know that there were 0 words that occurred
> 4 times, so I need to group these four tuples into one record that permits me
> to ask "what is the value that goes with 1, ... what is the value that goes
> with 5".
>
> I could stream these through a script and do what I want, but I'm new to Pig
> and I'd like to explore what can be done strictly within Pig.
>
> Maybe I could gather these into a tuple, but with a 0 at the position for 4:
>
> ($-NT,1L,2L,6L,0L,1L)
>
> or else somehow generate a map from this:
>
> ($NT, 1L#1L, 2L#2L, 3L#6L, 5L#1L)
>
> which would also alert me to the absence of 4L. Can I do either of these
> things?
>
> Thanks,
> Greg Langmead
> Research Scientist
> Language Weaver, Inc.
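
To pin down the two target shapes, here is a rough sketch of the transformation in Python rather than Pig. It is not a Pig solution and only illustrates the result I am after. The names `combine` and `MAX_OCCURRENCES` are made up for this example, and the range 1-5 is taken from the description of the middle element.

```python
from collections import defaultdict

# Rows as shown by DUMP X: (letter, occurrence count, number of distinct words).
rows = [("A", 1, 1), ("A", 2, 2), ("A", 3, 6), ("A", 5, 1)]

# Assumption from the message: the occurrence count ranges over 1..5.
MAX_OCCURRENCES = 5


def combine(rows, max_occurrences=MAX_OCCURRENCES):
    """Group rows by letter and return {letter: (sparse_map, dense_tuple)}."""
    by_letter = defaultdict(dict)
    for letter, occurrences, words in rows:
        by_letter[letter][occurrences] = words

    combined = {}
    for letter, sparse in by_letter.items():
        # Fill in 0 for every occurrence count that never appeared.
        dense = tuple(sparse.get(k, 0) for k in range(1, max_occurrences + 1))
        combined[letter] = (sparse, dense)
    return combined


combined = combine(rows)
sparse_a, dense_a = combined["A"]
print(("A",) + dense_a)  # ('A', 1, 2, 6, 0, 1)
print(sparse_a)          # {1: 1, 2: 2, 3: 6, 5: 1}
```

The dense tuple answers "what is the value that goes with 4" directly (0), while the map form signals the gap by having no key 4.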
