Right now there is no UDF that converts a bag of tuples into a map. But you can always write one :-)
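
For illustration, here is a minimal sketch of the logic such a UDF would need, written in plain Python rather than against Pig's UDF API. The names bag_to_map and fill_missing are hypothetical, and the bag is modelled as a list of (key, value) tuples. Wiring this into Pig (for instance as a Java EvalFunc) is not shown.

```python
def bag_to_map(bag):
    """Turn a bag of (key, value) tuples into a dict keyed by the first field."""
    return {key: value for key, value in bag}


def fill_missing(counts, keys, default=0):
    """Return the values for `keys` in order, using `default` for absent keys."""
    return [counts.get(k, default) for k in keys]


# Bag for group A from the thread: (occurrence count, number of distinct words).
bag = [(1, 1), (2, 2), (3, 6), (5, 1)]

counts = bag_to_map(bag)
# Keys 1..5 are the allowed values of the middle element in the question.
row = fill_missing(counts, range(1, 6))

print(counts)  # {1: 1, 2: 2, 3: 6, 5: 1}
print(row)     # [1, 2, 6, 0, 1]
```

The second result corresponds to the combined tuple (A, 1L, 2L, 6L, 0L, 1L) asked for in the original question, with 0 in the position for 4.
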

Thanks,
-Richard

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Gianmarco
Sent: Thursday, May 06, 2010 10:53 AM
To: [email protected]
Subject: Re: Help identifying missing value

Is it possible to generate a map inside a foreach?
Something like:

a = load 'input' USING PigStorage(',') AS (l:chararray,n1:long,n2:long);
b = group a BY l;
c = foreach b {
    ones = filter a BY n1 == 1;
    GENERATE FLATTEN([1#ones]);
};

(Of course this does not compile, but I didn't manage to generate even a
simple map like [1#2] in a programmatic way, so there must be something
wrong with my approach)

Gianmarco

On Thu, May 6, 2010 at 19:13, Richard Ding <[email protected]> wrote:

> Using group by and foreach you can get tuples like this:
>
> (A, {(1L,1L),(2L,2L),(3L,6L),(5L,1L)})
>
> By counting the number of tuples in the bag, you can then find the
> missing values.
>
> Here is the script:
>
> L = load 'X' using PigStorage(',') as (a:chararray, b:long, c:long);
> G = group L by a;
> F = foreach G { O = order L by b; generate group, O.(b, c); }
> dump F
>
> Thanks
> -Richard
>
> -----Original Message-----
> From: Greg Langmead [mailto:[email protected]]
> Sent: Wednesday, May 05, 2010 2:15 PM
> To: [email protected]
> Subject: Re: Help identifying missing value
>
> My example of a combined tuple should have A and not $-NT or $NT, and
> same for the map:
>
> (A, 1L, 2L, 6L, 0L, 1L)
>
> (A, 1L#1L, 2L#2L, 3L#6L, 5L#1L)
>
> On May 5, 2010, at 5:06 PM, Greg Langmead wrote:
>
> > At an intermediate point in my processing, I have these tuples:
> >
> > DUMP X;
> > (A,1L,1L)
> > (A,2L,2L)
> > (A,3L,6L)
> > (A,5L,1L)
> >
> > The middle element of these tuples can have any integer value from
> > 1-5, and the third element can have any positive integer value.
> > (These data points mean, for example for the third tuple, "I saw 6
> > distinct words that started with the letter A that occurred 3 times
> > each.") My problem is that to do the math I need to do next, I need
> > to know that there were 0 words that occurred 4 times, so I need to
> > group these four tuples into one record that permits me to ask "what
> > is the value that goes with 1, ... what is the value that goes with
> > 5".
> >
> > I could stream these through a script and do what I want, but I'm new
> > to Pig and I'd like to explore what can be done strictly within Pig.
> >
> > Maybe I could gather these into a tuple, but with a 0 at the position
> > for 4:
> >
> > ($-NT,1L,2L,6L,0L,1L)
> >
> > or else somehow generate a map from this:
> >
> > ($NT, 1L#1L, 2L#2L, 3L#6L, 5L#1L)
> >
> > which would also alert me to the absence of 4L. Can I do either of
> > these things?
> >
> > Thanks,
> > Greg Langmead
> > Research Scientist
> > Language Weaver, Inc.
