Right now there is no built-in UDF that converts a bag of tuples into a
map, but you can always write one :-)
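This is a rough illustration of what such a UDF would compute. It is a plain-Python model of the logic, not Pig code; a real Pig UDF would be a Java class extending `EvalFunc`. The function names `bag_to_map` and `dense_counts` are hypothetical. The sketch treats the bag that the group/order script further down the thread produces for group A as a list of (b, c) pairs. It builds a map from that list, lists which of the keys 1 to 5 are absent, and rebuilds the zero-filled tuple Greg asked for.

```python
from typing import Dict, Iterable, List, Tuple

def bag_to_map(bag: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Turn a bag of (key, value) tuples into a key -> value map.

    Raises ValueError on a duplicate key instead of silently overwriting it.
    """
    result: Dict[int, int] = {}
    for key, value in bag:
        if key in result:
            raise ValueError(f"duplicate key {key!r} in bag")
        result[key] = value
    return result

def dense_counts(m: Dict[int, int], keys: range) -> List[int]:
    """Return one value per expected key, using 0 for keys absent from the map."""
    return [m.get(k, 0) for k in keys]

# The (b, c) bag for group 'A', as in the DUMP output quoted below.
bag_a = [(1, 1), (2, 2), (3, 6), (5, 1)]
m = bag_to_map(bag_a)
missing = [k for k in range(1, 6) if k not in m]

print(m)                                      # {1: 1, 2: 2, 3: 6, 5: 1}
print(missing)                                # [4]
print(("A", *dense_counts(m, range(1, 6))))  # ('A', 1, 2, 6, 0, 1)
```

Rejecting duplicate keys is a deliberate choice in this sketch. A bag can legally hold two tuples with the same first field, and keeping only the last one would hide that.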

Thanks,
-Richard
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Gianmarco
Sent: Thursday, May 06, 2010 10:53 AM
To: [email protected]
Subject: Re: Help identifying missing value

Is it possible to generate a map inside a foreach?


Something like:

 a = load 'input' USING PigStorage(',') AS (l:chararray,n1:long,n2:long);
 b = group a BY l;
 c = foreach b { ones = filter a BY n1 == 1; GENERATE FLATTEN([1#ones]); };

(Of course this does not compile, but I didn't manage to generate even a
simple map like [1#2] programmatically, so there must be something wrong
with my approach.)


Gianmarco




On Thu, May 6, 2010 at 19:13, Richard Ding <[email protected]> wrote:

> Using GROUP BY and FOREACH, you can get tuples like this:
>
> (A, {(1L,1L),(2L,2L),(3L,6L),(5L,1L)})
>
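> By counting the tuples in each bag, you can then find the missing
> values: a complete bag has one tuple for each value of b from 1 to 5,
> so the bag above, with only four tuples, is missing one (b = 4).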
>
> Here is the script:
>
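> -- Load the (letter, occurrence count, number of words) records
> L = load 'X' using PigStorage(',') as (a:chararray, b:long, c:long);
> -- Collect all records for each letter into one bag
> G = group L by a;
> -- Sort each bag by b and keep only the (b, c) fields
> F = foreach G { O = order L by b; generate group, O.(b, c); };
> dump F;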
>
> Thanks
> -Richard
>
> -----Original Message-----
> From: Greg Langmead [mailto:[email protected]]
> Sent: Wednesday, May 05, 2010 2:15 PM
> To: [email protected]
> Subject: Re: Help identifying missing value
>
> My example of a combined tuple should have A, not $-NT or $NT, and the
> same goes for the map:
>
> (A, 1L, 2L, 6L, 0L, 1L)
>
> (A, 1L#1L, 2L#2L, 3L#6L, 5L#1L)
>
> On May 5, 2010, at 5:06 PM, Greg Langmead wrote:
>
> > At an intermediate point in my processing, I have these tuples:
> >
> > DUMP X;
> > (A,1L,1L)
> > (A,2L,2L)
> > (A,3L,6L)
> > (A,5L,1L)
> >
> > The middle element of these tuples can be any integer from 1 to 5,
> > and the third element can be any positive integer. (These data
> > points mean, for example for the third tuple, "I saw 6 distinct
> > words that started with the letter A that occurred 3 times each.")
> > My problem is that, for the math I need to do next, I need to know
> > that there were 0 words that occurred 4 times, so I need to group
> > these four tuples into one record that lets me ask "what is the
> > value that goes with 1, ... what is the value that goes with 5?"
> >
> > I could stream these through a script and do what I want, but I'm
> > new to Pig and I'd like to explore what can be done strictly within
> > Pig.
> >
> > Maybe I could gather these into a tuple, but with a 0 at the
> > position for 4:
> >
> > ($-NT,1L,2L,6L,0L,1L)
> >
> > or else somehow generate a map from this:
> >
> > ($NT, 1L#1L, 2L#2L, 3L#6L, 5L#1L)
> >
> > which would also alert me to the absence of 4L. Can I do either of
> > these things?
> >
> > Thanks,
> > Greg Langmead
> > Research Scientist
> > Language Weaver, Inc.
>
>
