Hijack away. I would be curious as to the reason we need this as well.

2010/12/6 Anze <[email protected]>

> Sorry to hijack your question, Jonathan, but while we are at it... :)
>
> Is there a way to tell Pig NOT to add "base_alias::"? Almost half my code
> consists of FOREACH... GENERATE that just remove these prefixes.
>
> Thanks,
>
> Anze
>
> On Monday 06 December 2010, Daniel Dai wrote:
> > After join, cross, foreach flatten, Pig will automatically add
> > "base_alias::" prefix. All other cases use "."
> >
> > Daniel
> >
> > Jonathan Coveney wrote:
> > > It's very hard to search for this among the docs because it's so generic,
> > > so I thought I'd ask... I'm sure the answer is painfully easy.
> > >
> > > Taking a look at this code that I found online, for example
> > >
> > > --
> > > -- Read in a bag of tuples (timeseries for this example) and divide the
> > > -- numeric column by its maximum.
> > > --
> > > %default DATABAG 'data/timeseries.tsv'
> > >
> > > data = LOAD '$DATABAG' AS (month:chararray, count:int);
> > > accumulate = GROUP data ALL;
> > > calc_max = FOREACH accumulate GENERATE FLATTEN(data),
> > >     MAX(data.count) AS max_count;
> > > normalize = FOREACH calc_max GENERATE data::month AS month,
> > >     data::count AS count,
> > >     (float)data::count / (float)max_count AS normed_count;
> > > DUMP normalize;
> > >
> > > What purpose does data::month serve versus data.count?
> > >
> > > Thanks
