If one uses meaningful names, Pig would never need to use '::' anyway. The problem is when you use multiple joins in sequence; then the '::' names get very annoying. But that's just my opinion. :)
Anze

On Tuesday 07 December 2010, Jonathan Coveney wrote:
> Would that even be much better? It seems like it'd be better to have it be
> consistent in appending the whatever::, so that at least you have to be
> cognizant of it when you do the join. If it starts being too clever, then
> it's up to you to figure out when it does and doesn't do it, which might be
> annoying.
>
> 2010/12/7 Anze <[email protected]>
>
> > I understand the reason for this; it just seems like a drastic solution.
> > :)
> >
> > Ideally, Pig should be clever enough to detect ambiguity and deal with
> > it, and leave the non-conflicting names intact. For instance:
> >
> > A = load 'foo' as (x, y, z);
> > B = load 'bar' as (x, a, b, c);
> > C = join A by x, B by x;
> > DESCRIBE C;
> > C: {A::x, y, z, B::x, a, b, c}
> >
> > or even:
> > C: {x, y, z, B::x, a, b, c}
> >
> > or even a step further, in the case of JOIN:
> > C: {x, y, z, a, b, c}
> > (since join *joins* by x, why would there be two? This doesn't always
> > work for other operations, of course)
> >
> > Reasoning: at least in my cases the names are descriptive from the start,
> > so there are almost no name conflicts. In the rare cases where there
> > are, Pig can detect that and fall back to the old "::" syntax, and let me
> > deal with it.
> >
> > I know this is a backwards-incompatible change and is not likely to be
> > accepted, but still... :)
> >
> > Anze
> >
> > On Monday 06 December 2010, Alan Gates wrote:
> > > The reason it's needed is that ambiguities would result otherwise.
> > >
> > > A = load 'foo' as (x, y, z);
> > > B = load 'bar' as (w, x, y, z);
> > > C = join A by x, B by x;
> > > D = filter C by z > 0; -- which z?
> > >
> > > As long as the name is not ambiguous, the :: is not required. So in
> > > the above example it would be perfectly legal to say
> > >
> > > D = filter C by w > 0;
> > >
> > > Out of curiosity, why do you want to remove the :: names?
> > >
> > > Alan.
> > >
> > > On Dec 6, 2010, at 1:05 PM, Jonathan Coveney wrote:
> > > > Hijack away. I would be curious as to the reason we need this as
> > > > well.
> > > >
> > > > 2010/12/6 Anze <[email protected]>
> > > >
> > > >> Sorry to hijack your question, Jonathan, but while we are at it...
> > > >> :)
> > > >>
> > > >> Is there a way to tell Pig NOT to add "base_alias::"? Almost half
> > > >> my code consists of FOREACH... GENERATE that just remove these prefixes.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Anze
> > > >>
> > > >> On Monday 06 December 2010, Daniel Dai wrote:
> > > >>> After join, cross, foreach flatten, Pig will automatically add
> > > >>> "base_alias::" prefix. All other cases use "."
> > > >>>
> > > >>> Daniel
> > > >>>
> > > >>> Jonathan Coveney wrote:
> > > >>>> It's very hard to search for this among the docs because it's so
> > > >>>> generic, so I thought I'd ask... I'm sure the answer is painfully easy.
> > > >>>>
> > > >>>> Taking a look at this code that I found online, for example
> > > >>>>
> > > >>>> --
> > > >>>> -- Read in a bag of tuples (timeseries for this example) and divide
> > > >>>> -- the numeric column by its maximum.
> > > >>>> --
> > > >>>> %default DATABAG 'data/timeseries.tsv'
> > > >>>>
> > > >>>> data = LOAD '$DATABAG' AS (month:chararray, count:int);
> > > >>>> accumulate = GROUP data ALL;
> > > >>>> calc_max = FOREACH accumulate GENERATE FLATTEN(data),
> > > >>>>     MAX(data.count) AS max_count;
> > > >>>> normalize = FOREACH calc_max GENERATE data::month AS month,
> > > >>>>     data::count AS count, (float)data::count / (float)max_count AS
> > > >>>>     normed_count;
> > > >>>> DUMP normalize;
> > > >>>>
> > > >>>> What purpose does data::month serve versus data.count?
> > > >>>>
> > > >>>> Thanks
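Alan's ambiguity rule, Daniel's description of the automatic prefix, and Anze's proposed alternative can be compared side by side with a small toy model. This is a Python sketch for illustration only, not Pig code and not Pig's implementation. It assumes a schema is just an ordered list of field names, that input field names do not already contain "::" (so nested joins are not modeled), and the function names `join_schema`, `resolve` and `proposed_schema` are hypothetical, made up for this example.

```python
# Toy model of field naming after a Pig JOIN. Illustration only, not Pig's
# implementation: schemas are plain lists of field names, and input names are
# assumed not to contain "::" already (nested joins are not modeled).
from collections import Counter
from typing import Dict, List


def join_schema(inputs: Dict[str, List[str]]) -> List[str]:
    """Behavior described in the thread: prefix every field with 'alias::'."""
    return [f"{alias}::{field}" for alias, fields in inputs.items() for field in fields]


def resolve(schema: List[str], name: str) -> str:
    """Resolve a field reference against a joined schema.

    A full 'alias::field' name always works; a bare name works only when
    exactly one field carries it (the rule Alan describes)."""
    if "::" in name:
        if name in schema:
            return name
        raise KeyError(f"no field named {name!r}")
    matches = [full for full in schema if full.split("::", 1)[-1] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no field named {name!r}")
    raise KeyError(f"ambiguous field {name!r}, candidates: {matches}")


def proposed_schema(inputs: Dict[str, List[str]]) -> List[str]:
    """Anze's proposal (hypothetical, not how Pig behaves): prefix only the
    names that occur in more than one input and leave the rest bare."""
    counts = Counter(field for fields in inputs.values() for field in set(fields))
    return [
        f"{alias}::{field}" if counts[field] > 1 else field
        for alias, fields in inputs.items()
        for field in fields
    ]


if __name__ == "__main__":
    alan = {"A": ["x", "y", "z"], "B": ["w", "x", "y", "z"]}
    joined = join_schema(alan)
    print(joined)                 # every field carries its alias prefix
    print(resolve(joined, "w"))   # unambiguous, so the bare name is accepted
    try:
        resolve(joined, "z")      # "which z?"
    except KeyError as err:
        print(err)

    anze = {"A": ["x", "y", "z"], "B": ["x", "a", "b", "c"]}
    print(proposed_schema(anze))  # only the shared name x keeps a prefix
```

With Alan's two relations, `resolve(joined, "w")` returns `B::w`, while a bare `z` is rejected because both `A::z` and `B::z` match. With Anze's relations, `proposed_schema` yields `A::x, y, z, B::x, a, b, c`, which is his first suggested DESCRIBE output.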
