[https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Shravan Matthur Narayanamurthy updated PIG-430:
-----------------------------------------------
Status: Patch Available (was: Open)
I have fixed the part of the problem that concerns projections. The issue with
distinct still remains. Projects are being introduced into the input of
distinct, which creates a special case where projection chaining does not work.
The problem is similar to the case where a nested project is assigned to a
variable inside a nested block; that case was solved by replacing the nested
project with a foreach statement. The fix for distinct should be similar: allow
the input to distinct to be a nested project. I made a local change that
replaces BaseEvalSpec with NestedProject in my code, and it works. However, I
don't want to break anything, because I am not fully aware of the side effects
of changing this in the parser. It would be better if someone more comfortable
with the parser took a look at this one.
Also, I think there are some issues with the parsing of nested blocks. I tried
the following, and the parser never terminates the nested block; it keeps
waiting for more input:
A = load 'file';
B = group A by $0;
C = foreach B { C1=distinct "const"; generate C1;}
I don't know why this happens. I tried it because I thought that the input to a
nested distinct shouldn't be a BaseEvalSpec, which can be a FuncEvalSpec or a
Constant. I think we need to change things a bit here.
> Projections in nested filter and inside foreach do not work
> -----------------------------------------------------------
>
> Key: PIG-430
> URL: https://issues.apache.org/jira/browse/PIG-430
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Santhosh Srinivasan
> Assignee: Shravan Matthur Narayanamurthy
> Fix For: types_branch
>
> Attachments: 430-1.patch
>
>
> The following queries do not work:
> Nested filter:
> a = load 'studenttab10k' as (name, age, gpa);
> b = filter a by age < 20;
> c = group b by age;
> d = foreach c { cf = filter b by gpa < 3.0; cp = cf.gpa; cd = distinct cp;
> co = order cd by $0; generate group, flatten(co); }
> store d into 'output';
> Nested Distinct:
> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b { aa = distinct a.age; generate group, COUNT(aa); }
> store c into 'output';