[
https://issues.apache.org/jira/browse/PIG-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597801#action_12597801
]
Pi Song commented on PIG-159:
-----------------------------
Comments on v8:-
1) DOUBLE doesn't work at all when you create a typed complex constant. To fix
this, add "(dataType == DOUBLE) ||" to DataType.isAtomic.
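A minimal sketch of the fix in 1), using a simplified stand-in class (DataTypeSketch is hypothetical, and its numeric type codes are illustrative, not necessarily Pig's real constants). It only shows where the DOUBLE clause belongs in the atomic-type check.

```java
// Hypothetical, simplified stand-in for Pig's DataType; the numeric codes
// below are illustrative only and may not match the real constants.
class DataTypeSketch {
    static final byte INTEGER   = 10;
    static final byte LONG      = 15;
    static final byte FLOAT     = 20;
    static final byte DOUBLE    = 25;
    static final byte BYTEARRAY = 50;
    static final byte CHARARRAY = 55;
    static final byte MAP       = 100;
    static final byte TUPLE     = 110;
    static final byte BAG       = 120;

    // Atomic (scalar) types. Per the review comment, the DOUBLE clause was
    // missing, which broke typed complex constants containing doubles.
    static boolean isAtomic(byte dataType) {
        return (dataType == BYTEARRAY) ||
               (dataType == CHARARRAY) ||
               (dataType == INTEGER) ||
               (dataType == LONG) ||
               (dataType == FLOAT) ||
               (dataType == DOUBLE);   // the missing clause
    }

    public static void main(String[] args) {
        System.out.println(isAtomic(DOUBLE)); // true
        System.out.println(isAtomic(BAG));    // false
    }
}
```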
2) In LOCross.getSchema, you concatenate the fields of all inputs returned by
"Collection<LogicalOperator> pred = mPlan.getPredecessors(this);". This is a
problem because mPlan.getPredecessors doesn't preserve input order. Use mInputs
instead.
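A rough sketch of the ordering point in 2), with hypothetical Input/crossSchema names standing in for Pig's actual LOCross and Schema classes. Building the schema by iterating an ordered List (the role mInputs presumably plays) makes the field order follow the order of the inputs to CROSS, which a predecessor collection taken from the plan doesn't guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of LOCross.getSchema: each input exposes its
// schema as a list of field names. The real Pig classes are richer than this.
class CrossSchemaSketch {
    record Input(String alias, List<String> fields) {}

    // Iterate the ordered input list so that, for "cross a, b", a's fields
    // always come before b's in the resulting schema.
    static List<String> crossSchema(List<Input> inputs) {
        List<String> out = new ArrayList<>();
        for (Input input : inputs) {
            out.addAll(input.fields());
        }
        return out;
    }

    public static void main(String[] args) {
        Input a = new Input("a", List.of("x", "y"));
        Input b = new Input("b", List.of("z"));
        System.out.println(crossSchema(List.of(a, b))); // [x, y, z]
    }
}
```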
3) Here is my query:-
{noformat}
a = load 'a' as (field1: integer, field2: long);
b = load 'a' as (field1: bytearray, field2: double);
c = group a by field1, b by field1 ;
{noformat}
When I parse it with the latest query parser, I get:-
{noformat}
(74:LOCogroup={group: integer,a: {field1: integer,field2: long},b: {field1: bytearray,field2: double}}==>80)
<COGroup Inner Plan>
(72:LOProject=integer==>TERMINAL)
<COGroup Inner Plan>
(73:LOProject=bytearray==>81)
(81:LOCast=integer==>TERMINAL)
(80:LOForEach={field2: double,bytearray}==>TERMINAL)
<ForEach Inner Plan>
(79:LOGenerate=(field2: double,bytearray)==>TERMINAL)
<Generate Inner Plan>
(76:LOProject=double==>TERMINAL)
<Generate Inner Plan>
(77:LOProject=(field1: integer,field2: long)==>78)
(78:LOUserFunc=bytearray==>TERMINAL)
{noformat}
I don't know where LOUserFunc comes from.
4) The way LOProject is used seems a bit odd to me. I found that when you do
something like this:-
{noformat}
c = group a by field1, b by field1 ;
d = foreach c generate grp, a.(field1, field2), b.(field1, field2) ;
{noformat}
you get the following in Generate's inner plans:-
{noformat}
Project(0 sentinel=true )
Project(0,1 sentinel=false)
Project(0,1 sentinel=false)
{noformat}
The second and third are identical. Because you use projects to select
columns from inner bags, they carry no information referring back to the bags
those columns come from!! Having mSentinal also makes it harder to understand,
because Project now has two different meanings: 1) actual projection and
2) bridging between plans. Wouldn't it be better to introduce a new LO to act
as the sentinel?
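To make the point in 4) concrete, here is a rough sketch using hypothetical Project and Bridge records (not Pig's LOProject). The projection records which column of c holds the bag it reads from (c's schema is group=0, a=1, b=2), and a separate operator takes over the bridging role. With the source recorded, a.(field1, field2) and b.(field1, field2) no longer compare equal.

```java
import java.util.List;

// Hypothetical sketch: neither record exists in Pig; names are illustrative.
class ProjectSketch {
    // Actual projection: the column of the outer relation that holds the bag,
    // plus the columns selected inside that bag.
    record Project(int sourceColumn, List<Integer> columns) {}

    // Separate operator for the "bridging between plans" role that
    // Project(sentinel=true) plays in the current patch.
    record Bridge(int sourceColumn) {}

    public static void main(String[] args) {
        Bridge grp = new Bridge(0);                    // group column
        Project fromA = new Project(1, List.of(0, 1)); // a.(field1, field2)
        Project fromB = new Project(2, List.of(0, 1)); // b.(field1, field2)
        System.out.println(fromA.equals(fromB)); // false
        System.out.println(grp.sourceColumn());  // 0
    }
}
```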
5) I think it's time to think about aggregate functions in foreach generate. We
would just have to add a List<AggregateApec> to either ForEach or Generate. I'm
not sure which is better, but ForEach seems to handle more of the whole-bag
operations, so it seems more suitable to me:-
{noformat}
class AggregateApec {
    AggregateOperator agg ;  // aggregate function to apply
    int col ;                // column of the input the aggregate operates on
}
{noformat}
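A minimal sketch of how ForEach might evaluate a list of AggregateApec entries against one input bag. The AggregateOperator interface, the COUNT/SUM lambdas, and the representation of a bag as a List of Long tuples are all assumptions made for illustration; they are not Pig's actual types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of proposal 5): ForEach carries a list of aggregate
// specs. AggregateOperator, COUNT and SUM here are illustrative only.
class AggregateSketch {
    interface AggregateOperator {
        long apply(List<List<Long>> bag, int col);
    }

    static final AggregateOperator COUNT = (bag, col) -> (long) bag.size();
    static final AggregateOperator SUM = (bag, col) -> {
        long total = 0;
        for (List<Long> tuple : bag) total += tuple.get(col);
        return total;
    };

    // Same shape as the proposed class.
    static class AggregateApec {
        final AggregateOperator agg;
        final int col;
        AggregateApec(AggregateOperator agg, int col) { this.agg = agg; this.col = col; }
    }

    // ForEach applies every spec to the same input bag, one output per spec.
    static List<Long> forEach(List<List<Long>> bag, List<AggregateApec> specs) {
        List<Long> out = new ArrayList<>();
        for (AggregateApec spec : specs) out.add(spec.agg.apply(bag, spec.col));
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> bag = List.of(List.of(1L, 10L), List.of(1L, 20L));
        System.out.println(forEach(bag, List.of(new AggregateApec(COUNT, 0),
                                                new AggregateApec(SUM, 1)))); // [2, 30]
    }
}
```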
6) Nested expressions in COGroup don't work. For example:-
{noformat}
c = cogroup a by (field1+field2)*field1, b by field1 ;
{noformat}
fails with an error because the parser thinks "(" is the beginning of a
tuple. Maybe we just need more lookahead?
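One possible lookahead rule for 6), sketched as a hand-written token scan (LookaheadSketch and its token list are hypothetical; the real parser would express this in its grammar's lookahead instead). After "(", scan to the matching ")" and treat the group as a tuple only if it contains a top-level comma; otherwise parse it as a parenthesized expression. A single-element "(x)" then parses as an expression, which should be equivalent for grouping purposes.

```java
import java.util.List;

// Hypothetical token-level sketch of extra lookahead; not the actual grammar.
class LookaheadSketch {
    enum Kind { TUPLE, EXPRESSION }

    // tokens.get(open) is "(". Scan to its matching ")" and report TUPLE only
    // if a comma appears at nesting depth 1 (directly inside this group).
    static Kind classify(List<String> tokens, int open) {
        int depth = 0;
        boolean topLevelComma = false;
        for (int i = open; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("(")) {
                depth++;
            } else if (t.equals(")")) {
                depth--;
                if (depth == 0) break;
            } else if (t.equals(",") && depth == 1) {
                topLevelComma = true;
            }
        }
        return topLevelComma ? Kind.TUPLE : Kind.EXPRESSION;
    }

    public static void main(String[] args) {
        // by (field1+field2)*field1
        List<String> expr = List.of("(", "field1", "+", "field2", ")", "*", "field1");
        // by (field1, field2)
        List<String> tuple = List.of("(", "field1", ",", "field2", ")");
        System.out.println(classify(expr, 0));  // EXPRESSION
        System.out.println(classify(tuple, 0)); // TUPLE
    }
}
```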
> Make changes to the parser to support new types functionality
> -------------------------------------------------------------
>
> Key: PIG-159
> URL: https://issues.apache.org/jira/browse/PIG-159
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: parser_chages_v5.patch, parser_chages_v6.patch,
> parser_chages_v7.patch, parser_chages_v8.patch, parser_chages_v9.patch
>
>
> In order to support the new types functionality described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec, the parser needs to change
> in the following ways:
> 1) AS needs to support types in addition to aliases. So where previously it
> was legal to say:
> a = load 'myfile' as a, b, c;
> it will now also be legal to say
> a = load 'myfile' as a integer, b float, c chararray;
> 2) Non-string constants need to be supported. This includes non-string
> atomic types (integer, long, float, double) and the non-atomic types bags,
> tuples, and maps.
> 3) A cast operator needs to be added so that fields can be explicitly cast.
> 4) DEFINE needs to change to allow users to declare argument and return types
> for UDFs.
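For requirement 1) above, a minimal sketch of splitting an AS field list into (alias, type) pairs. AsClauseSketch and parseAs are hypothetical names, and defaulting an untyped field to bytearray is an assumption about the intended semantics. It accepts both the "a integer" form from the description and the "field1: integer" form used in the queries in this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse "a, b, c" or "a integer, b float" into fields.
// Assumption: a field without a declared type defaults to bytearray.
class AsClauseSketch {
    record Field(String alias, String type) {}

    static List<Field> parseAs(String fieldList) {
        List<Field> fields = new ArrayList<>();
        for (String part : fieldList.split(",")) {
            // Accept both "a integer" and "a: integer".
            String[] words = part.replace(":", " ").trim().split("\\s+");
            String type = words.length > 1 ? words[1] : "bytearray";
            fields.add(new Field(words[0], type));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseAs("a, b, c"));
        System.out.println(parseAs("a integer, b float, c chararray"));
        System.out.println(parseAs("field1: integer, field2: long"));
    }
}
```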