[ 
https://issues.apache.org/jira/browse/PIG-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635886#action_12635886
 ] 

Alan Gates commented on PIG-335:
--------------------------------

Comments:

This new method of tracking lineage of data through the script is complex.  It 
would be good to add a couple of paragraphs to the class level comments in 
Schema.java describing how it works.

In LOProject, around line 206, you added mFieldSchema.setParent(null, 
expressionOperator).  If I understand the code correctly, this is the case 
where you are projecting star from a relational operator.  Why is the parent 
canonical name null in this case, and what are the ramifications of that?

If the user writes a query like:

A = load 'Alpha' using MyLoadFunc;
B = load 'Beta' using TheirLoadFunc;
C = cogroup A by $0, B by $0;
D = foreach C generate group + 1;

they will get "Found more than one load function interface to use: MyLoadFunc, 
TheirLoadFunc" as an error message.  That doesn't make clear what the issue is 
(of course there's more than one load func interface, I gave you two load 
funcs!).  Something like:  "Cannot resolve load function to use for casting $0 
to integer, two possibilities:  MyLoadFunc, TheirLoadFunc" would be much more 
helpful.

The same goes for the other error messages that mention only the load func 
interface.  They should at the very least say that they are trying to find 
the right cast to use.
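
To make the suggestion concrete, here is a rough sketch of a message builder 
along those lines.  The class and method names are made up for illustration; 
they are not in the patch or in the Pig code base, and the wording of the 
message is only a proposal.

```java
import java.util.List;

// Hypothetical helper; none of these names exist in the Pig code base.
final class CastErrorMessages {
    private CastErrorMessages() {}

    /**
     * Builds a message that names the cast being attempted and the
     * candidate load functions, instead of only listing the load functions.
     */
    static String ambiguousLoadFunc(String field, String targetType,
                                    List<String> candidates) {
        return "Cannot resolve load function to use for casting " + field
                + " to " + targetType + ", " + candidates.size()
                + " possibilities: " + String.join(", ", candidates);
    }
}
```

Called with "$0", "integer", and the two load functions from the query above, 
this would tell the user which cast could not be resolved and why.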

In TypeCheckingVisitor.getLoadFunc(LogicalOperator, String) I see a list of 
relational operators (Filter, etc.).  But I don't see Cogroup, Union, or Cross 
in that list.  How are you tracing data that comes through those operators?  
Those are the special cases: if the load functions of all the inputs match, 
we know how to do the cast, and if they don't match, we don't.  But I don't 
see where the lineage of their data is being traced.
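
For reference, the rule I would expect for those multi-input operators looks 
roughly like the sketch below.  This is an assumption about the intended 
behavior, not a description of the patch, and the class and method names are 
hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; these names are not actual Pig classes or methods.
final class LineageSketch {
    private LineageSketch() {}

    /**
     * Takes the load function that each input of a multi-input operator
     * (cogroup, union, cross) traces back to.  Returns the load function to
     * use for casting if every input agrees, or empty if they disagree
     * (or if there are no inputs), in which case the cast cannot be resolved.
     */
    static Optional<String> resolveLoadFunc(List<String> loadFuncPerInput) {
        Set<String> distinct = new LinkedHashSet<>(loadFuncPerInput);
        if (distinct.size() == 1) {
            return Optional.of(distinct.iterator().next());
        }
        return Optional.empty();
    }
}
```

With inputs that both trace back to MyLoadFunc this yields MyLoadFunc; with 
MyLoadFunc and TheirLoadFunc it yields empty, which is where the error above 
should be raised.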

> Casting does not work in certain cases with multiple loads
> ----------------------------------------------------------
>
>                 Key: PIG-335
>                 URL: https://issues.apache.org/jira/browse/PIG-335
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Santhosh Srinivasan
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: PIG_335.patch, PIG_335_1.patch
>
>
> Given a script like:
> A = load 'bla' using Loader1() as (x, y);
> B = load 'morebla' using Loader2() as (s, t);
> C = cogroup A by x, B by s;
> D = foreach C generate flatten(A), flatten(B);
> E = foreach D generate x, y, t + 1;
> In this case, in the last foreach, a cast will need to be added to t + 1 to 
> allow t (a byte array) to be added to an integer.  We use load functions to 
> handle this late casting.  The issue is that we do not currently have a way 
> to know whether to use Loader1 or Loader2 to cast the data.  We need to track 
> the lineage of fields so that the cast operator can select the correct loader.
