[ https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1605.
-----------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

The release audit warning is due to jdiff; no new files were added. Patch
committed to both trunk and the 0.8 branch.

> Adding soft link to plan to solve input file dependency
> -------------------------------------------------------
>
>                 Key: PIG-1605
>                 URL: https://issues.apache.org/jira/browse/PIG-1605
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1605-1.patch, PIG-1605-2.patch
>
>
> In the scalar implementation, we need to deal with implicit dependencies.
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve
> the problem by adding a LOScalar operator. Here is a different approach: we
> will add a soft link to the plan, and the soft link is only visible to the
> walkers. This ensures that we first visit the LOStore which generates the
> scalar, and then the LOForEach which uses it. No other part of the logical
> plan knows that the soft link exists. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes it
> cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents
> a data flow in the pipeline. For a scalar, the dependency means that an
> operator depends on a file generated by another operator, which is a
> different type of data dependency.
> 3. Soft links can solve other dependency problems in the future. If we
> introduce another UDF that depends on a file generated by another operator,
> we can use the same mechanism.
> 4. With soft links, we can use scalars from different sources in the same
> statement, which in my view is not a rare use case (e.g. D = foreach C
> generate c0/A.total, c1/B.count;).
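A minimal Java sketch of the idea, assuming a toy plan whose operators are plain strings. The class `SoftLinkDemo` and the methods `connect`, `createSoftLink`, `getPredecessors`, `getSoftLinkPredecessors` and `walkOrder` are hypothetical names chosen for illustration, not necessarily those in the committed patch. Regular links are all the rest of the plan sees; the walker sorts topologically over regular and soft links together, so the stores that write the scalars for `A.total` and `B.count` are visited before the foreach that reads them.

```java
import java.util.*;

/** Hypothetical sketch: a plan with regular (data-flow) links and soft links. */
public class SoftLinkDemo {

    static class Plan {
        // Regular links: tuples flow from key to each value.
        private final Map<String, List<String>> succ = new LinkedHashMap<>();
        private final Map<String, List<String>> pred = new LinkedHashMap<>();
        // Soft links: value depends on a file produced by key.
        private final Map<String, List<String>> softSucc = new LinkedHashMap<>();
        private final Map<String, List<String>> softPred = new LinkedHashMap<>();

        private void add(String op) {
            succ.putIfAbsent(op, new ArrayList<>());
            pred.putIfAbsent(op, new ArrayList<>());
            softSucc.putIfAbsent(op, new ArrayList<>());
            softPred.putIfAbsent(op, new ArrayList<>());
        }

        void connect(String from, String to) {
            add(from); add(to);
            succ.get(from).add(to);
            pred.get(to).add(from);
        }

        void createSoftLink(String from, String to) {
            add(from); add(to);
            softSucc.get(from).add(to);
            softPred.get(to).add(from);
        }

        // What the rest of the logical plan sees: regular links only.
        List<String> getPredecessors(String op) { return pred.get(op); }

        List<String> getSoftLinkPredecessors(String op) { return softPred.get(op); }

        // Walker view: Kahn's topological sort over regular AND soft links.
        List<String> walkOrder() {
            Map<String, Integer> inDegree = new LinkedHashMap<>();
            for (String op : succ.keySet()) {
                inDegree.put(op, pred.get(op).size() + softPred.get(op).size());
            }
            Deque<String> ready = new ArrayDeque<>();
            inDegree.forEach((op, d) -> { if (d == 0) ready.add(op); });
            List<String> order = new ArrayList<>();
            while (!ready.isEmpty()) {
                String op = ready.poll();
                order.add(op);
                List<String> next = new ArrayList<>(succ.get(op));
                next.addAll(softSucc.get(op));
                for (String s : next) {
                    if (inDegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
                }
            }
            if (order.size() != inDegree.size()) throw new IllegalStateException("cycle in plan");
            return order;
        }
    }

    // D = foreach C generate c0/A.total, c1/B.count;
    static Plan scalarExample() {
        Plan p = new Plan();
        p.connect("loadA", "storeA");          // A materialized for the scalar
        p.connect("loadB", "storeB");          // B materialized for the scalar
        p.connect("loadC", "foreachD");
        p.connect("foreachD", "storeD");
        p.createSoftLink("storeA", "foreachD"); // foreachD reads storeA's file
        p.createSoftLink("storeB", "foreachD"); // foreachD reads storeB's file
        return p;
    }

    public static void main(String[] args) {
        Plan p = scalarExample();
        System.out.println(p.getPredecessors("foreachD"));         // [loadC]
        System.out.println(p.getSoftLinkPredecessors("foreachD")); // [storeA, storeB]
        System.out.println(p.walkOrder());
        // [loadA, loadB, loadC, storeA, storeB, foreachD, storeD]
    }
}
```

Because `getPredecessors("foreachD")` still returns only `[loadC]`, code that follows the data flow is unaffected by the two soft links; only the walker order changes.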
> Currently, there are two cases where we can use a soft link:
> 1. Scalar dependency, where the ReadScalar UDF uses a file generated by an
> LOStore.
> 2. Store-load dependency, where we load a file generated by a store in the
> same script. This happens in the multi-store case. We currently solve it
> with a regular link; a soft link is a better fit.
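The store-load case can be sketched the same way. As an assumption for illustration, a plan here is just a list of `Edge` records (Java 16+) carrying a `soft` flag; the names `StoreLoadSoftLink`, `Edge`, `dataFlowRoots` and `walkOrder` are hypothetical, as is the script in the comment. `dataFlowRoots` ignores soft edges, while `walkOrder` honours all of them.

```java
import java.util.*;

/** Hypothetical sketch: a store-load dependency expressed as a soft link. */
public class StoreLoadSoftLink {

    /** A plan edge; soft == true means "to reads a file written by from". */
    record Edge(String from, String to, boolean soft) {}

    /** Operators with no incoming regular edge, i.e. roots of the data flow. */
    static List<String> dataFlowRoots(List<String> ops, List<Edge> edges) {
        Set<String> hasPred = new HashSet<>();
        for (Edge e : edges) {
            if (!e.soft()) hasPred.add(e.to());
        }
        List<String> roots = new ArrayList<>();
        for (String op : ops) {
            if (!hasPred.contains(op)) roots.add(op);
        }
        return roots;
    }

    /** Walker order: topological sort over both regular and soft edges. */
    static List<String> walkOrder(List<String> ops, List<Edge> edges) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String op : ops) inDegree.put(op, 0);
        for (Edge e : edges) inDegree.merge(e.to(), 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (String op : ops) if (inDegree.get(op) == 0) ready.add(op);
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (Edge e : edges) {
                if (e.from().equals(op) && inDegree.merge(e.to(), -1, Integer::sum) == 0) {
                    ready.add(e.to());
                }
            }
        }
        return order;
    }

    // A = load 'input'; store A into 'tmp'; B = load 'tmp'; store B into 'output';
    static final List<String> OPS = List.of("loadA", "storeTmp", "loadB", "storeOut");
    static final List<Edge> EDGES = List.of(
        new Edge("loadA", "storeTmp", false),
        new Edge("loadB", "storeOut", false),
        new Edge("storeTmp", "loadB", true)); // loadB reads the file storeTmp writes

    public static void main(String[] args) {
        System.out.println(dataFlowRoots(OPS, EDGES)); // [loadA, loadB]
        System.out.println(walkOrder(OPS, EDGES));     // [loadA, storeTmp, loadB, storeOut]
    }
}
```

With a regular link in place of the soft one, `loadB` would no longer be a data-flow root even though it takes no tuples from `storeTmp` in the pipeline; the soft link only constrains the order in which the walker visits the two operators.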

-- 
This message is automatically generated by JIRA.