[ https://issues.apache.org/jira/browse/PIG-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606546#action_12606546 ]
Arun C Murthy commented on PIG-272:
-----------------------------------
Sigh, attaching the original store function isn't enough.
The problem is that Pig currently re-runs the entire job for the JOIN in the
quoted script instead of reusing the results already stored on HDFS. When that
happens, the StreamingCommand's output spec is still set up as 'BinaryStorage',
which causes the reported failure.
> Failure running complex script with streaming
> ---------------------------------------------
>
> Key: PIG-272
> URL: https://issues.apache.org/jira/browse/PIG-272
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Arun C Murthy
>
> The following script fails (stack is further down):
> define CMD `perl identity.pl`;
> define CMD1 `perl identity.pl`;
> A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> B = stream A through CMD;
> store B into 'B1';
> C = stream B through CMD1;
> D = JOIN B by name, C by name;
> store D into 'D1';
> If I remove the intermediate store, the script works fine. Also if I replace
> streaming commands with other operators such as filter and foreach, it works
> even with the intermediate store.
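The report does not include identity.pl. For anyone trying to reproduce the failure, a minimal stand-in might look like the sketch below. It assumes the streaming command only has to echo its input back unchanged. The file name identity.py and the function identity are hypothetical, not part of the original test. It copies raw bytes rather than decoded text, so the output is byte-for-byte identical whatever serialization Pig uses on the pipe.

```python
#!/usr/bin/env python3
# Hypothetical stand-in for identity.pl (the original script is not attached).
# Assumption: the streaming command must return its input unchanged.
import shutil
import sys


def identity(src, dst):
    # Copy every byte from src to dst unchanged, so tabs, newlines and any
    # binary framing survive intact.
    shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # Pig streaming writes records to the command's stdin and reads its stdout.
    identity(sys.stdin.buffer, sys.stdout.buffer)
```

It would be wired in with something like define CMD `python identity.py`; in place of the Perl command.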
--
This message is automatically generated by JIRA.