[
https://issues.apache.org/jira/browse/PIG-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605781#action_12605781
]
Olga Natkovich commented on PIG-272:
------------------------------------
Arun helped to diagnose the problem. The issue is that the following sequence
B = stream A through CMD;
store B into 'B1';
kicks in the optimization, and as a result store uses BinaryStorage to write
the results of the first job.
When the second job starts to run, it realizes that it can reuse the results
and tries to load them using BinaryStorage as well, which is wrong and causes
exceptions since the tuples don't have the structure expected by the second script.
The solution is to attach the original store function to the materialized
results; however, the code changes for it are quite ugly.
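To illustrate the idea, here is a rough toy model in Python. It is not Pig's
code: the classes BinaryStorage (a pass-through stand-in), TabStorage and
Materialized, and the helper functions, are all hypothetical names invented
for this sketch. It only shows why reloading materialized results with a
pass-through loader loses the tuple structure, and how remembering the
original store function alongside the results avoids that.

```python
# Toy model only: every name here is a hypothetical stand-in, not Pig's API.

class BinaryStorage:
    """Pass-through storage: a load yields one opaque field per record."""

    def load(self, raw: bytes):
        return [(raw,)]


class TabStorage:
    """Stand-in for the user's original store function (tab-separated text)."""

    def load(self, raw: bytes):
        return [tuple(line.split("\t")) for line in raw.decode().splitlines()]


class Materialized:
    """Results written by the first job, plus the original store function."""

    def __init__(self, raw: bytes, original_func):
        self.raw = raw
        self.original_func = original_func


def reuse_buggy(m: Materialized):
    # Reloads with the pass-through loader, so the field structure is lost.
    return BinaryStorage().load(m.raw)


def reuse_fixed(m: Materialized):
    # Reloads with the function attached to the materialized results.
    return m.original_func.load(m.raw)


def join_key(record):
    # The second job expects (name, age, gpa) and joins by name.
    name, age, gpa = record
    return name


if __name__ == "__main__":
    m = Materialized(b"alice\t20\t3.5\nbob\t21\t3.9\n", TabStorage())
    print([join_key(r) for r in reuse_fixed(m)])
    try:
        join_key(reuse_buggy(m)[0])
    except ValueError as exc:
        print("buggy reuse fails:", exc)
```

In this sketch the buggy path fails when the join tries to unpack three
fields from a one-field record, which is loosely analogous to the exceptions
described above.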
> Failure running complex script with streaming
> ---------------------------------------------
>
> Key: PIG-272
> URL: https://issues.apache.org/jira/browse/PIG-272
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Arun C Murthy
>
> The following script fails (stack is further down):
> define CMD `perl identity.pl`;
> define CMD1 `perl identity.pl`;
> A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> B = stream A through CMD;
> store B into 'B1';
> C = stream B through CMD1;
> D = JOIN B by name, C by name;
> store D into 'D1';
> If I remove the intermediate store, the script works fine. Also if I replace
> streaming commands with other operators such as filter and foreach, it works
> even with the intermediate store.