[ 
https://issues.apache.org/jira/browse/PIG-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605781#action_12605781
 ] 

Olga Natkovich commented on PIG-272:
------------------------------------

Arun helped to diagnose the problem. The issue is that the following sequence

B = stream A through CMD;
store B into 'B1';

triggers the optimization and, as a result, the store uses BinaryStorage to 
write the results of the first job.

When the second job starts to run, it realizes that it can reuse the results 
and tries to load them, also using BinaryStorage. This is wrong and causes 
exceptions, since the tuples don't have the structure expected by the second 
script.

The solution is to attach the original store function to the materialized 
results; however, the code changes for it are quite ugly.
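
To make the mismatch concrete, here is a toy model in Python. It is only a 
sketch and not Pig code: binary_load, text_load and Materialized are 
hypothetical names, and the tab-delimited layout is an assumption made for 
illustration. It shows why a pass-through loader yields tuples without the 
expected fields, and how keeping the matching function alongside the 
materialized results avoids that.

```python
# Toy model only: these names are hypothetical and are not Pig classes.

def binary_load(raw: bytes):
    """Pass-through loader: each line becomes a one-field tuple of raw bytes."""
    return [(line,) for line in raw.splitlines()]


def text_load(raw: bytes, delimiter: bytes = b"\t"):
    """Delimited-text loader: each line is split into decoded fields."""
    return [
        tuple(field.decode() for field in line.split(delimiter))
        for line in raw.splitlines()
    ]


class Materialized:
    """Stored output kept together with the loader that matches how the
    script asked for it to be stored (assumed tab-delimited here)."""

    def __init__(self, raw: bytes, loader):
        self.raw = raw
        self.loader = loader

    def load(self):
        return self.loader(self.raw)


# Made-up sample rows in (name, age, gpa) order.
raw = b"alice\t20\t3.5\nbob\t22\t3.1\n"

# Reusing the results through the pass-through loader gives opaque tuples,
# so a later "join by name" has no name field to work with.
opaque = binary_load(raw)

# Attaching the matching loader to the results restores the structure.
structured = Materialized(raw, text_load).load()
```

In this model, opaque holds one-field tuples, while structured holds 
three-field tuples whose first field can serve as the join key.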

> Failure running complex script with streaming
> ---------------------------------------------
>
>                 Key: PIG-272
>                 URL: https://issues.apache.org/jira/browse/PIG-272
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Arun C Murthy
>
> The following script fails (stack is further down):
> define CMD `perl identity.pl`;
> define CMD1 `perl identity.pl`;
> A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> B = stream A through CMD;
> store B into 'B1';
> C = stream B through CMD1;
> D = JOIN B by name, C by name;
> store D into 'D1';
> If I remove the intermediate store, the script works fine. Also if I replace 
> streaming commands with other operators such as filter and foreach, it works 
> even with the intermediate store.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
