[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800609#action_12800609
 ] 

Ankur commented on PIG-1191:
----------------------------

Listed below are the identified cases. 

CASE 1: LOAD -> FILTER -> FOREACH -> LIMIT -> STORE
===================================================

SCRIPT
-----------
sds = LOAD '/my/data/location'
      USING my.org.MyMapLoader()
      AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries = FILTER sds BY mapFields#'page_params'#'query' is NOT NULL;
queries_rand = FOREACH queries
               GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS query_string;
queries_limit = LIMIT queries_rand 100;
STORE queries_limit INTO 'out'; 

RESULT 
------------
FAILS in the reduce stage with the following exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)


CASE 2: LOAD -> FOREACH -> FILTER -> LIMIT -> STORE
===================================================
Note that the order of FILTER and FOREACH is reversed relative to CASE 1.

SCRIPT
-----------
sds = LOAD '/my/data/location'
      USING my.org.MyMapLoader()
      AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries_rand = FOREACH sds
               GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS query_string;
queries = FILTER queries_rand BY query_string IS NOT null;
queries_limit = LIMIT queries 100; 
STORE queries_limit INTO 'out';

RESULT
-----------
SUCCESS - Results are stored correctly. So if the projection (FOREACH) is done
before the FILTER, the POCast operator receives the LoadFunc and the cast
works.
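
To make the failure mode concrete, here is a toy model of a cast operator
that only works when it has been handed a converter from the loader. This is
a sketch in Python, not Pig's actual POCast code: the names ToyCast,
CastError and utf8_loader_cast are hypothetical, and the assumption that the
map value arrives as raw bytes is mine.

```python
from typing import Callable, Optional


class CastError(Exception):
    """Stands in for Pig's ExecException carrying ERROR 1075."""


class ToyCast:
    """Toy stand-in for a cast operator; not Pig's real POCast."""

    def __init__(self, bytes_to_str: Optional[Callable[[bytes], str]] = None):
        # The converter plays the role of the load function that the
        # planner is expected to hand to the cast operator.
        self.bytes_to_str = bytes_to_str

    def to_chararray(self, value):
        if value is None or isinstance(value, str):
            return value  # nothing to convert
        if isinstance(value, bytes):
            if self.bytes_to_str is None:
                # No converter was propagated, so the raw bytes cannot be
                # interpreted. This mirrors the error seen in the failing cases.
                raise CastError(
                    "ERROR 1075: Received a bytearray from the UDF. "
                    "Cannot determine how to convert the bytearray to string."
                )
            return self.bytes_to_str(value)
        return str(value)


def utf8_loader_cast(raw: bytes) -> str:
    # Hypothetical converter supplied by the loader.
    return raw.decode("utf-8")


# A record shaped like mapFields#'page_params'#'query' (assumed raw bytes).
record = {"page_params": {"query": b"pig latin"}}
raw_query = record["page_params"]["query"]

# Cast that received the converter: the conversion succeeds.
with_loader = ToyCast(utf8_loader_cast)
result_ok = with_loader.to_chararray(raw_query)

# Cast that never received the converter: the conversion fails.
without_loader = ToyCast()
try:
    without_loader.to_chararray(raw_query)
    failed = False
except CastError:
    failed = True
```

In this model the succeeding cases correspond to with_loader and the failing
cases to without_loader; which plan shapes lose the converter in Pig itself
is exactly what the cases in this comment are trying to pin down.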


CASE 3: LOAD -> FOREACH -> FOREACH -> FILTER -> LIMIT -> STORE
==============================================================

SCRIPT
-----------
sds = LOAD '/my/data/location'
      USING my.org.MyMapLoader()
      AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE 
          (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params
          GENERATE (CHARARRAY) (params#'query') AS query_string;
queries_filtered = FILTER queries
                   BY query_string IS NOT null;
queries_limit = LIMIT queries_filtered 100;
STORE queries_limit INTO 'out';

RESULT
-----------
FAILS in the map stage. It looks like the second FOREACH did not get the
loadFunc and bailed out with the following stack trace:

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at

CASE 4: LOAD -> FOREACH -> FOREACH -> LIMIT -> STORE
====================================================

SCRIPT
-----------
sds = LOAD '/my/data/location'
      USING my.org.MyMapLoader()
      AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE
          (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params
          GENERATE (CHARARRAY) (params#'query') AS query_string;
queries_limit = LIMIT queries 100;
STORE queries_limit INTO 'out';

RESULT
-----------
SUCCESS. Both FOREACH operators seem to get the loadFunc.

CASE 5: LOAD -> FOREACH -> FOREACH -> FOREACH -> LIMIT -> STORE
================================================================

SCRIPT
-----------
sds = LOAD '/my/data/location'
      USING my.org.MyMapLoader()
      AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE
          (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params
          GENERATE (CHARARRAY) (params#'query') AS query_string;
rand_queries = FOREACH queries GENERATE query_string as query;
queries_limit = LIMIT rand_queries 100;
STORE rand_queries INTO 'out';

RESULT
-----------
FAILS in the map stage. Again the second FOREACH seems to bail out, with the
following stack trace:

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 
 

> POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
> ----------------------------------------------------------------------
>
>                 Key: PIG-1191
>                 URL: https://issues.apache.org/jira/browse/PIG-1191
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Priority: Blocker
>         Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function that returns complex data (map of
> maps, list of maps), certain sequences of LOAD, FILTER, and FOREACH cause
> the Pig script to throw an exception of the form:
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> <actual-type>
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, the operator was apparently unable to
> find the right load function for the conversion and consequently bailed
> out with the exception, failing the entire Pig script.
