[ 
https://issues.apache.org/jira/browse/PIG-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665095#action_12665095
 ] 

Shravan Matthur Narayanamurthy commented on PIG-616:
----------------------------------------------------

I don't think introducing schema in the physical side is a good way out. What 
we probably need is something like nested cast operators for complex types.
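
To make the idea concrete, here is a minimal sketch of what a nested cast could look like. It is not Pig's actual POCast code: the class NestedCastSketch, the TypeSpec descriptor, and the use of plain java.util.List for tuples and bags are all hypothetical stand-ins chosen for illustration. The point is only that the cast operator carries its own description of the target type and recurses into bags and tuples, so no schema has to be attached to the physical operators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are Pig classes.
// Tuples and bags are modelled as plain Lists to keep the sketch self-contained.
final class NestedCastSketch {

    enum Kind { INTEGER, CHARARRAY, TUPLE, BAG }

    // Describes the target type of a cast, including nested element types.
    static final class TypeSpec {
        final Kind kind;
        final List<TypeSpec> children;

        private TypeSpec(Kind kind, List<TypeSpec> children) {
            this.kind = kind;
            this.children = children;
        }

        static TypeSpec leaf(Kind kind) {
            return new TypeSpec(kind, new ArrayList<TypeSpec>());
        }

        static TypeSpec tuple(TypeSpec... fields) {
            List<TypeSpec> list = new ArrayList<TypeSpec>();
            for (TypeSpec f : fields) {
                list.add(f);
            }
            return new TypeSpec(Kind.TUPLE, list);
        }

        static TypeSpec bag(TypeSpec tupleSpec) {
            List<TypeSpec> list = new ArrayList<TypeSpec>();
            list.add(tupleSpec);
            return new TypeSpec(Kind.BAG, list);
        }
    }

    // Recursively casts a value to the target type, descending into bags and tuples.
    static Object cast(Object value, TypeSpec target) {
        if (value == null) {
            return null;
        }
        switch (target.kind) {
            case CHARARRAY:
                return String.valueOf(value);
            case INTEGER:
                return (value instanceof Integer)
                        ? value
                        : Integer.valueOf(value.toString().trim());
            case TUPLE: {
                List<?> fields = (List<?>) value;
                if (fields.size() != target.children.size()) {
                    throw new IllegalArgumentException("tuple arity mismatch");
                }
                List<Object> out = new ArrayList<Object>();
                for (int i = 0; i < fields.size(); i++) {
                    out.add(cast(fields.get(i), target.children.get(i)));
                }
                return out;
            }
            case BAG: {
                // Every tuple in the bag is cast with the same nested tuple spec.
                TypeSpec tupleSpec = target.children.get(0);
                List<Object> out = new ArrayList<Object>();
                for (Object t : (List<?>) value) {
                    out.add(cast(t, tupleSpec));
                }
                return out;
            }
            default:
                throw new IllegalArgumentException("unsupported kind: " + target.kind);
        }
    }
}
```

With a target built as TypeSpec.bag(TypeSpec.tuple(TypeSpec.leaf(Kind.CHARARRAY))), casting a bag that holds the single tuple (1) yields a bag whose inner field is a String, which is the conversion the query in this issue expects to have happened.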

> Casts to complex types do not work as expected
> ----------------------------------------------
>
>                 Key: PIG-616
>                 URL: https://issues.apache.org/jira/browse/PIG-616
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> When we specify a (complex) type as a column in Pig, the TypeCastInserter 
> inserts the appropriate cast for the (complex) type. However, in the 
> implementation of POCast.java, when databyte arrays are converted to the 
> (complex) types, we invoke the bytesToXXX method. 
> For complex types, especially tuples and bags, we do not enforce the typing 
> information specified by the user in the AS clause or with the explicit cast 
> statement. The implementation solely relies on bytesToXXX to figure out the 
> right type.
> An example of a query that fails is given below. In this query, the data is a 
> single column that is a bag of integers. The user specifies this bag to be a 
> bag of chararray. This conversion is allowed in Pig, but the implementation 
> does not perform the actual cast. Instead, bytesToBag is called on the 
> stream. The resulting type is a bag of integers and not a bag of chararray. 
> In the subsequent statement the user (correctly) assumes that the conversion 
> has been performed, but in reality it has not. At run time, when a 
> chararray-based operation is performed, we see a ClassCastException.
> The notion of a schema is absent in the physical operators. The 
> schema/fieldSchema in the logical layer has to be passed on to the physical 
> layer. The schema can be used to perform additional operations like casting, 
> etc.
> {code}
> grunt> cat bag.data
> {(1)}
> grunt> a = load 'bag.data' as (b:{t:(c:chararray)});
> grunt> b = foreach a generate flatten(b);
> grunt> c = foreach b generate CONCAT('Hello ', $0);
> grunt> dump c;
> 2009-01-12 10:44:44,417 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-01-12 10:45:09,439 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2009-01-12 10:45:09,440 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Job failed!
> 2009-01-12 10:45:09,443 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) 
> task_200812151518_9681_m_000000java.lang.ClassCastException: 
> java.lang.Integer cannot be cast to java.lang.String
>         at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:37)
>         at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:31)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:259)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:271)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> ...
> 2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1066: Unable to open iterator for alias c
> 2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator 
> for alias c
>         at org.apache.pig.PigServer.openIterator(PigServer.java:426)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
>         at org.apache.pig.Main.main(Main.java:302)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:420)
>         ... 5 more
> {code}
