[jira] Commented: (PIG-880) Order by is broken with complex fields

Pradeep Kamath (JIRA) Fri, 10 Jul 2009 16:21:41 -0700

    [ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729882#action_12729882 ]


Pradeep Kamath commented on PIG-880:
------------------------------------

The root cause of this issue is that, when interpreting map data, PigStorage
returns each map value with a type it deduces from the data itself: string
values are returned as String and integer values as Integer. However, the
logical layer in Pig assumes that map values are of type ByteArray, since it
cannot assume any other type. If one of the sampled values forming the
quantile list is null, it is assumed to be of the type of the reduce key of
the final order-by job. In this case, since the order-by key is smap#'name',
that type is taken to be ByteArray, whereas the values produced by the map
lookup are actually of type String. This mismatch results in the exception
above. If nulls are filtered out instead, map.collect() fails because Hadoop
expects the map output key to be a bytearray but receives a Text (string).
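
The ArrayStoreException can be reproduced in isolation. The following is a
hypothetical, standalone Java sketch, not Pig code: ByteArrayKey stands in for
Pig's bytearray type, and convertToArray only assumes that the partitioner
copies the sampled quantiles into an array whose component type is the
declared sort-key type. Passing a zero-length typed array makes
ArrayList.toArray allocate through Arrays.copyOf and System.arraycopy, the
same frames that appear at the top of the trace.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

public class QuantileTypeMismatch {
    // Hypothetical stand-in for Pig's bytearray key type (not the real Pig class).
    static final class ByteArrayKey {
        final byte[] bytes;
        ByteArrayKey(byte[] bytes) { this.bytes = bytes; }
    }

    // Assumed behaviour: copy sampled quantiles into an array typed by the
    // declared sort key. The zero-length template forces ArrayList.toArray to
    // allocate via Arrays.copyOf, which then calls System.arraycopy.
    static Object[] convertToArray(List<Object> quantiles, Class<?> declaredKeyType) {
        Object[] template = (Object[]) Array.newInstance(declaredKeyType, 0);
        return quantiles.toArray(template);
    }

    // Returns true if the copy succeeds, false if the JVM rejects an element.
    static boolean storesCleanly(List<Object> quantiles, Class<?> declaredKeyType) {
        try {
            convertToArray(quantiles, declaredKeyType);
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Values as PigStorage deduced them: String rather than bytearray.
        List<Object> inferred = new ArrayList<>();
        inferred.add("alice");
        inferred.add("bob");

        // Values that really are of the declared key type.
        List<Object> raw = new ArrayList<>();
        raw.add(new ByteArrayKey("alice".getBytes()));
        raw.add(new ByteArrayKey("bob".getBytes()));

        System.out.println("String values into ByteArrayKey[]: "
                + storesCleanly(inferred, ByteArrayKey.class));
        System.out.println("ByteArrayKey values into ByteArrayKey[]: "
                + storesCleanly(raw, ByteArrayKey.class));
    }
}
```

Java arrays check element types at store time, so the failure only surfaces
when the list is copied, not when the mismatched values are first collected.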

A proposed fix is to change TextDataParser, which PigStorage uses to read map
data, so that it returns the values in the map as ByteArray.
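
To illustrate the proposal, here is a hypothetical Java sketch contrasting the
two behaviours for a simple map literal such as [name#alice,age#20]. The class
MapValueParsing, the Bytes wrapper and all method names are illustrative
assumptions, not the actual TextDataParser API; the sketch assumes no nesting
or escaping, and its type inference is simplified to Integer-or-String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapValueParsing {
    // Hypothetical stand-in for a bytearray value (not Pig's own class).
    static final class Bytes {
        final byte[] data;
        Bytes(byte[] data) { this.data = data; }
        @Override public boolean equals(Object o) {
            return o instanceof Bytes && Arrays.equals(data, ((Bytes) o).data);
        }
        @Override public int hashCode() { return Arrays.hashCode(data); }
        @Override public String toString() { return new String(data, StandardCharsets.UTF_8); }
    }

    // Splits "[k1#v1,k2#v2]" into raw key/value strings; no nesting or escaping.
    static Map<String, String> splitEntries(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        String body = text.substring(1, text.length() - 1);
        if (body.isEmpty()) return out;
        for (String entry : body.split(",")) {
            int hash = entry.indexOf('#');
            out.put(entry.substring(0, hash), entry.substring(hash + 1));
        }
        return out;
    }

    // Current behaviour as described: the value type is deduced from the data.
    static Map<String, Object> parseInferring(String text) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : splitEntries(text).entrySet()) {
            Object v;
            try {
                v = Integer.valueOf(e.getValue());
            } catch (NumberFormatException ex) {
                v = e.getValue();
            }
            out.put(e.getKey(), v);
        }
        return out;
    }

    // Proposed behaviour: every value stays an uninterpreted byte array.
    static Map<String, Object> parseAsBytes(String text) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : splitEntries(text).entrySet()) {
            out.put(e.getKey(), new Bytes(e.getValue().getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "[name#alice,age#20]";
        for (Object v : parseInferring(line).values()) {
            System.out.println("inferred: " + v.getClass().getSimpleName());
        }
        for (Object v : parseAsBytes(line).values()) {
            System.out.println("as bytes: " + v.getClass().getSimpleName());
        }
    }
}
```

Only the second parser yields a single runtime type for every value, which
matches what the logical layer already assumes for map values; any conversion
to a concrete type would then happen later, where the target type is known.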

Thoughts?



> Order by is broken with complex fields
> --------------------------------------
>
>                 Key: PIG-880
>                 URL: https://issues.apache.org/jira/browse/PIG-880
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Olga Natkovich
>             Fix For: 0.4.0
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
>         at java.lang.System.arraycopy(Native Method)
>         at java.util.Arrays.copyOf(Arrays.java:2763)
>         at java.util.ArrayList.toArray(ArrayList.java:305)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
>         ... 5 more
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
>         at org.apache.pig.PigServer.execute(PigServer.java:762)
>         at org.apache.pig.PigServer.access$100(PigServer.java:91)
>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
>         at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.