[jira] Updated: (PIG-334) Sorting on fields of type double does not work

Alan Gates (JIRA) Fri, 25 Jul 2008 15:42:23 -0700

     [ https://issues.apache.org/jira/browse/PIG-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alan Gates updated PIG-334:
---------------------------

    Attachment: doublesort.patch

This patch takes the DoubleWritable code from HADOOP-3061 and adds it to Pig so 
that we can use a double as a key.
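For readers unfamiliar with Hadoop key types, here is a minimal sketch of what a double-valued key has to provide: serialization, deserialization and an ordering. It is not the HADOOP-3061 code. To compile without Hadoop on the classpath it declares a hypothetical stand-in interface, `WritableComparableSketch`, in place of `org.apache.hadoop.io.WritableComparable`, and the class body is an illustrative reimplementation.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.io.WritableComparable, declared
// locally so this sketch compiles without Hadoop. Not Hadoop's actual API.
interface WritableComparableSketch<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Illustrative double-valued key in the style of a Hadoop DoubleWritable.
class DoubleWritable implements WritableComparableSketch<DoubleWritable> {
    private double value;

    // No-arg constructor so a framework can create instances reflectively
    // and then fill them in with readFields.
    public DoubleWritable() { }

    public DoubleWritable(double value) { this.value = value; }

    public double get() { return value; }

    public void set(double value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(value); // 8 bytes, big-endian IEEE 754 bit pattern
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readDouble();
    }

    @Override
    public int compareTo(DoubleWritable other) {
        return Double.compare(value, other.value); // numeric order for sorting
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DoubleWritable
            && Double.compare(value, ((DoubleWritable) o).value) == 0;
    }

    @Override
    public int hashCode() { return Double.hashCode(value); }

    @Override
    public String toString() { return Double.toString(value); }
}
```

This sketch orders keys with `Double.compare`, which is numeric (it places `-0.0` before `0.0` and `NaN` after every other value). Comparing the serialized bytes directly would not give numeric order, because negative doubles have the sign bit set and would sort after the positives.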

I also, at Pi's request, moved the hadoop->pig data type translation functions 
from data.DataType to backend.hadoop.HDataType.

This does not, however, fully resolve the sorting issue.  Sorting on a field of 
any declared type returns:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BytesWritable, recieved org.apache.pig.backend.hadoop.DoubleWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:122)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)

From the explain plan, it appears that the project schema for the local 
rearrange is set to bytearray instead of the correct type.
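One way to read the stack trace: the job was told to expect `BytesWritable` map output keys, which is what a bytearray schema would call for, but the pipeline actually emitted a `DoubleWritable`. The plain-Java sketch below models that mismatch. `BytesWritableKey`, `DoubleWritableKey`, `MapOutputCheck` and `KeyClassChooser` are hypothetical stand-ins, not Hadoop or Pig classes, and the assumption that the declared key class is chosen from the local rearrange's schema type is an inference from the explain plan observation above, not confirmed Pig code.

```java
import java.io.IOException;

// Hypothetical stand-ins for the two key classes named in the stack trace;
// only their runtime classes matter for this illustration.
class BytesWritableKey { }
class DoubleWritableKey { }

// Sketch of the check behind "Type mismatch in key from map": the job declares
// one map output key class, and every emitted key must have exactly that class.
class MapOutputCheck {
    private final Class<?> declaredKeyClass;

    MapOutputCheck(Class<?> declaredKeyClass) {
        this.declaredKeyClass = declaredKeyClass;
    }

    void collect(Object key) throws IOException {
        if (key.getClass() != declaredKeyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                + declaredKeyClass.getName() + ", received " + key.getClass().getName());
        }
        // A real map output buffer would go on to serialize the key here;
        // this sketch stops after the class check.
    }
}

// Hypothetical helper (assumption): derives the declared key class from the
// schema type of the sort key.
class KeyClassChooser {
    static Class<?> keyClassFor(String schemaType) {
        switch (schemaType) {
            case "double":
                return DoubleWritableKey.class;
            case "bytearray":
                return BytesWritableKey.class;
            default:
                throw new IllegalArgumentException("unhandled type: " + schemaType);
        }
    }
}
```

Under these assumptions, declaring the key class from a bytearray schema and then emitting a double key reproduces the error. Once the schema carries the correct type (double), the declared and emitted key classes agree and `collect` succeeds.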


> Sorting on fields of type double does not work
> ----------------------------------------------
>
>                 Key: PIG-334
>                 URL: https://issues.apache.org/jira/browse/PIG-334
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: doublesort.patch
>
>
> In the new pipeline, when possible, pig uses hadoop writable comparable types 
> for the hadoop key rather than a tuple.  As of hadoop 0.17 there is no 
> DoubleWritable type.  It has been added for hadoop 0.18.  But it appears that 
> we will be ready to integrate the types branch back into trunk before hadoop 
> 0.18 is released.  So we need to implement a DoubleWritable for ourselves 
> until that time.
> The code can be taken from HADOOP-3061.  The code where we convert to and 
> from hadoop types (DataType.getWritableComparableTypes and convertToPigType) 
> needs to be changed to use this type.
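Adding the double case to the translation in both directions could look roughly like the sketch below. `IntKey`, `DoubleKey`, `TextKey`, `HDataTypeSketch.toHadoopKey` and `HDataTypeSketch.toPigValue` are hypothetical names used only for illustration; they are not the signatures of `DataType.getWritableComparableTypes`, `convertToPigType` or anything in `HDataType`. The point is only that the double case must exist on both sides for a double key to survive the round trip.

```java
// Hypothetical minimal key wrappers standing in for Hadoop's writable types.
final class IntKey { final int v; IntKey(int v) { this.v = v; } }
final class DoubleKey { final double v; DoubleKey(double v) { this.v = v; } }
final class TextKey { final String v; TextKey(String v) { this.v = v; } }

// Sketch of the two translation directions; method names are hypothetical.
final class HDataTypeSketch {
    // Pig value -> Hadoop-side key wrapper.
    static Object toHadoopKey(Object pigValue) {
        if (pigValue instanceof Integer) return new IntKey((Integer) pigValue);
        if (pigValue instanceof Double) return new DoubleKey((Double) pigValue); // double case
        if (pigValue instanceof String) return new TextKey((String) pigValue);
        throw new IllegalArgumentException(
            "no key wrapper for " + pigValue.getClass().getName());
    }

    // Hadoop-side key wrapper -> Pig value (the reverse mapping).
    static Object toPigValue(Object key) {
        if (key instanceof IntKey) return ((IntKey) key).v;
        if (key instanceof DoubleKey) return ((DoubleKey) key).v; // double case
        if (key instanceof TextKey) return ((TextKey) key).v;
        throw new IllegalArgumentException(
            "unknown key type " + key.getClass().getName());
    }
}
```

For example, `HDataTypeSketch.toPigValue(HDataTypeSketch.toHadoopKey(3.5))` yields the `Double` 3.5; dropping the double branch from either method makes that round trip throw.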

-- 
This message is automatically generated by JIRA.
