[ https://issues.apache.org/jira/browse/PIG-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-334:
---------------------------
Attachment: doublesort.patch
This takes the file from HADOOP-3061 and adds it to the pig data so we can use
a double as a key.
I also, at Pi's request, moved the hadoop->pig data type translation functions
from data.DataType to backend.hadoop.HDataType.
This does not, however, fully resolve the sorting issue. Sorting on a field of
any declared type fails with:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BytesWritable, recieved org.apache.pig.backend.hadoop.DoubleWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:122)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
From looking at the explain plan, it looks like the project schema for the
local rearrange is set to bytearray instead of the correct type.
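
To illustrate why a wrong schema on the local rearrange surfaces as this
exception, here is a minimal sketch of the kind of check the map output
collector appears to perform: the runtime class of each emitted key is compared
with the key class declared for the job. This is not the Hadoop source.
KeyTypeCheckSketch, matches and checkKey are hypothetical names, and byte[] and
Double stand in for BytesWritable and DoubleWritable so that the sketch compiles
without Hadoop on the classpath.

```java
import java.io.IOException;

// Hypothetical, simplified illustration; not the actual MapTask code.
final class KeyTypeCheckSketch {

    // True when the emitted key has exactly the class the job declared.
    static boolean matches(Class<?> declaredKeyClass, Object key) {
        return key.getClass() == declaredKeyClass;
    }

    // Rejects a key whose runtime class differs from the declared key class.
    static void checkKey(Class<?> declaredKeyClass, Object key) throws IOException {
        if (!matches(declaredKeyClass, key)) {
            throw new IOException("Type mismatch in key from map: expected "
                    + declaredKeyClass.getName() + ", received "
                    + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        // Declared key class is the bytearray stand-in, but the map emits a double.
        try {
            checkKey(byte[].class, Double.valueOf(3.14));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the plan declared the key as double rather than bytearray, the declared
class and the emitted class would agree and a check of this kind would pass.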
> Sorting on fields of type double does not work
> ----------------------------------------------
>
> Key: PIG-334
> URL: https://issues.apache.org/jira/browse/PIG-334
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Critical
> Fix For: types_branch
>
> Attachments: doublesort.patch
>
>
> In the new pipeline, when possible, pig uses hadoop writable comparable types
> for the hadoop key rather than tuple. As of hadoop 0.17 there is no
> DoubleWritable type. It has been added for hadoop 0.18. But it appears that
> we will be ready to integrate the types branch back into trunk before hadoop
> 0.18 is released. So we need to implement a DoubleWritable for ourselves
> until that time.
> The code can be taken from HADOOP-3061. The code where we convert to and
> from hadoop types (DataType.getWritableComparableTypes and convertToPigType)
> needs to be changed to use this type.
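
For reference, a minimal sketch of what a double-valued key type involves:
serializing the value, reading it back, and ordering by numeric value. This is
not the code from HADOOP-3061. DoubleKeySketch and roundTrip are hypothetical
names, and the class implements plain Comparable rather than Hadoop's
WritableComparable so that it compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a DoubleWritable-style key. A real one would
// implement org.apache.hadoop.io.WritableComparable.
final class DoubleKeySketch implements Comparable<DoubleKeySketch> {
    private double value;

    DoubleKeySketch() {}

    DoubleKeySketch(double value) { this.value = value; }

    double get() { return value; }

    // Serialize the key as 8 bytes.
    void write(DataOutput out) throws IOException { out.writeDouble(value); }

    // Restore the key from its serialized form.
    void readFields(DataInput in) throws IOException { value = in.readDouble(); }

    // Order keys by numeric value; this is what the sort relies on.
    @Override
    public int compareTo(DoubleKeySketch other) {
        return Double.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DoubleKeySketch
                && Double.compare(value, ((DoubleKeySketch) o).value) == 0;
    }

    @Override
    public int hashCode() { return Double.hashCode(value); }

    // Helper for illustration: write a key to bytes and read it back.
    static DoubleKeySketch roundTrip(DoubleKeySketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            DoubleKeySketch copy = new DoubleKeySketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DoubleKeySketch[] keys = {
            new DoubleKeySketch(3.5), new DoubleKeySketch(-1.0), new DoubleKeySketch(2.25)
        };
        Arrays.sort(keys); // uses compareTo
        for (DoubleKeySketch k : keys) {
            System.out.println(roundTrip(k).get());
        }
    }
}
```

The conversion functions named above would then need to map the pig double
type to a key class of this kind and back.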