[ https://issues.apache.org/jira/browse/PIG-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-558:
-------------------------------

    Assignee: Pradeep Kamath
      Status: Patch Available  (was: Open)

The issue is that table1 has only one column, which is also the join key. A 
recent optimization omits from the value any fields that are already present 
in the key, so POLocalRearrange sends an empty tuple as the value. The 
POPackage that follows the POLocalRearrange uses metadata stored in itself to 
reconstruct the value from the key when necessary. However, when the 
POLocalRearrange runs in a reduce and the POPackage runs in the next map, the 
POLocalRearrange output is written to DFS in BinStorage format, so a tuple of 
size 0 is written out. When reading, BinStorage treats a tuple of size 0 as a 
fatal error.
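
To illustrate why the value ends up empty, here is a minimal sketch of the 
key-omission idea. It is not the Pig implementation: the class and method 
names (KeyOmissionSketch, stripKey, rebuild) are hypothetical, and tuples are 
modelled as plain lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these names do not exist in Pig.
final class KeyOmissionSketch {

    // "Rearrange" side: drop the field at keyIndex from the value,
    // because the key already carries it.
    static List<String> stripKey(List<String> tuple, int keyIndex) {
        List<String> value = new ArrayList<>(tuple);
        value.remove(keyIndex); // remove(int) removes by position
        return value;
    }

    // "Package" side: put the key back at keyIndex to rebuild the tuple.
    static List<String> rebuild(String key, List<String> value, int keyIndex) {
        List<String> tuple = new ArrayList<>(value);
        tuple.add(keyIndex, key);
        return tuple;
    }

    public static void main(String[] args) {
        // A one-column row whose only column is the join key.
        List<String> row = Arrays.asList("a");
        List<String> value = stripKey(row, 0);
        System.out.println(value.size());           // 0: the value is empty
        System.out.println(rebuild("a", value, 0)); // [a]
    }
}
```

With a single-column input the stripped value has size 0, which is the tuple 
that ends up being written out between the two jobs.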

Fix:
The patch changes BinStorage to treat a tuple of size 0 as a valid tuple and 
to reconstruct it as an empty tuple. The POPackage then builds the correct 
value from the key. The patch also includes a unit test for this case.
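
The behaviour change can be sketched as follows. This is not the patched 
DataReaderWriter or BinStorage code; TupleCodecSketch and its on-disk layout 
(an int field count followed by UTF strings) are assumptions made only for 
illustration. The point is that the reader rejects only a negative size and 
returns an empty tuple for size 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical size-prefixed tuple codec; not Pig's actual format.
final class TupleCodecSketch {

    // Writes the field count, then each field as a UTF string.
    static void writeTuple(DataOutputStream out, List<String> fields) throws IOException {
        out.writeInt(fields.size());
        for (String f : fields) {
            out.writeUTF(f);
        }
    }

    // Reads a tuple back. Size 0 is valid and yields an empty tuple;
    // only a negative size is treated as corrupt input.
    static List<String> readTuple(DataInputStream in) throws IOException {
        int size = in.readInt();
        if (size < 0) {
            throw new IOException("Invalid size " + size + " for a tuple");
        }
        List<String> fields = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            fields.add(in.readUTF());
        }
        return fields;
    }

    // Serializes to an in-memory buffer and reads the tuple back.
    static List<String> roundTrip(List<String> fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeTuple(out, fields);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return readTuple(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new ArrayList<String>()).size()); // 0
        System.out.println(roundTrip(Arrays.asList("x", "y")));        // [x, y]
    }
}
```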

The unit test depends on certain functions introduced in MiniCluster and 
test/org/apache/pig/test/Util.java as of the patch in PIG-580. If PIG-580 is 
not committed before this patch, then the "additional" patch 
("PIG-558-additional.patch") attached here should also be applied.

> Distinct followed by a Join results in Invalid size 0 for a tuple error
> -----------------------------------------------------------------------
>
>                 Key: PIG-558
>                 URL: https://issues.apache.org/jira/browse/PIG-558
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: table1, table2
>
>
> The following Pig script does a right outer join after the DISTINCT.
> {code}
> nonuniqtable1 = LOAD 'table1' AS (f1:chararray);
> table1 = DISTINCT nonuniqtable1;
> table2 = LOAD 'table2' AS (f1:chararray, f2:int);
> temp = COGROUP table1 BY f1 INNER, table2 BY f1;
> DESCRIBE temp;
> explain temp;
> dump temp;
> {code}
> ========================================================================================================
> It results in the following error. This is true for other join types as well.
> ========================================================================================================
> java.io.IOException: Invalid size 0 for a tuple
>       at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:57)
>       at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:62)
>       at org.apache.pig.builtin.BinStorage.getNext(BinStorage.java:90)
>       at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:103)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:157)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:133)
>       at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ========================================================================================================

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
