Bag items are cast to unexpected types?

daga Sat, 10 Jan 2009 09:28:50 -0800

I often run into errors with chararray bag items.  It seems that a bag
item can be cast to some type other than the specified chararray
type.  Maybe this happens just before it becomes a true chararray
value, but it can produce strange errors.


I suppose there is an attempt to recognize the bag item type somewhere
in the deserializer, right?  If so, why is the user-specified type not
used directly?  And which characters must a string avoid so that it is
not cast to another type?

The latest issue with bags:

a = load 'a' as (word: chararray, length: long, phrases: bag{t:
tuple(id: chararray)});
b = order a by word;
store b into 'b';

It gives lots of errors like:

2009-01-10 20:01:49,507 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200901101933_0007_m_000077java.lang.RuntimeException: Unexpected data type 116 found in stream.
        at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:117)
        at org.apache.pig.builtin.BinStorage.getNext(BinStorage.java:90)
        at org.apache.pig.impl.builtin.RandomSampleLoader.getNext(RandomSampleLoader.java:44)
        at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:101)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:157)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:133)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

The number in the "Unexpected data type 116 found in stream" message
varies.  Other tasks fail with an OutOfMemoryError instead:

2009-01-10 20:01:49,507 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200901101933_0007_m_000078java.lang.OutOfMemoryError: Java heap space
        at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:105)
        at org.apache.pig.builtin.BinStorage.getNext(BinStorage.java:90)
        at org.apache.pig.impl.builtin.RandomSampleLoader.getNext(RandomSampleLoader.java:44)
        at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:101)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:157)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:133)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

The data files are not very large, and the mapred.child.java.opts
option is set to -Xmx2048m.

If the 'phrases' column is removed before ordering, everything works.
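
For comparison, here is a minimal sketch of that working variant.  It
assumes the bag column is dropped with a plain foreach ... generate
projection; the alias a2 and the output path 'b_without_phrases' are
placeholder names used only for this illustration.

```pig
-- Same load statement as in the failing script.
a = load 'a' as (word: chararray, length: long, phrases: bag{t: tuple(id: chararray)});

-- Keep only the scalar columns; the 'phrases' bag is dropped before the sort.
a2 = foreach a generate word, length;

-- The ordering step now only sees chararray and long fields.
b = order a2 by word;

-- Hypothetical output path, chosen so the original 'b' is not overwritten.
store b into 'b_without_phrases';
```

The only difference from the failing script is the projection step, so
the problem seems to be tied to the bag column passing through the
order by.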

What is wrong with the way I am using bags?

Thanks.
