My Pig script is roughly like this:
A = LOAD 'input1' USING JsonLoader AS (x:map[]);
B = LOAD 'input2' USING JsonLoader AS (x:map[]);
A = FOREACH A GENERATE x, x#'item' AS item:chararray;
B = FOREACH B GENERATE x, x#'item' AS item:chararray;
U = UNION A, B;
DUMP U;
This leads to the following exception:
java.lang.RuntimeException: Unexpected data type -1 found in stream.
at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:306)
at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:220)
at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:269)
at org.apache.pig.impl.io.BinStorageRecordWriter.write(BinStorageRecordWriter.java:69)
at org.apache.pig.builtin.BinStorage.putNext(BinStorage.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Any ideas?
I am able to DUMP A and B individually; only the UNION fails.
-Rakesh