Thanks to a bug fix put in by a colleague of mine, merge joins work for
tables loaded into Pig via HBaseStorage. In our test environment and in
Pig's own test environment, I'm able to do all sorts of fairly complex
data merging without issue.

However, when I use that same code on larger data sets in a production
environment, the merge join fails. If I run it on the exact same tables on
the same cluster after trimming the data down to just a few rows, the merge
join works fine.

Here is the most basic version of the Pig script I've been able to get it
down to. I've been removing pieces and parts trying to narrow it down, but
it still fails:
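For reference, a minimal merge join over HBaseStorage of the kind described would look something like this (all table, column, and relation names below are placeholders I've invented for illustration, not the real ones):

```pig
-- Hypothetical sketch: two HBase-backed tables joined on row key,
-- followed by a count (the "count portion" mentioned below).
a = LOAD 'hbase://table_a'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col_a', '-loadKey true')
    AS (id:chararray, col_a:chararray);
b = LOAD 'hbase://table_b'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col_b', '-loadKey true')
    AS (id:chararray, col_b:chararray);
c = JOIN a BY id, b BY id USING 'merge';
d = GROUP c ALL;
e = FOREACH d GENERATE COUNT(c);
DUMP e;
```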



If I change the count portion to a LIMIT 5 or something similar, I'm able
to dump the relation.

The merge join finishes all of its mappers, but when it gets to the reduce
step and starts doing a sort (don't ask me why it's even sorting
pre-sorted data), it throws the following error:

2016-03-09 19:36:01,738 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.hbase.TableSplitComparable cannot be cast to org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.hbase.TableSplitComparable.compareTo(TableSplitComparable.java:26)
        at org.apache.pig.data.DataType.compare(DataType.java:566)
        at org.apache.pig.data.DataType.compare(DataType.java:464)
        at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareDatum(BinInterSedes.java:1106)
        at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:1082)
        at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:787)
        at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:728)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTupleSortComparator.compare(PigTupleSortComparator.java:100)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:587)
        at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:128)
        at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:55)
        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:678)
        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:596)
        at org.apache.hadoop.mapred.Merger.merge(Merger.java:131)
        at org.apache.hadoop.mapred.Merger.merge(Merger.java:115)
        at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:722)
        at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
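For what it's worth, that ClassCastException is the classic symptom of a compareTo that blindly casts its argument to a different type. A minimal, hypothetical Java sketch of the failure pattern (the class names here are invented for illustration, not the actual Pig code):

```java
public class CastDemo {
    // Stand-in for a raw split type (like TableSplit).
    static class RawSplit {
        final int startRow;
        RawSplit(int startRow) { this.startRow = startRow; }
    }

    // Stand-in for a comparable wrapper (like TableSplitComparable) whose
    // compareTo assumes its argument is the raw type rather than another
    // wrapper.
    static class WrappedSplit implements Comparable<Object> {
        final int startRow;
        WrappedSplit(int startRow) { this.startRow = startRow; }
        @Override
        public int compareTo(Object other) {
            // Unchecked cast: blows up at runtime if the sort hands us
            // another WrappedSplit instead of a RawSplit.
            RawSplit raw = (RawSplit) other;
            return Integer.compare(startRow, raw.startRow);
        }
    }

    static boolean castFails() {
        try {
            new WrappedSplit(1).compareTo(new WrappedSplit(2));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails()); // prints "true"
    }
}
```

So my guess is that during the shuffle's final merge, two TableSplitComparable index entries end up being compared against each other (or against something other than a raw TableSplit), and that cast is exactly what blows up.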



If I switch the order of the two relations in the merge join, I get a
different error that looks more informative, but I still don't know what
to do about it:

2016-03-09 19:55:24,789 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: c: Local Rearrange[tuple]{chararray}(false) - scope-334 Operator Key: scope-334): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [c[62,4]]
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:291)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:279)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [c[62,4]]
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
        ... 12 more
Caused by: java.lang.NullPointerException
        at org.apache.pig.impl.builtin.DefaultIndexableLoader.seekNear(DefaultIndexableLoader.java:190)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:542)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextTuple(POMergeJoin.java:299)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNextTuple(POPreCombinerLocalRearrange.java:126)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)


Again, I've tried replicating the exact scenario (and more complicated
ones) in local environments and I can't get it to fail. I think it's
related to YARN/MapReduce, but I can't figure out why that would matter or
what it's really doing.

I'm trying to set up the e2e (end-to-end) tests in the Pig repo, but I'm
not having any luck there, either. If I can't reproduce the failure in a
test, I'm afraid I'm not going to be able to fix the bug.

Can anyone point me in the right direction on next debugging steps, or on
what might be wrong?


William Watson
Lead Software Engineer
