yeah... when hive.auto.convert.join.noconditionaltask=false; we cannot merge multiple Map-joins into a single Map phase... I think that trunk should work when hive.auto.convert.join.noconditionaltask=true.
Thanks, Yin On Wed, Sep 4, 2013 at 4:03 PM, Nathanial Thelen <n...@natethelen.com>wrote: > Doing: > > set hive.auto.convert.join.noconditionaltask=false; > > > makes it work (though it does way more map reduce jobs than it should). > When I get some time I will test against the latest trunk. > > Thanks, > Nate > > > On Sep 3, 2013, at 6:09 PM, Yin Huai <huaiyin....@gmail.com> wrote: > > Based on the log, it may be also related to > https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a > not very optimized way), can you try "set > hive.auto.convert.join.noconditionaltask=false;" ? If you still get the > error, give "set hive.auto.convert.join=false;" a try (it will turn off map > join auto convert, so you will use reduce-side join). > > Thanks, > > Yin > > > On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan <hashut...@apache.org>wrote: > >> Not sure about EMR. Your best bet is to ask on EMR forums. >> >> Thanks, >> Ashutosh >> >> >> On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen <n...@natethelen.com>wrote: >> >>> Is there a way to run a patch on EMR? >>> >>> Thanks, >>> Nate >>> >>> On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <hashut...@apache.org> >>> wrote: >>> >>> Fix in very related area has been checked in trunk today : >>> https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix >>> your issue. >>> Can you try latest trunk? >>> >>> Ashutosh >>> >>> >>> On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen <n...@natethelen.com>wrote: >>> >>>> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I >>>> have been getting NullPointerExceptions (NPE) for certain queries in our >>>> staging environment. Only difference between stage and production is the >>>> amount of traffic we get so the data set is much smaller. We are not using >>>> any custom code. >>>> >>>> I have greatly simplified the query down to the bare minimum that will >>>> cause the error: >>>> >>>> SELECT >>>> count(DISTINCT ag.adGroupGuid) as groups, >>>> count(DISTINCT av.adViewGuid) as ads, >>>> count(DISTINCT ac.adViewGuid) as uniqueClicks >>>> FROM >>>> adgroup ag >>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid >>>> >>>> This will return the following before any Map Reduce jobs start: >>>> >>>> FAILED: NullPointerException null >>>> >>>> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and >>>> scanning, I see this error: >>>> >>>> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >>>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >>>> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 >>>> length: 94324 file count: 20 directory count: 1 >>>> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >>>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >>>> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: >>>> 142609 file count: 21 directory count: 1 >>>> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >>>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >>>> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 >>>> length: 65519 file count: 21 directory count: 1 >>>> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >>>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >>>> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: >>>> 205096 file count: 20 directory count: 1 >>>> 2013-09-03 18:09:19,800 INFO >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >>>> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where >>>> optimization is applicable >>>> 2013-09-03 18:09:19,801 INFO >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >>>> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table >>>> scans >>>> 2013-09-03 18:09:19,801 INFO >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >>>> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where >>>> optimization is applicable >>>> 2013-09-03 18:09:19,801 INFO >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >>>> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table >>>> scans >>>> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver >>>> (SessionState.java:printError(386)) - FAILED: NullPointerException null >>>> java.lang.NullPointerException >>>> at >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) >>>> at >>>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) >>>> at >>>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124) >>>> at >>>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) >>>> at >>>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175) >>>> at >>>> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) >>>> at >>>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426) >>>> at >>>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789) >>>> at >>>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) >>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) >>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) >>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) >>>> at >>>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310) >>>> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231) >>>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466) >>>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819) >>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>> at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:187) >>>> >>>> The same error also happens if I do an INNER JOIN to adclick, FYI. >>>> >>>> I have checked that there are not any null values for any of the >>>> columns referenced in the query. >>>> >>>> Making almost any changes to the query results it in successfully >>>> running. Here are some I have tried: >>>> >>>> Removed JOIN to adgroup: >>>> >>>> SELECT >>>> count(DISTINCT av.adViewGuid) as ads, >>>> count(DISTINCT ac.adViewGuid) as uniqueClicks >>>> FROM >>>> adview av >>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >>>> >>>> WORKS: >>>> 561 6 >>>> >>>> Removed JOIN to adclick: >>>> >>>> SELECT >>>> count(DISTINCT ag.adGroupGuid) as groups, >>>> count(DISTINCT av.adViewGuid) as ads >>>> FROM >>>> adgroup ag >>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid; >>>> >>>> WORKS: >>>> 543 561 >>>> >>>> Removing DISTINCT from any of the 3 counts >>>> >>>> SELECT >>>> count(DISTINCT ag.adGroupGuid) as groups, >>>> count(DISTINCT av.adViewGuid) as ads, >>>> count(ac.adViewGuid) as uniqueClicks >>>> FROM >>>> adgroup ag >>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >>>> >>>> WORKS: >>>> 543 561 7 >>>> >>>> >>>> SELECT >>>> count(ag.adGroupGuid) as groups, >>>> count(DISTINCT av.adViewGuid) as ads, >>>> count(DISTINCT ac.adViewGuid) as uniqueClicks >>>> FROM >>>> adgroup ag >>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >>>> >>>> WORKS: >>>> 562 561 6 >>>> >>>> >>>> SELECT >>>> count(DISTINCT ag.adGroupGuid) as groups, >>>> count(av.adViewGuid) as ads, >>>> count(DISTINCT ac.adViewGuid) as uniqueClicks >>>> FROM >>>> adgroup ag >>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >>>> >>>> WORKS: >>>> 543 562 6 >>>> >>>> I am not exactly sure what to do next. Thoughts? >>>> >>>> Nate >>>> >>> >>> >>> >> > >