[jira] [Created] (HIVE-4011) Sort Merge Join does not kick-in
Amir Youssefi created HIVE-4011:
-----------------------------------

             Summary: Sort Merge Join does not kick-in
                 Key: HIVE-4011
                 URL: https://issues.apache.org/jira/browse/HIVE-4011
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.10.0, 0.9.0
         Environment: Linux
            Reporter: Amir Youssefi

After applying the settings required for Sort Merge Join, the join does not kick in; the query falls back to MapJoin with a local first step (on two bucketed and partitioned tables).

I first ran into the issue on Hive 0.9 at large scale. To make sure the issue persists, I ran it again on Hive 0.10 with sample public data and regular storage formats.

More details:

    set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
    set hive.optimize.bucketmapjoin = true;
    set hive.optimize.bucketmapjoin.sortedmerge = true;

    select /*+ MAPJOIN(l) */
      l.stock_price_open lo,
      r.stock_price_open ro
    from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r
      ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte = r.dte)
    where ...

DDL (both tables):

    PARTITIONED BY (year string)
    CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

I also made sure the following were set:

    set hive.enforce.bucketing=true;
    set hive.enforce.sorting=true;

Run logs and more info are in the attached file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
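For anyone trying to reproduce the report, the fragments above can be assembled into something like the following. This is only a sketch under stated assumptions: the column list and column types are guessed from the columns the query references and are not given in the report, the elided "where ..." clause is left out rather than guessed, and the second table (nyse_stocks_pcsb_dup) is assumed to be created with the same DDL. Running the query under EXPLAIN shows which join operator the planner picked; when the sort-merge path is taken the plan should list a "Sorted Merge Bucket Map Join Operator", whereas the fallback described in the report would show a plain map join with a local work stage.

```sql
-- Assumed column list and types (not stated in the report); only the
-- PARTITIONED BY / CLUSTERED BY / SERDE / STORED AS clauses come from it.
CREATE TABLE nyse_stocks_pcsb (
  stock_symbol     STRING,
  dte              STRING,
  stock_price_open DOUBLE
)
PARTITIONED BY (year STRING)
CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';

-- Settings quoted in the report.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- Inspect the plan instead of running the job; the report's WHERE clause
-- is elided there and is omitted here.
EXPLAIN
SELECT /*+ MAPJOIN(l) */
  l.stock_price_open lo,
  r.stock_price_open ro
FROM nyse_stocks_pcsb l
JOIN nyse_stocks_pcsb_dup r
  ON (l.year = r.year AND l.stock_symbol = r.stock_symbol AND l.dte = r.dte);
```

The hive.enforce.bucketing and hive.enforce.sorting settings from the report matter when the tables are populated, not when the join is planned, so they are not repeated in this sketch.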
[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Youssefi updated HIVE-4011:
--------------------------------

    Attachment: SMJ-JIRA-4011.txt

> Sort Merge Join runs locally
> ----------------------------
>
>                 Key: HIVE-4011
>                 URL: https://issues.apache.org/jira/browse/HIVE-4011
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0, 0.10.0
>         Environment: Linux
>            Reporter: Amir Youssefi
>              Labels: joins, mapjoin
>         Attachments: SMJ-JIRA-4011.txt
[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Youssefi updated HIVE-4011:
--------------------------------

    Summary: Sort Merge Join runs locally  (was: Sort Merge Join does not kick-in and runs locally)
[jira] [Updated] (HIVE-4011) Sort Merge Join does not kick-in and runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Youssefi updated HIVE-4011:
--------------------------------

    Summary: Sort Merge Join does not kick-in and runs locally  (was: Sort Merge Join does not kick-in)