[jira] [Created] (HIVE-4011) Sort Merge Join does not kick-in

2013-02-12 Thread Amir Youssefi (JIRA)
Amir Youssefi created HIVE-4011:
---

 Summary: Sort Merge Join does not kick-in
 Key: HIVE-4011
 URL: https://issues.apache.org/jira/browse/HIVE-4011
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.0
 Environment: Linux
Reporter: Amir Youssefi


After applying the settings required for Sort Merge Join, it does not kick in 
and the query falls back to MapJoin with a local first step (on two bucketed 
and partitioned tables).

I ran into the issue on Hive 0.9 at large scale. To make sure the issue 
persists, I reproduced it on Hive 0.10 with sample public data and regular 
storage formats.

More details:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

select /*+ MAPJOIN(l) */
l.stock_price_open lo,
r.stock_price_open ro
from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and 
l.stock_symbol = r.stock_symbol and l.dte=r.dte)
where ...

DDL:

(both tables)
PARTITIONED BY (year string)
CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'


I also made sure we had:

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

Run logs and more info in attached file.
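
One way to see which join strategy the planner picked, without running the job, is to inspect the table metadata and the query plan. The following is only a sketch built from the settings, table names, and query in this report. The WHERE clause is elided in the report, so it is left out rather than guessed, and the operator names mentioned afterwards are what Hive plans usually print, not something taken from the attached logs.

```sql
-- Settings taken from the report above.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Confirm the bucketing and sort metadata recorded for each table.
DESCRIBE FORMATTED nyse_stocks_pcsb;
DESCRIBE FORMATTED nyse_stocks_pcsb_dup;

-- Show the plan without running the join.
-- The WHERE clause is elided in the report, so it is left out here.
EXPLAIN
select /*+ MAPJOIN(l) */
  l.stock_price_open lo,
  r.stock_price_open ro
from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r
  ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte = r.dte);
```

If the sort-merge path is taken, the plan would be expected to contain a "Sorted Merge Bucket Map Join Operator". A plain "Map Join Operator" preceded by a local work stage would match the fallback described here. The DESCRIBE FORMATTED output should list the bucket count, bucket columns, and sort columns, which can be compared against the DDL above.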

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Attachment: SMJ-JIRA-4011.txt

> Sort Merge Join runs locally
> 
>
> Key: HIVE-4011
> URL: https://issues.apache.org/jira/browse/HIVE-4011
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: Linux
>Reporter: Amir Youssefi
>  Labels: joins, mapjoin
> Attachments: SMJ-JIRA-4011.txt
>
>



[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Summary: Sort Merge Join runs locally  (was: Sort Merge Join does not 
kick-in and runs locally)

> Sort Merge Join runs locally
> 
>
> Key: HIVE-4011
> URL: https://issues.apache.org/jira/browse/HIVE-4011
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: Linux
>Reporter: Amir Youssefi
>  Labels: joins, mapjoin
>



[jira] [Updated] (HIVE-4011) Sort Merge Join does not kick-in and runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Summary: Sort Merge Join does not kick-in and runs locally  (was: Sort 
Merge Join does not kick-in)

> Sort Merge Join does not kick-in and runs locally
> -
>
> Key: HIVE-4011
> URL: https://issues.apache.org/jira/browse/HIVE-4011
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: Linux
>Reporter: Amir Youssefi
>  Labels: joins, mapjoin
>
