[ https://issues.apache.org/jira/browse/HIVE-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7751:
----------------------------------------------------
    Assignee:     (was: Hari Sankar Sivarama Subramaniyan)

> Mapjoin set in a non-conditional task can fail in MR mode because of memory overhead issues
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7751
>                 URL: https://issues.apache.org/jira/browse/HIVE-7751
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>
> select sum(ss_quantity)
> from store_sales
>   join store on store.s_store_sk = store_sales.ss_store_sk
>   join customer_demographics on customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   join customer_address on store_sales.ss_addr_sk = customer_address.ca_address_sk
>   join date_dim on store_sales.ss_sold_date_sk = date_dim.d_date_sk
> where d_year = 2000
>   and ((cd_marital_status = 'M' and cd_education_status = 'Advanced Degree' and ss_sales_price between 100.00 and 150.00)
>     or (cd_marital_status = 'M' and cd_education_status = 'Advanced Degree' and ss_sales_price between 50.00 and 100.00)
>     or (cd_marital_status = 'M' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00))
>   and ((ca_country = 'United States' and ca_state in ('TX', 'OH', 'TX') and ss_net_profit between 0 and 2000)
>     or (ca_country = 'United States' and ca_state in ('OR', 'MN', 'KY') and ss_net_profit between 150 and 3000)
>     or (ca_country = 'United States' and ca_state in ('VA', 'TX', 'MS') and ss_net_profit between 50 and 25000));
> The above query, run against data stored in ORC format, can fail. We convert
> the joins into a non-conditional map-join task on the assumption that the map
> join will succeed at runtime. At runtime, however, the query can fail because
> of memory overhead. To prevent such failures, CommonJoinTaskDispatcher should
> use table statistics instead of calling
> ql.exec.Utilities.getTotalInputFileSize(). That would lead to better
> conversion decisions in MR mode. Tez, on the other hand, handles such
> scenarios better because it actually relies on table stats to get the data
> size.
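
The sketch below contrasts a size check based on input files with one based on statistics. It uses a simplified model that is an assumption and not taken from Hive's code:

* Each small table has an on-disk size, which is what a file-size check sees and which is typically compressed for ORC.
* Each small table may also have a raw (uncompressed) data size taken from table statistics.
* Conversion is allowed only when the summed estimate fits under a threshold, similar in spirit to hive.auto.convert.join.noconditionaltask.size.

TableInfo, file_size_estimate, stats_estimate and can_convert_to_mapjoin are hypothetical names, not Hive APIs. The sizes in the demo are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TableInfo:
    """Hypothetical per-table size info (not a Hive class)."""
    name: str
    total_file_size: int                 # bytes on disk; ORC files are usually compressed
    raw_data_size: Optional[int] = None  # uncompressed size from table statistics, if collected


def file_size_estimate(t: TableInfo) -> int:
    # Models a file-size-based check: only on-disk bytes are considered.
    return t.total_file_size


def stats_estimate(t: TableInfo) -> int:
    # Prefer the statistics-based raw size; fall back to file size when stats are missing.
    if t.raw_data_size is not None and t.raw_data_size > 0:
        return t.raw_data_size
    return t.total_file_size


def can_convert_to_mapjoin(small_tables: Sequence[TableInfo], threshold: int,
                           use_stats: bool = True) -> bool:
    # In this model the small tables of a merged map-join task are held in
    # memory together, so their estimated sizes are summed before comparing.
    estimate = stats_estimate if use_stats else file_size_estimate
    return sum(estimate(t) for t in small_tables) <= threshold


if __name__ == "__main__":
    # Illustrative numbers: a table that is small on disk but large when decompressed.
    dims = [TableInfo("customer_demographics", 4_000_000, 60_000_000),
            TableInfo("store", 10_000)]
    limit = 10_000_000
    print(can_convert_to_mapjoin(dims, limit, use_stats=False))  # True: looks safe by file size
    print(can_convert_to_mapjoin(dims, limit, use_stats=True))   # False: too big by raw size
```

With file sizes alone, the join looks small enough to convert. The statistics-based estimate reveals that the decompressed data would not fit, which is the kind of runtime memory failure the issue describes.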



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
