[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=447061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447061
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 05:00
Start Date: 17/Jun/20 05:00
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1124:
URL: https://github.com/apache/hive/pull/1124


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447061)
Time Spent: 1.5h  (was: 1h 20m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  
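
The shape of the proposed rewrite, sketched here on a cut-down version of the
CTE above (an illustrative simplification, not the patch's actual plan output):
the CTE keeps only the key and grouping columns, and the wide customer columns
are fetched by joining customer back once against the main query's result set.

{code}
-- Sketch only; assumes c_customer_id is a non-null unique key of customer.
with year_total_slim as (
 select c_customer_id customer_id
       ,d_year dyear
       ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,d_year
)
select c.c_customer_id
      ,c.c_first_name
      ,c.c_last_name
      ,c.c_birth_country
 from year_total_slim t_s_firstyear
     ,year_total_slim t_s_secyear
     ,customer c                 -- joined back to recover the wide columns
 where t_s_secyear.customer_id = t_s_firstyear.customer_id
   and c.c_customer_id = t_s_secyear.customer_id
   and t_s_firstyear.dyear = 1999
   and t_s_secyear.dyear = 1999+1
{code}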

[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=447075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447075
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 05:25
Start Date: 17/Jun/20 05:25
Worklog Time Spent: 10m 
  Work Description: kasakrisz closed pull request #1132:
URL: https://github.com/apache/hive/pull/1132


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447075)
Time Spent: 2h  (was: 1h 50m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  

[jira] [Assigned] (HIVE-23706) Fix nulls first sorting behavior

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-23706:
-


> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23706) Fix nulls first sorting behavior

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23706:
--
Labels: pull-request-available  (was: )

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2)
> SELECT a FROM t ORDER BY a DESC NULLS FIRST
> {code}
> should return 
> {code}
> 3
> 2
> 2
> 2
> 1
> null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23706) Fix nulls first sorting behavior

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?focusedWorklogId=447044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447044
 ]

ASF GitHub Bot logged work on HIVE-23706:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 03:55
Start Date: 17/Jun/20 03:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1131:
URL: https://github.com/apache/hive/pull/1131


   Testing done:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=order_null.q -pl itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447044)
Remaining Estimate: 0h
Time Spent: 10m

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2)
> SELECT a FROM t ORDER BY a DESC NULLS FIRST
> {code}
> should return 
> {code}
> 3
> 2
> 2
> 2
> 1
> null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-23493:
--
Attachment: (was: HIVE-23493.1.patch)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  ,year_total t_c_secyear
>  ,year_total t_w_firstyear
>  ,year_total t_w_secyear
>  where t_s_secyear.customer_id = t_s_firstyear.customer_id
>and t_s_firstyear.customer_id = t_c_secyear.customer_id
>and t_s_firstyear.customer_id = t_c_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_secyear.customer_id
>and t_s_firstyear.sale_type = 's'
>and t_c_firstyear.sale_type = 'c'
>and t_w_firstyear.sale_type = 'w'
>and t_s_secyear.sale_type = 's'
>and t_c_secyear.sale_type = 'c'
>and t_w_secyear.sale_type = 'w'
>and t_s_firstyear.dyear =  1999
>and t_s_secyear.dyear = 1999+1
>and t_c_firstyear.dyear =  1999
>

[jira] [Updated] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore

2020-06-16 Thread Dustin Koupal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Koupal updated HIVE-23707:
-
Description: 
When attempting to create a materialized view with transactions enabled, we get 
the following exception:

 
{code:java}
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to 
generate new Mapping of type 
org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
CLOB declared for field 
"org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java type 
java.lang.String cant be mapped for this datastore.ERROR : FAILED: Execution 
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:Failed to generate new Mapping of type 
org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
CLOB declared for field 
"org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java type 
java.lang.String cant be mapped for this datastore.JDBC type CLOB declared for 
field "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of 
java type java.lang.String cant be mapped for this 
datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB declared 
for field "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of 
java type java.lang.String cant be mapped for this datastore. at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386)
 at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616)
 at 
org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59)
 at 
org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48)
 at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482)
 at 
org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536) 
at 
org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) 
at 
org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270)
 at 
org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889)
 at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088)
 at 
org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
 at 
org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760)
 at 
org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267) 
at 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
 at 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
 at 
org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
 at 
org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079)
 at 
org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
 at 
org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
 at 
org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
 at 
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
 at 
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
 at 
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:1308) 
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at 
com.sun.proxy.$Proxy25.createTable(Unknown Source) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1882)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1786)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2035)
 at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source) at 

[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=447071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447071
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 05:22
Start Date: 17/Jun/20 05:22
Worklog Time Spent: 10m 
  Work Description: kasakrisz closed pull request #1096:
URL: https://github.com/apache/hive/pull/1096


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447071)
Time Spent: 1h 50m  (was: 1h 40m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total 

[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=447066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447066
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 05:08
Start Date: 17/Jun/20 05:08
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1132:
URL: https://github.com/apache/hive/pull/1132


   fix commit message



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447066)
Time Spent: 1h 40m  (was: 1.5h)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  

[jira] [Updated] (HIVE-23467) Add a skip.trash config for HMS to skip trash when deleting external table data

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23467:
--
Labels: pull-request-available  (was: )

> Add a skip.trash config for HMS to skip trash when deleting external table 
> data
> ---
>
> Key: HIVE-23467
> URL: https://issues.apache.org/jira/browse/HIVE-23467
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sam An
>Assignee: Yu-Wen Lai
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have an auto.purge flag, which means skip trash. It can be confusing, as we 
> also have 'external.table.purge'='true' to indicate that table data should be 
> deleted when this table property is set.
> We should make the meaning clearer by introducing a skip-trash alias/option. 
> Additionally, we shall add an alias for external.table.purge, named 
> external.table.autodelete, and document it more prominently, so as to maintain 
> backward compatibility and make the meaning of automatic data deletion more 
> obvious.
> The net effect of these two changes: if the user sets 
> 'external.table.autodelete'='true', the table data will be removed when the 
> table is dropped; and if 'skip.trash'='true' is set, HMS will not move the 
> table data to the trash folder when removing the files. This results in faster 
> removal, especially when the underlying FS is S3.
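
For illustration, a sketch of how the two proposed properties would be used
(the names 'skip.trash' and 'external.table.autodelete' are this ticket's
proposals, not existing configs; ext_tbl is a hypothetical external table):

{code}
ALTER TABLE ext_tbl SET TBLPROPERTIES (
  'external.table.autodelete'='true', -- proposed alias for external.table.purge
  'skip.trash'='true'                 -- proposed: bypass the trash folder on delete
);
-- Data files would then be removed directly on drop, without a move to .Trash:
DROP TABLE ext_tbl;
{code}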



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23467) Add a skip.trash config for HMS to skip trash when deleting external table data

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23467?focusedWorklogId=447076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447076
 ]

ASF GitHub Bot logged work on HIVE-23467:
-

Author: ASF GitHub Bot
Created on: 17/Jun/20 05:41
Start Date: 17/Jun/20 05:41
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request #1133:
URL: https://github.com/apache/hive/pull/1133


   …ng external table data
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447076)
Remaining Estimate: 0h
Time Spent: 10m

> Add a skip.trash config for HMS to skip trash when deleting external table 
> data
> ---
>
> Key: HIVE-23467
> URL: https://issues.apache.org/jira/browse/HIVE-23467
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sam An
>Assignee: Yu-Wen Lai
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have an auto.purge flag, which means skip trash. It can be confusing, as we 
> also have 'external.table.purge'='true' to indicate that table data should be 
> deleted when this table property is set.
> We should make the meaning clearer by introducing a skip-trash alias/option. 
> Additionally, we shall add an alias for external.table.purge, named 
> external.table.autodelete, and document it more prominently, so as to maintain 
> backward compatibility and make the meaning of automatic data deletion more 
> obvious.
> The net effect of these two changes: if the user sets 
> 'external.table.autodelete'='true', the table data will be removed when the 
> table is dropped; and if 'skip.trash'='true' is set, HMS will not move the 
> table data to the trash folder when removing the files. This results in faster 
> removal, especially when the underlying FS is S3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reopened HIVE-23493:
---

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  ,year_total t_c_secyear
>  ,year_total t_w_firstyear
>  ,year_total t_w_secyear
>  where t_s_secyear.customer_id = t_s_firstyear.customer_id
>and t_s_firstyear.customer_id = t_c_secyear.customer_id
>and t_s_firstyear.customer_id = t_c_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_secyear.customer_id
>and t_s_firstyear.sale_type = 's'
>and t_c_firstyear.sale_type = 'c'
>and t_w_firstyear.sale_type = 'w'
>and t_s_secyear.sale_type = 's'
>and t_c_secyear.sale_type = 'c'
>and t_w_secyear.sale_type = 'w'
>and t_s_firstyear.dyear =  1999
>and t_s_secyear.dyear = 1999+1
>and t_c_firstyear.dyear =  1999
>

[jira] [Resolved] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-23493.
---
Resolution: Fixed

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  ,year_total t_c_secyear
>  ,year_total t_w_firstyear
>  ,year_total t_w_secyear
>  where t_s_secyear.customer_id = t_s_firstyear.customer_id
>and t_s_firstyear.customer_id = t_c_secyear.customer_id
>and t_s_firstyear.customer_id = t_c_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_secyear.customer_id
>and t_s_firstyear.sale_type = 's'
>and t_c_firstyear.sale_type = 'c'
>and t_w_firstyear.sale_type = 'w'
>and t_s_secyear.sale_type = 's'
>and t_c_secyear.sale_type = 'c'
>and t_w_secyear.sale_type = 'w'
>and t_s_firstyear.dyear =  1999
>and t_s_secyear.dyear = 1999+1
>and 

[jira] [Updated] (HIVE-23706) Fix nulls first sorting behavior

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-23706:
--
Description: 
{code}
INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2)

SELECT a FROM t ORDER BY a DESC NULLS FIRST
{code}
should return 
{code}
3
2
2
2
1
null
{code}

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2)
> SELECT a FROM t ORDER BY a DESC NULLS FIRST
> {code}
> should return 
> {code}
> 3
> 2
> 2
> 2
> 1
> null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23707) Unable to create materialized views with transactions enabled with MySQL metastore

2020-06-16 Thread Dustin Koupal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138110#comment-17138110
 ] 

Dustin Koupal commented on HIVE-23707:
--

I tried to track this down but wasn't very successful. The best I can come up 
with is: should this be VARCHAR?

 

[https://github.com/apache/hive/blob/871ee8009380e1bab160b58dc378a7f668c64584/standalone-metastore/metastore-server/src/main/resources/package.jdo#L256]

 

I'm wondering if it's a similar issue to this:

 

[https://github.com/apache/hive/commit/5861b6af52839794c18f5aa686c24aabdb737b93]
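
A minimal repro sketch (hypothetical table and view names), assuming a
MySQL-backed metastore with transactional tables enabled:

{code}
-- Materialized views require a transactional (ACID) source table.
CREATE TABLE src (id INT) STORED AS ORC TBLPROPERTIES ('transactional'='true');
-- On a MySQL metastore this fails with the CLOB-mapping MetaException above.
CREATE MATERIALIZED VIEW mv1 AS SELECT id FROM src;
{code}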

> Unable to create materialized views with transactions enabled with MySQL 
> metastore
> --
>
> Key: HIVE-23707
> URL: https://issues.apache.org/jira/browse/HIVE-23707
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
>Reporter: Dustin Koupal
>Priority: Blocker
>
> When attempting to create a materialized view with transactions enabled, we 
> get the following exception:
>  
> {code:java}
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to 
> generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.ERROR : FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:Failed to generate new Mapping of type 
> org.datanucleus.store.rdbms.mapping.java.StringMapping, exception : JDBC type 
> CLOB declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore.JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this 
> datastore.org.datanucleus.exceptions.NucleusException: JDBC type CLOB 
> declared for field 
> "org.apache.hadoop.hive.metastore.model.MCreationMetadata.txnList" of java 
> type java.lang.String cant be mapped for this datastore. at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59)
>  at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48)
>  at 
> org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442) 
> at 
> org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270)
>  at 
> org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276) 
> at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889)
>  at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>  at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088)
>  at 
> org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
>  at 
> org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760)
>  at 
> org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
>  at 
> org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
>  at 
> org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
>  at 
> 

[jira] [Commented] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138111#comment-17138111
 ] 

Krisztian Kasa commented on HIVE-23493:
---

Pushed to master. Thank you [~jcamachorodriguez] for review.

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  ,year_total t_c_secyear
>  ,year_total t_w_firstyear
>  ,year_total t_w_secyear
>  where t_s_secyear.customer_id = t_s_firstyear.customer_id
>and t_s_firstyear.customer_id = t_c_secyear.customer_id
>and t_s_firstyear.customer_id = t_c_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_secyear.customer_id
>and t_s_firstyear.sale_type = 's'
>and t_c_firstyear.sale_type = 'c'
>and t_w_firstyear.sale_type = 'w'
>and t_s_secyear.sale_type = 's'
>and t_c_secyear.sale_type = 'c'
>and t_w_secyear.sale_type = 'w'
>and 

[jira] [Updated] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-23493:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Queries follow a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query: TPC-DS query 4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  ,t_s_secyear.customer_first_name
>  ,t_s_secyear.customer_last_name
>  ,t_s_secyear.customer_birth_country
>  from year_total t_s_firstyear
>  ,year_total t_s_secyear
>  ,year_total t_c_firstyear
>  ,year_total t_c_secyear
>  ,year_total t_w_firstyear
>  ,year_total t_w_secyear
>  where t_s_secyear.customer_id = t_s_firstyear.customer_id
>and t_s_firstyear.customer_id = t_c_secyear.customer_id
>and t_s_firstyear.customer_id = t_c_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_firstyear.customer_id
>and t_s_firstyear.customer_id = t_w_secyear.customer_id
>and t_s_firstyear.sale_type = 's'
>and t_c_firstyear.sale_type = 'c'
>and t_w_firstyear.sale_type = 'w'
>and t_s_secyear.sale_type = 's'
>and t_c_secyear.sale_type = 'c'
>and t_w_secyear.sale_type = 'w'
>and t_s_firstyear.dyear =  1999
>

[jira] [Work logged] (HIVE-23683) Add queue time to compaction

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23683?focusedWorklogId=446341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446341
 ]

ASF GitHub Bot logged work on HIVE-23683:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 07:27
Start Date: 16/Jun/20 07:27
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1101:
URL: https://github.com/apache/hive/pull/1101#discussion_r440640809



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/process/show/compactions/ShowCompactionsOperation.java
##
@@ -81,6 +81,8 @@ private void writeHeader(DataOutputStream os) throws 
IOException {
 os.write(Utilities.tabCode);
 os.writeBytes("Worker");
 os.write(Utilities.tabCode);
+os.writeBytes("Queue Time");

Review comment:
   Sounds good to me!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446341)
Time Spent: 50m  (was: 40m)

> Add queue time to compaction
> 
>
> Key: HIVE-23683
> URL: https://issues.apache.org/jira/browse/HIVE-23683
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It would be good to report to the user when the transaction is initiated. 
> This info can be used when considering the health status of the compaction 
> system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23611:

Attachment: HIVE-23611.01.patch

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-23611.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2020-06-16 Thread Jl.Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136371#comment-17136371
 ] 

Jl.Yang edited comment on HIVE-14564 at 6/16/20, 6:59 AM:
--

Hi, in version 0.13.1 I run a select with a left join, and in the first sub 
query I use the order by keyword to sort the query result. The same error 
happens; when I remove the order by keyword, the error disappears. I want to 
know if it is the same LazyBinarySerDe serialization issue. 

2020-06-16 11:01:21,128 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1587475402360_2625484_m_000384_3: Error: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"_col0":29,"_col1":"_vivoX6PlusDEES8F6IFIFLBPFVK\u001dN_vivoX6PlusDEES8F6IFIFLBPFVK�]:�ڈ\u0011r
 
\u�F�{\nu","_col2":"�K�\u0006&\u0006�\u00014<�70AE�\u0006��\u0011�#��Dp\u�&��\u0011���\u0006�\u00014<�\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col3":70,"_col4":76,"_col5":66,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":70,"_col13":null,"_col14":86,"_col15":null,"_col16":156413}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"_col0":29,"_col1":"_vivoX6PlusDEES8F6IFIFLBPFVK\u001dN_vivoX6PlusDEES8F6IFIFLBPFVK�]:�ڈ\u0011r
 
\u�F�{\nu","_col2":"�K�\u0006&\u0006�\u00014<�70AE�\u0006��\u0011�#��Dp\u�&��\u0011���\u0006�\u00014<�\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col3":70,"_col4":76,"_col5":66,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":70,"_col13":null,"_col14":86,"_col15":null,"_col16":156413}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:329)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:261)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.populateCachedDistributionKeys(ReduceSinkOperator.java:349)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:282)
... 13 more


was (Author: ylskykill):
Hi, in version 0.98 I run a select with a left join, and in the first sub 
query I use the order by keyword to sort the query result. The same error 
happens; when I remove the order by keyword, the error disappears. I want to 
know if it is the same LazyBinarySerDe serialization issue. 

2020-06-16 11:01:21,128 INFO [AsyncDispatcher event handler] 

[jira] [Resolved] (HIVE-21952) Hive should allow to delete serde properties too, not just add them

2020-06-16 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely resolved HIVE-21952.
---
Resolution: Fixed

Merged to master, thank you [~belugabehr].

> Hive should allow to delete serde properties too, not just add them
> ---
>
> Key: HIVE-21952
> URL: https://issues.apache.org/jira/browse/HIVE-21952
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 4.0.0, 2.3.5
>Reporter: Ruslan Dautkhanov
>Assignee: Miklos Gergely
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive should allow deleting serde properties, not just adding/changing them.
> We have a use case where the presence of a certain serde property 
> causes issues and we want to delete just that one serde property. 
> It's not currently possible.
> Thanks.
>  
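
If the new syntax mirrors the existing UNSET TBLPROPERTIES command (an
assumption; the merged PR #1112 has the final grammar), usage would look like:

{code:sql}
-- Existing: add or change a serde property in place.
ALTER TABLE t SET SERDEPROPERTIES ('field.delim' = ',');

-- With this change: remove a single serde property outright
-- (UNSET SERDEPROPERTIES syntax assumed to mirror UNSET TBLPROPERTIES).
ALTER TABLE t UNSET SERDEPROPERTIES ('field.delim');
{code}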



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23392) Metastore upgrade script TXN_LOCK_TBL rename inconsistency

2020-06-16 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136443#comment-17136443
 ] 

Aasha Medhi commented on HIVE-23392:


Issue in the mysql upgrade script.

{code:java}
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
Warning: You have an error in your SQL syntax; check the manual that 
corresponds to your MariaDB server version for the right syntax to use near 
'COLUMN NTXN_NEXT TO TXN_LOCK' at line 1 (state=42000,code=1064)
Error: You have an error in your SQL syntax; check the manual that corresponds 
to your MariaDB server version for the right syntax to use near 'COLUMN 
NTXN_NEXT TO TXN_LOCK' at line 1 (state=42000,code=1064)
{code}

Fixing it in HIVE-23697
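
For reference, RENAME COLUMN is only understood by MySQL 8.0+ / MariaDB
10.5.2+; older versions need the CHANGE COLUMN form, which restates the column
definition. A sketch of the portable statement (table name taken from the
issue title, column type assumed to be bigint):

{code:sql}
-- Fails on older MySQL/MariaDB:
ALTER TABLE TXN_LOCK_TBL RENAME COLUMN NTXN_NEXT TO TXN_LOCK;

-- Portable form (the column definition must be restated):
ALTER TABLE TXN_LOCK_TBL CHANGE COLUMN NTXN_NEXT TXN_LOCK bigint NOT NULL;
{code}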


> Metastore upgrade script TXN_LOCK_TBL rename inconsistency
> --
>
> Key: HIVE-23392
> URL: https://issues.apache.org/jira/browse/HIVE-23392
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23392.1.patch
>
>
> HIVE-23048 introduced a bug in the metastore upgrade scripts, by not renaming 
> correctly the columns in TXN_LOCK_TBL



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2020-06-16 Thread Jl.Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136371#comment-17136371
 ] 

Jl.Yang commented on HIVE-14564:


Hi, in version 0.98 I run a select with a left join, and in the first sub 
query I use the order by keyword to sort the query result. The same error 
happens; when I remove the order by keyword, the error disappears. I want to 
know if it is the same LazyBinarySerDe serialization issue. 

2020-06-16 11:01:21,128 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1587475402360_2625484_m_000384_3: Error: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"_col0":29,"_col1":"_vivoX6PlusDEES8F6IFIFLBPFVK\u001dN_vivoX6PlusDEES8F6IFIFLBPFVK�]:�ڈ\u0011r
 
\u�F�{\nu","_col2":"�K�\u0006&\u0006�\u00014<�70AE�\u0006��\u0011�#��Dp\u�&��\u0011���\u0006�\u00014<�\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col3":70,"_col4":76,"_col5":66,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":70,"_col13":null,"_col14":86,"_col15":null,"_col16":156413}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"_col0":29,"_col1":"_vivoX6PlusDEES8F6IFIFLBPFVK\u001dN_vivoX6PlusDEES8F6IFIFLBPFVK�]:�ڈ\u0011r
 
\u�F�{\nu","_col2":"�K�\u0006&\u0006�\u00014<�70AE�\u0006��\u0011�#��Dp\u�&��\u0011���\u0006�\u00014<�\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col3":70,"_col4":76,"_col5":66,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":70,"_col13":null,"_col14":86,"_col15":null,"_col16":156413}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:329)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:261)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.populateCachedDistributionKeys(ReduceSinkOperator.java:349)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:282)
... 13 more

> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-14564
> URL: https://issues.apache.org/jira/browse/HIVE-14564
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>

[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=446362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446362
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 08:40
Start Date: 16/Jun/20 08:40
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1120:
URL: https://github.com/apache/hive/pull/1120


   … base dir during REPL operation
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446362)
Remaining Estimate: 0h
Time Spent: 10m

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-23611.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-16 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23611:

Status: Patch Available  (was: Open)

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23611:
--
Labels: pull-request-available  (was: )

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21952) Hive should allow to delete serde properties too, not just add them

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21952?focusedWorklogId=446350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446350
 ]

ASF GitHub Bot logged work on HIVE-21952:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 07:45
Start Date: 16/Jun/20 07:45
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1112:
URL: https://github.com/apache/hive/pull/1112


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446350)
Time Spent: 0.5h  (was: 20m)

> Hive should allow to delete serde properties too, not just add them
> ---
>
> Key: HIVE-21952
> URL: https://issues.apache.org/jira/browse/HIVE-21952
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 4.0.0, 2.3.5
>Reporter: Ruslan Dautkhanov
>Assignee: Miklos Gergely
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive should allow deleting serde properties, not just adding/changing them.
> We have a use case where the presence of a certain serde property 
> causes issues and we want to delete just that one serde property. 
> It's not currently possible.
> Thanks.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=446407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446407
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:45
Start Date: 16/Jun/20 10:45
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on pull request #1109:
URL: https://github.com/apache/hive/pull/1109#issuecomment-644686392


   @sankarh please take a look at the patch. 
   
   The test failures seem unrelated, though I will try to run them locally and 
verify. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446407)
Time Spent: 0.5h  (was: 20m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads 
> (including get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). 
> The effort to cache this is small, as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23688:
--
Labels: pull-request-available  (was: )

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
>   at 
> 

[jira] [Work logged] (HIVE-21722) REPL:: logs are missing in hiveStatement.getQueryLog output during parallel execution mode.

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21722?focusedWorklogId=446418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446418
 ]

ASF GitHub Bot logged work on HIVE-21722:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #625:
URL: https://github.com/apache/hive/pull/625


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446418)
Time Spent: 3h  (was: 2h 50m)

> REPL:: logs are missing in hiveStatement.getQueryLog output during parallel 
> execution mode.
> ---
>
> Key: HIVE-21722
> URL: https://issues.apache.org/jira/browse/HIVE-21722
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21722.01.patch, HIVE-21722.02.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> getQueryLog only reads logs from the Background thread scope. If parallel 
> execution is set to true, a new thread is created for execution, and all the 
> logs added by the new thread are not added to the parent Background thread 
> scope. In replication scope, replStateLogTasks are started in parallel mode, 
> causing the logs to be skipped from the getQueryLog scope. 
> There is one more issue: the conf is not passed while creating the 
> replStateLogTask at bootstrap load end. The same issue is there with 
> event load during incremental load. The incremental load end log task is 
> created with the proper config. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21729) Arrow serializer sometimes shifts timestamp by one second

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21729?focusedWorklogId=446417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446417
 ]

ASF GitHub Bot logged work on HIVE-21729:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #627:
URL: https://github.com/apache/hive/pull/627


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446417)
Time Spent: 0.5h  (was: 20m)

> Arrow serializer sometimes shifts timestamp by one second
> -
>
> Key: HIVE-21729
> URL: https://issues.apache.org/jira/browse/HIVE-21729
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21729.1.patch, HIVE-21729.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This happens due to 
> [secondInMicros|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java#L445]
>  are calculated like
> {code}
> final long secondInMicros = (secondInMillis - secondInMillis % 
> MILLIS_PER_SECOND) * MICROS_PER_MILLIS;
> {code}
> Instead this should be calculated like(by taking nanos from 
> timestampColumnVector itself)
> {code}
> final long nanos = timestampColumnVector.getNanos(j);
> final long secondInMicros = (secondInMillis - nanos / NS_PER_MILLIS) * 
> MICROS_PER_MILLIS;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21717) Rename is failing for directory in move task

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21717?focusedWorklogId=446422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446422
 ]

ASF GitHub Bot logged work on HIVE-21717:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #624:
URL: https://github.com/apache/hive/pull/624


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446422)
Time Spent: 2h  (was: 1h 50m)

> Rename is failing for directory in move task 
> -
>
> Key: HIVE-21717
> URL: https://issues.apache.org/jira/browse/HIVE-21717
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21717.01.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Rename fails with "destination directory not empty" in case a directory is 
> moved directly to the table location from the staging directory, as rename 
> cannot overwrite a non-empty destination directory.
>  
> In replication scenarios, if the user does some concurrent writes during the 
> bootstrap dump, it may happen that some data already replicated through 
> bootstrap is retried during the next incremental load as well. This is handled 
> by making the operations reentrant during repl load. But here the move task is 
> not able to delete the directory created by bootstrap load even though the 
> replace flag is set to true. This is causing the incremental load to fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-14888) SparkClientImpl checks for "kerberos" string in hiveconf only when determining whether to use keytab file.

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14888?focusedWorklogId=446411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446411
 ]

ASF GitHub Bot logged work on HIVE-14888:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #635:
URL: https://github.com/apache/hive/pull/635


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446411)
Time Spent: 40m  (was: 0.5h)

> SparkClientImpl checks for "kerberos" string in hiveconf only when 
> determining whether to use keytab file.
> --
>
> Key: HIVE-14888
> URL: https://issues.apache.org/jira/browse/HIVE-14888
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.0
>Reporter: Thomas Rega
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-14888.1-spark.patch, HIVE-14888.2.patch, 
> HIVE-14888.3.patch, HIVE-14888.4.patch, HIVE-14888.5.patch
>
>   Original Estimate: 5m
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The SparkClientImpl will only provide a principal and keytab argument if the 
> HADOOP_SECURITY_AUTHENTICATION in hive conf is set to "kerberos". This will 
> not work on clusters with Hadoop security enabled that are not configured as 
> "kerberos", for example, a cluster which is configured for "ldap".
> The solution is to call UserGroupInformation.isSecurityEnabled() instead.
>  
> Code Review: [https://reviews.apache.org/r/70718/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesnt exists

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13781?focusedWorklogId=446412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446412
 ]

ASF GitHub Bot logged work on HIVE-13781:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #640:
URL: https://github.com/apache/hive/pull/640


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446412)
Time Spent: 0.5h  (was: 20m)

> Tez Job failed with FileNotFoundException when partition dir doesnt exists 
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 0.14.0, 2.0.0, 3.1.1
>Reporter: Feng Yuan
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-13781.1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have a table a partitioned by "day"; in the metadata, a has partitions 
> day=20160501 and day=20160502, but partition 20160501's dir does not exist.
> So when I use the tez engine to run hive -e "select day,count(*) from a where 
> xx=xx group by day",
> hive throws FileNotFoundException,
> but mr works.
> repro example:
> CREATE EXTERNAL TABLE `a`(
>   `a` string)
> PARTITIONED BY ( 
>   `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21766) Repl load command config is not passed to the txn manager

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21766?focusedWorklogId=446409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446409
 ]

ASF GitHub Bot logged work on HIVE-21766:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #647:
URL: https://github.com/apache/hive/pull/647


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446409)
Time Spent: 1h 50m  (was: 1h 40m)

> Repl load command config is not passed to the txn manager
> -
>
> Key: HIVE-21766
> URL: https://issues.apache.org/jira/browse/HIVE-21766
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21766.01.patch, HIVE-21766.02.patch, 
> HIVE-21766.03.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> *Cause:*
> REPL LOAD replicates Txn State (writeIds of tables) to the target HMS 
> (backend RDBMS). But, in this case, it is still connected to the source HMS 
> because the configs passed in the WITH clause were not stored in HiveTxnManager. 
> We pass the config object to the ReplTxnTask objects, but HiveTxnManager was 
> created by the Driver using the session config object.
> *Fix:*
> We need to pass it to HiveTxnManager too by creating a txn manager for repl 
> txn operations with the config passed by the user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21742) Vectorization: CASE result type casting

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21742?focusedWorklogId=446416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446416
 ]

ASF GitHub Bot logged work on HIVE-21742:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #656:
URL: https://github.com/apache/hive/pull/656


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446416)
Time Spent: 3h 40m  (was: 3.5h)

> Vectorization: CASE result type casting
> ---
>
> Key: HIVE-21742
> URL: https://issues.apache.org/jira/browse/HIVE-21742
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Vectorization
>Affects Versions: 3.1.1
>Reporter: Gopal Vijayaraghavan
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21742.1.patch, HIVE-21742.2.patch, 
> HIVE-21742.3.patch, HIVE-21742.4.patch, HIVE-21742.5.patch, 
> HIVE-21742.6.patch, HIVE-21799.4.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> {code}
> create temporary table foo(q548284 int);
> insert into foo values(1),(2),(3),(4),(5),(6);
> select q548284, CASE WHEN ((q548284 = 1)) THEN (0.2) WHEN ((q548284 = 2)) 
> THEN (0.4) WHEN ((q548284 = 3)) THEN (0.6) WHEN ((q548284 = 4)) THEN (0.8) 
> WHEN ((q548284 = 5)) THEN (1) ELSE (null) END from foo order by q548284 limit 
> 1;
> {code}
> Fails with 
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector.setElement(DecimalColumnVector.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnNull.evaluate(IfExprColumnNull.java:101)
> {code}
> This gets fixed if the case return of (1) is turned into a (1.0).
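
A sketch of that workaround, reusing the foo table from the repro above; the
only change is the decimal literal in the last THEN branch, so every branch
resolves to the same decimal type:

{code:sql}
select q548284,
       CASE WHEN (q548284 = 1) THEN (0.2)
            WHEN (q548284 = 2) THEN (0.4)
            WHEN (q548284 = 3) THEN (0.6)
            WHEN (q548284 = 4) THEN (0.8)
            WHEN (q548284 = 5) THEN (1.0)  -- was (1), an int literal
            ELSE (null) END
from foo order by q548284 limit 1;
{code}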



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21656) Vectorize UDF mask

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21656?focusedWorklogId=446420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446420
 ]

ASF GitHub Bot logged work on HIVE-21656:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #611:
URL: https://github.com/apache/hive/pull/611


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446420)
Time Spent: 50m  (was: 40m)

> Vectorize UDF mask
> --
>
> Key: HIVE-21656
> URL: https://issues.apache.org/jira/browse/HIVE-21656
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21656.1.patch, HIVE-21656.2.patch, 
> HIVE-21656.3.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21731) Hive import fails, post upgrade of source 3.0 cluster, to a target 4.0 cluster with strict managed table set to true.

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21731?focusedWorklogId=446414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446414
 ]

ASF GitHub Bot logged work on HIVE-21731:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #628:
URL: https://github.com/apache/hive/pull/628


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446414)
Time Spent: 3h 40m  (was: 3.5h)

> Hive import fails, post upgrade of source 3.0 cluster, to a target 4.0 
> cluster with strict managed table set to true.
> -
>
> Key: HIVE-21731
> URL: https://issues.apache.org/jira/browse/HIVE-21731
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21731.01.patch, HIVE-21731.02.patch, 
> HIVE-21731.03.patch, HIVE-21731.04.patch, HIVE-21731.05.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The scenario is 
>  # Replication policy is set with a hive 3.0 source cluster (strict managed 
> table set to false) and a hive 4.0 target cluster with strict managed table 
> set to true.
>  # The user upgrades the 3.0 source cluster to a 4.0 cluster using the upgrade tool.
>  # The upgrade converts all managed tables to acid tables.
>  # In the next repl dump, the user sets hive.repl.dump.include.acid.tables 
> and hive.repl.bootstrap.acid.tables to true, triggering bootstrap of 
> newly converted ACID tables.
>  # As the old tables are non-txn tables, dump is not filtering the events 
> even though bootstrap acid table is set to true. This is causing the repl load 
> to fail as the write id is not set in the table object.
>  # If we ignore the event replay, the bootstrap fails with a dump 
> directory mismatch error.
> The fix should be: 
>  # Ignore dumping the alter table event if bootstrap acid table is set to true 
> and the alter is converting a non-acid table to an acid table.
>  # In case of bootstrap during incremental load, ignore the dump directory 
> property set in table object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21721) nvl function fail with NullPointerException if the two paramtype are different

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21721?focusedWorklogId=446419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446419
 ]

ASF GitHub Bot logged work on HIVE-21721:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #623:
URL: https://github.com/apache/hive/pull/623


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446419)
Time Spent: 40m  (was: 0.5h)

> nvl function fail with NullPointerException if the two paramtype are different
> --
>
> Key: HIVE-21721
> URL: https://issues.apache.org/jira/browse/HIVE-21721
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0
>Reporter: philipse
>Assignee: philipse
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: image-2019-05-11-10-41-05-168.png, 
> image-2019-05-12-23-47-49-401.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hi all
> when I use the nvl and case-when functions, they behave like the following, 
> which is weird to me and gives me a headache every time. I need to check where 
> the NullPointerException is thrown whenever it happens, so could the error 
> messages be more user-friendly?
> !image-2019-05-11-10-41-05-168.png!
> {code:java}
> select nvl(cast('2019-05-10 11:11:11,111' as timestamp),'2019-05-10 
> 11:11:11,111');
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
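
Assuming the NPE indeed comes from the mismatched parameter types (timestamp
vs. string), a workaround sketch is to cast both arguments to the same type:

{code:sql}
select nvl(cast('2019-05-10 11:11:11,111' as timestamp),
           cast('2019-05-10 11:11:11,111' as timestamp));
{code}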



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21709) Count with expression does not work in Parquet

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21709?focusedWorklogId=446410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446410
 ]

ASF GitHub Bot logged work on HIVE-21709:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #631:
URL: https://github.com/apache/hive/pull/631


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446410)
Time Spent: 0.5h  (was: 20m)

> Count with expression does not work in Parquet
> --
>
> Key: HIVE-21709
> URL: https://issues.apache.org/jira/browse/HIVE-21709
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.2
>Reporter: Mainak Ghosh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For a parquet file with a nested schema, count with an expression as the column 
> name does not work when you are filtering on another column in the same struct. 
> Here are the steps to reproduce:
> {code:java}
> CREATE TABLE `test_table`( `rtb_win` struct<`impression_id`:string, 
> `pub_id`:string>) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS 
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> INSERT INTO TABLE test_table SELECT named_struct('impression_id', 'cat', 
> 'pub_id', '2');
> select count(rtb_win.impression_id) from test_table where rtb_win.pub_id ='2';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> +--+ 
> | _c0  |
> +--+ 
> | 0    | 
> +--+
> select count(*) from test_table where rtb_win.pub_id ='2';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases. 
> +--+ 
> | _c0  | 
> +--+ 
> | 1    | 
> +--+{code}
> As you can see the first query returns the wrong result while the second one 
> returns the correct result.
> The issue is a column order mismatch between the actual parquet file 
> (impression_id first and pub_id second) and the Hive prunedCols data structure 
> (the reverse). As a result, the filter compares against the wrong value and the 
> count returns 0. I have been able to identify the cause of this mismatch.
> I would love to get the code reviewed and merged. Some of the code changes 
> are changes to commits from Ferdinand Xu and Chao Sun.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?focusedWorklogId=446415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446415
 ]

ASF GitHub Bot logged work on HIVE-21776:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #642:
URL: https://github.com/apache/hive/pull/642


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446415)
Time Spent: 1.5h  (was: 1h 20m)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch, 
> HIVE-21776.03.patch, HIVE-21776.04.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a UDF with jar on HDFS is replicated, we add the jar path to the dump. 
> The dumped URL of jar has checksum and cmroot added to it. During load, we 
> load the jar on target. ReplCopyTask handles the jar paths separately from 
> the paths in _files and it uses the presence of checksum and cmroot for that 
> decision. (Those two are not present in _files URL). If ReplChangeManager is 
> not initialized during dump, the dumped URL of the jar does not contain checksum 
> and cmroot, and thus ReplCopyTask fails to copy the UDF jar to the target. This 
> fails the repl load since the function cannot be created. The fix is to 
> always initialize ReplChangeManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21736) Operator := error

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21736?focusedWorklogId=446413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446413
 ]

ASF GitHub Bot logged work on HIVE-21736:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #632:
URL: https://github.com/apache/hive/pull/632


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446413)
Time Spent: 0.5h  (was: 20m)

> Operator := error
> -
>
> Key: HIVE-21736
> URL: https://issues.apache.org/jira/browse/HIVE-21736
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Affects Versions: 2.3.5
>Reporter: YMXDGYX
>Assignee: YMXDGYX
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> in hplsql,
> executing " i:=1" throws an exception,
> but "i := 1" can be executed (add a space before the := operator)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?focusedWorklogId=446421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446421
 ]

ASF GitHub Bot logged work on HIVE-20967:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:54
Start Date: 16/Jun/20 10:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #613:
URL: https://github.com/apache/hive/pull/613


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446421)
Time Spent: 2h  (was: 1h 50m)

> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, pull-request-available
> Attachments: HIVE-20967.01.patch, HIVE-20967.03.patch, 
> HIVE-20967.03.patch, HIVE-20967.04.patch, HIVE-20967.05.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied. So, they need to be handled 
> properly.
>  1. Alter table to convert non-acid to acid.
>  - Do not allow this conversion on source of replication if strict.managed is 
> false.
> 2. Alter table or partition that changes the location.
>  - For managed tables at source, the table location shouldn't be changed for 
> the given non-partitioned table and partition location shouldn't be changed 
> for partitioned table as alter event doesn't capture the new files list. So, 
> it may cause data inconsistency. So, if the database is enabled for replication 
> at source, then alter location on managed tables should be blocked.
>  - For external partitioned tables, if the location is changed at source, then the 
> location should be changed for the table and any partitions which reside 
> within the table location, but not for the partitions which are not within 
> the table location (maybe we just need a test).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-06-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23688:
---
Status: Open  (was: Patch Available)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error 

[jira] [Updated] (HIVE-23702) Add metastore metrics to show age of the oldest initiated compaction

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23702:
--
Labels: pull-request-available  (was: )

> Add metastore metrics to show age of the oldest initiated compaction
> 
>
> Key: HIVE-23702
> URL: https://issues.apache.org/jira/browse/HIVE-23702
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be good to have a metric which shows the age of the oldest 
> initiated compaction. A sketch of the underlying measurement is shown below.
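> A sketch against the metastore backing database, assuming the 
> CQ_ENQUEUE_TIME column added by HIVE-23683 and 'i' as the initiated state 
> marker:
> {code}
> SELECT MIN(CQ_ENQUEUE_TIME) FROM COMPACTION_QUEUE WHERE CQ_STATE = 'i';
> {code}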



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23702) Add metastore metrics to show age of the oldest initiated compaction

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23702?focusedWorklogId=446476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446476
 ]

ASF GitHub Bot logged work on HIVE-23702:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 12:49
Start Date: 16/Jun/20 12:49
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #1123:
URL: https://github.com/apache/hive/pull/1123


   Added new metrics to the updateCompactionMetrics
   Added new tests



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446476)
Remaining Estimate: 0h
Time Spent: 10m

> Add metastore metrics to show age of the oldest initiated compaction
> 
>
> Key: HIVE-23702
> URL: https://issues.apache.org/jira/browse/HIVE-23702
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be good to have a metric which shows the age of the oldest 
> initiated compaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23612) Option for HiveStrictManagedMigration to impersonate a user for FS operations

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23612?focusedWorklogId=446379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446379
 ]

ASF GitHub Bot logged work on HIVE-23612:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 09:23
Start Date: 16/Jun/20 09:23
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1121:
URL: https://github.com/apache/hive/pull/1121


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446379)
Remaining Estimate: 0h
Time Spent: 10m

> Option for HiveStrictManagedMigration to impersonate a user for FS operations
> -
>
> Key: HIVE-23612
> URL: https://issues.apache.org/jira/browse/HIVE-23612
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-23612.0.patch, HIVE-23612.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The HiveStrictManagedMigration tool can be used to move HDFS paths and to 
> change ownership on said paths. It may be beneficial to do such file system 
> operations as a different user than the one the tool itself is run as.
> Moreover, while creating the external DB directory, the tool will chown the 
> new directory to the user set as DB owner in HMS. If this is unset, no chown 
> command is issued. In this case we should make the 'hive' user the directory 
> owner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23612) Option for HiveStrictManagedMigration to impersonate a user for FS operations

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23612:
--
Labels: pull-request-available  (was: )

> Option for HiveStrictManagedMigration to impersonate a user for FS operations
> -
>
> Key: HIVE-23612
> URL: https://issues.apache.org/jira/browse/HIVE-23612
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23612.0.patch, HIVE-23612.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The HiveStrictManagedMigration tool can be used to move HDFS paths and to 
> change ownership on said paths. It may be beneficial to do such file system 
> operations as a different user than the one the tool itself is run as.
> Moreover, while creating the external DB directory, the tool will chown the 
> new directory to the user set as DB owner in HMS. If this is unset, no chown 
> command is issued. In this case we should make the 'hive' user the directory 
> owner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=446387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446387
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 09:42
Start Date: 16/Jun/20 09:42
Worklog Time Spent: 10m 
  Work Description: SparksFyz opened a new pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   …d null values in map
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446387)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> 

[jira] [Resolved] (HIVE-23683) Add enqueue time to compaction

2020-06-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23683.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~klcopp] and [~lpinter]!

> Add enqueue time to compaction
> --
>
> Key: HIVE-23683
> URL: https://issues.apache.org/jira/browse/HIVE-23683
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It would be good to report to the user when the transaction is initiated. 
> This info can be used when considering the health status of the compaction 
> system. (A sketch of how this could surface is shown below.)
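> For illustration (output layout assumed; per the discussion on the pull 
> request, timestamps are printed as epoch time):
> {code}
> SHOW COMPACTIONS;
> -- the output is expected to include the enqueue time as an epoch timestamp
> {code}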



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23418) Investigate why msck command found different partitions at repair.q, msck_repair*, partition_discovery.q

2020-06-16 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-23418:
--
Description: Check [https://reviews.apache.org/r/72485/] for details.

> Investigate why msck command found different partitions at repair.q, 
> msck_repair*, partition_discovery.q
> 
>
> Key: HIVE-23418
> URL: https://issues.apache.org/jira/browse/HIVE-23418
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Miklos Gergely
>Priority: Major
>
> Check [https://reviews.apache.org/r/72485/] for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21109) Support stats replication for ACID tables.

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=446443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446443
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #579:
URL: https://github.com/apache/hive/pull/579


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446443)
Time Spent: 13h 50m  (was: 13h 40m)

> Support stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Statistics, Transactions
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, 
> HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch, 
> HIVE-21109.11.patch, HIVE-21109.12.patch, HIVE-21109.12.patch
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21446) Hive Server going OOM during hive external table replications

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21446?focusedWorklogId=446448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446448
 ]

ASF GitHub Bot logged work on HIVE-21446:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #569:
URL: https://github.com/apache/hive/pull/569


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446448)
Time Spent: 3h  (was: 2h 50m)

> Hive Server going OOM during hive external table replications
> -
>
> Key: HIVE-21446
> URL: https://issues.apache.org/jira/browse/HIVE-21446
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21446.01.patch, HIVE-21446.02.patch, 
> HIVE-21446.03.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The file system objects opened using proxy users are not closed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=446440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446440
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #587:
URL: https://github.com/apache/hive/pull/587


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446440)
Time Spent: 2.5h  (was: 2h 20m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory 
> name. This is used to isolate queries from reading the directory until 
> compaction has finished, and to avoid the compactor marking used earlier. In 
> case of replication, during bootstrap, a directory is copied as-is with the 
> same name from the source to the destination cluster. But a directory created 
> by compaction with a txn id cannot be copied, as the txn list at the target 
> may differ from the source: a txn id which is valid at the source may be an 
> aborted txn at the target. So conversion logic is required to create a new 
> directory with a valid txn at the target and dump the data into the newly 
> created directory. Illustrative directory names are sketched below.
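> For illustration (directory names are hypothetical), a compaction-produced 
> directory carries the compactor txn id as a suffix, which is what cannot be 
> copied verbatim:
> {code}
> warehouse/tbl/delta_0000001_0000010/  -- ordinary delta, copied as-is
> warehouse/tbl/base_0000010_v0000123/  -- written by compaction; the v0000123
>                                       -- txn id is only valid on the source
> {code}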



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21437) Vectorization: Decimal64 division with integer columns

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21437?focusedWorklogId=446447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446447
 ]

ASF GitHub Bot logged work on HIVE-21437:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #570:
URL: https://github.com/apache/hive/pull/570


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446447)
Time Spent: 0.5h  (was: 20m)

> Vectorization: Decimal64 division with integer columns
> --
>
> Key: HIVE-21437
> URL: https://issues.apache.org/jira/browse/HIVE-21437
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21437.1.patch, HIVE-21437.2.patch, 
> HIVE-21437.3.patch, HIVE-21437.6.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vectorizer fails for
> {code}
> CREATE temporary TABLE `catalog_Sales`(
>   `cs_quantity` int, 
>   `cs_wholesale_cost` decimal(7,2), 
>   `cs_list_price` decimal(7,2), 
>   `cs_sales_price` decimal(7,2), 
>   `cs_ext_discount_amt` decimal(7,2), 
>   `cs_ext_sales_price` decimal(7,2), 
>   `cs_ext_wholesale_cost` decimal(7,2), 
>   `cs_ext_list_price` decimal(7,2), 
>   `cs_ext_tax` decimal(7,2), 
>   `cs_coupon_amt` decimal(7,2), 
>   `cs_ext_ship_cost` decimal(7,2), 
>   `cs_net_paid` decimal(7,2), 
>   `cs_net_paid_inc_tax` decimal(7,2), 
>   `cs_net_paid_inc_ship` decimal(7,2), 
>   `cs_net_paid_inc_ship_tax` decimal(7,2), 
>   `cs_net_profit` decimal(7,2))
>  ;
> explain vectorization detail select max((((cs_ext_list_price - 
> cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from 
> catalog_sales;
> {code}
> {code}
> 'Map Vectorization:'
> 'enabled: true'
> 'enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true'
> 'inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> 'notVectorizedReason: SELECT operator: Could not instantiate 
> DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], 
> argument classes: [Integer, Integer, Integer], exception: 
> java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 
> stack trace: 
> sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown 
> Source), 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45),
>  java.lang.reflect.Constructor.newInstance(Constructor.java:423), 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963),
>  ...'
> 'vectorized: false'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21641) Llap external client returns decimal columns in different precision/scale as compared to beeline

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21641?focusedWorklogId=446436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446436
 ]

ASF GitHub Bot logged work on HIVE-21641:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #601:
URL: https://github.com/apache/hive/pull/601


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446436)
Time Spent: 0.5h  (was: 20m)

> Llap external client returns decimal columns in different precision/scale as 
> compared to beeline
> 
>
> Key: HIVE-21641
> URL: https://issues.apache.org/jira/browse/HIVE-21641
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.1.1
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: Branch3Candidate, pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21641.1.patch, HIVE-21641.2.patch, 
> HIVE-21641.3.patch, HIVE-21641.4.patch, HIVE-21641.5.branch-3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Llap external client returns different precision/scale compared to when 
> the query is executed in beeline. Consider the following results:
> Query:
> {code} 
> select avg(ss_ext_sales_price) my_avg from store_sales;
> {code} 
> Result from Beeline
> {code} 
> +----------------------------+
> |           my_avg           |
> +----------------------------+
> | 37.8923531030581611189434  |
> +----------------------------+
> {code} 
> Result from Llap external client
> {code}
> +------------+
> |   my_avg   |
> +------------+
> | 37.892353  |
> +------------+
> {code}
>  
> This is because the Driver (beeline path) calls 
> [analyzeInternal()|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L328]
>  to get the result set schema, which initializes 
> [resultSchema|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L333]
>  after some more transformations, whereas the llap-ext-client calls 
> [genLogicalPlan()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java#L561].
> Replacing {{genLogicalPlan()}} with {{analyze()}} resolves this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21643) Fix Broken support for ISO Time with Zone in Hive UDFs

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21643?focusedWorklogId=446437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446437
 ]

ASF GitHub Bot logged work on HIVE-21643:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #604:
URL: https://github.com/apache/hive/pull/604


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446437)
Time Spent: 0.5h  (was: 20m)

> Fix Broken support for ISO Time with Zone in Hive UDFs
> --
>
> Key: HIVE-21643
> URL: https://issues.apache.org/jira/browse/HIVE-21643
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: RAJKAMAL
>Assignee: Navya Sruthi Sunkarapalli
>Priority: Major
>  Labels: patch-available, pull-request-available
> Attachments: HIVE-21643.1.patch, Hive-21643.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following UDFs, date_format and to_date, used to support ISO dates with 
> time zones; this support has been broken since the Hive 3.x release.
> Example:
> date_format('2017-03-16T00:10:42Z', 'y')
> date_format('2017-03-16T00:10:42+01:00', 'y')
> date_format('2017-03-16T00:10:42-01:00', 'y')
> to_date('2015-04-11T01:30:45Z')
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21392) Misconfigurations of DataNucleus log in log4j.properties

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21392?focusedWorklogId=446444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446444
 ]

ASF GitHub Bot logged work on HIVE-21392:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #571:
URL: https://github.com/apache/hive/pull/571


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446444)
Time Spent: 1h 40m  (was: 1.5h)

> Misconfigurations of DataNucleus log in log4j.properties
> 
>
> Key: HIVE-21392
> URL: https://issues.apache.org/jira/browse/HIVE-21392
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Chen Zhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21392.02.patch, HIVE-21392.03.patch, 
> HIVE-21392.04.patch, HIVE-21392.05.patch, HIVE-21392.06.patch, 
> HIVE-21392.07.patch, HIVE-21392.08.patch, HIVE-21392.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the patch of 
> [HIVE-12020|https://issues.apache.org/jira/browse/HIVE-12020], we replaced 
> the DataNucleus-related logging configuration of nine fine-grained loggers 
> with three coarse-grained loggers (DataNucleus, Datastore and JPOX). As 
> Prasanth Jayachandran 
> [explains|https://issues.apache.org/jira/browse/HIVE-12020?focusedCommentId=15025612&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15025612],
>  these three loggers were taken to be the top-level loggers in DataNucleus, 
> so we don't need to specify other loggers for DataNucleus. However, according 
> to the 
> [documentation|http://www.datanucleus.org/products/accessplatform/logging.html] 
> and [source 
> code|https://github.com/datanucleus/datanucleus-core/blob/master/src/main/java/org/datanucleus/util/NucleusLogger.java#L108]
>  of DataNucleus, the top-level logger in DataNucleus is `DataNucleus`. 
> Therefore, we just need to keep that one; a sketch follows.
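> A minimal sketch of keeping only that logger, in log4j2 properties syntax 
> (logger key and level are illustrative):
> {code}
> logger.datanucleus.name = DataNucleus
> logger.datanucleus.level = ERROR
> {code}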



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21430) INSERT into a dynamically partitioned table with hive.stats.autogather = false throws a MetaException

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21430?focusedWorklogId=446446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446446
 ]

ASF GitHub Bot logged work on HIVE-21430:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #572:
URL: https://github.com/apache/hive/pull/572


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446446)
Remaining Estimate: 46h 50m  (was: 47h)
Time Spent: 1h 10m  (was: 1h)

> INSERT into a dynamically partitioned table with hive.stats.autogather = 
> false throws a MetaException
> -
>
> Key: HIVE-21430
> URL: https://issues.apache.org/jira/browse/HIVE-21430
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21430.01.patch, HIVE-21430.02.patch, 
> HIVE-21430.03.patch, HIVE-21430.04.patch, metaexception_repro.patch, 
> org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread-output.txt
>
>   Original Estimate: 48h
>  Time Spent: 1h 10m
>  Remaining Estimate: 46h 50m
>
> When the test TestStatsUpdaterThread#testTxnDynamicPartitions added in the 
> attached patch is run, it throws an exception (full logs attached; a minimal 
> repro sketch follows the trace):
> org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state 
> for a transactional table default.simple_stats without providing the 
> transactional write state for verification (new write ID 5, valid write IDs 
> null; current state \{"BASIC_STATS":"true","COLUMN_STATS":{"s":"true"}}; new 
> state null
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4328)
>  
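> For illustration, a minimal sketch of the triggering pattern (table layout 
> and values are hypothetical):
> {code}
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> CREATE TABLE simple_stats (s string) PARTITIONED BY (p string)
>   STORED AS ORC TBLPROPERTIES ('transactional'='true');
> INSERT INTO simple_stats PARTITION (p) VALUES ('val1', 'part1');
> {code}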



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20968) Support conversion of managed to external where location set was not owned by hive

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20968?focusedWorklogId=446439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446439
 ]

ASF GitHub Bot logged work on HIVE-20968:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #588:
URL: https://github.com/apache/hive/pull/588


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446439)
Time Spent: 2h 50m  (was: 2h 40m)

> Support conversion of managed to external where location set was not owned by 
> hive
> --
>
> Key: HIVE-20968
> URL: https://issues.apache.org/jira/browse/HIVE-20968
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR, pull-request-available
> Attachments: HIVE-20968.01.patch, HIVE-20968.02.patch, 
> HIVE-20968.03.patch, HIVE-20968.04.patch, HIVE-20968.05.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As per the migration rule, if a location is outside the default managed 
> table directory and the location is not owned by the "hive" user, then the 
> table should be converted to an external table after upgrade.
>  So, the same rule is applicable for Hive replication, where the data of a 
> source managed table resides outside the default warehouse directory and is 
> not owned by the "hive" user.
>  During this conversion, the path should be preserved in the target as well 
> so that failover works seamlessly.
>  # If the table location is outside the hive warehouse and is not owned by 
> hive, then the table at the target will be converted to an external table. 
> But the location cannot be retained as-is; it will be re-created relative to 
> the hive external warehouse directory. 
>  #  As the table is not an external table at the source, only data added 
> through events will be replicated.
>  # The ownership of the location will be stored in the create table event and 
> will be compared with strict.managed.tables.migration.owner to decide if the 
> flag in the replication scope can be set. This flag is used to convert the 
> managed table to an external table at the target.
> Some of the scenarios need to be blocked if the database is replicated from a 
> cluster with a non-strict managed table setting to one with strict managed 
> tables:
> 1. Block alter table / partition set location for a database set as a source 
> of replication, for managed tables.
> 2. If the user manually changes the ownership of the location, hive 
> replication may go into a non-recoverable state.
> 3. Block add partition if the location ownership differs from the table 
> location's ownership, for managed tables.
> 4. The user needs to set strict.managed.tables.migration.owner along with the 
> dump command (defaulting to the hive user). This value will be used during 
> dump to decide the ownership, which will be used during load to decide the 
> table type. The location owner information can be stored in the events during 
> create table. The flag can be stored in the replication spec. Check other 
> such configs used in the upgrade tool.
> 5. Block conversion from managed to external and vice versa; pass some flag 
> in the upgrade flow to allow this conversion during upgrade (see the sketch 
> below).
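> For illustration, the kind of conversion item 5 blocks outside the upgrade 
> flow (table name is hypothetical):
> {code}
> ALTER TABLE src_managed SET TBLPROPERTIES ('EXTERNAL'='TRUE');
> {code}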



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21638) Hive execute miss stage

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21638?focusedWorklogId=446442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446442
 ]

ASF GitHub Bot logged work on HIVE-21638:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #600:
URL: https://github.com/apache/hive/pull/600


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446442)
Time Spent: 0.5h  (was: 20m)

> Hive execute miss stage
> ---
>
> Key: HIVE-21638
> URL: https://issues.apache.org/jira/browse/HIVE-21638
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.2
>Reporter: ann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.2
>
> Attachments: stage-miss-bugfix.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When query execution finishes, some stages may be missed because a status 
> check failed. After execution finishes, Hive needs to check whether all 
> stages were executed, and throw an exception if not, to avoid an incorrect 
> result in the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21555) [SECURITY]newCachedThreadPool() has higher risk in causing OutOfMemoryError

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21555?focusedWorklogId=446441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446441
 ]

ASF GitHub Bot logged work on HIVE-21555:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #585:
URL: https://github.com/apache/hive/pull/585


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446441)
Time Spent: 0.5h  (was: 20m)

> [SECURITY]newCachedThreadPool() has higher risk in causing OutOfMemoryError
> ---
>
> Key: HIVE-21555
> URL: https://issues.apache.org/jira/browse/HIVE-21555
> Project: Hive
>  Issue Type: Bug
>Reporter: bd2019us
>Assignee: bd2019us
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: 1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Location: 
> ql/src/java/org/apache/hadoop/hive/ql/processors/LlapCacheResourceProcessor.java:136
> In the program, since the number of instances (instances = 
> llapRegistryService.getInstances().getAll();) is not known ahead of time, 
> there is a high risk of an OutOfMemoryError when too many threads are 
> created. Therefore, to be safe, a fixed-size thread pool should be used: its 
> size can be freely configured, it still allows running multiple instances 
> concurrently, and it is free from the above error.
> PR: https://github.com/apache/hive/pull/585



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21606) when creating table with hdfs location,should not check permission of all the children dirs

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21606?focusedWorklogId=446438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446438
 ]

ASF GitHub Bot logged work on HIVE-21606:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #593:
URL: https://github.com/apache/hive/pull/593


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446438)
Time Spent: 40m  (was: 0.5h)

> when creating table with hdfs location,should not check permission of all the 
> children dirs
> ---
>
> Key: HIVE-21606
> URL: https://issues.apache.org/jira/browse/HIVE-21606
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.3.4
>Reporter: philipse
>Assignee: philipse
>Priority: Major
>  Labels: pull-request-available
> Attachments: fix-attach.patch, image-2019-04-12-15-31-30-883.png, 
> image-2019-04-12-15-34-55-942.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When we create a table at a specific location as the login user *bidcadmin*:
> {code:java}
> create table testdb.test6(id int) location 
> '/data/dpdcadmin/test2/test2/test5';
> {code}
> we get the following error:
> {code:java}
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: Principal [name=bidcadmin, type=USER] does not have 
> following privileges for operation CREATETABLE [[INSERT, DELETE] on Object 
> [type=DFS_URI, name=hdfs://hadoopcluster/data/dpdcadmin/test2/test2/test5]] 
> (state=42000,code=4)
> {code}
>  
> The HDFS permissions are as follows:
> !image-2019-04-12-15-34-55-942.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21485) Hive desc operation takes more than 100 seconds after upgrading from Hive 1.2.1 to 2.3.4

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21485?focusedWorklogId=446445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446445
 ]

ASF GitHub Bot logged work on HIVE-21485:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #577:
URL: https://github.com/apache/hive/pull/577


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446445)
Time Spent: 1h  (was: 50m)

> Hive desc operation takes more than 100 seconds after upgrading from Hive 
> 1.2.1 to 2.3.4
> 
>
> Key: HIVE-21485
> URL: https://issues.apache.org/jira/browse/HIVE-21485
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Hive
>Affects Versions: 2.3.4
>Reporter: Qingxin Wu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21485.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Hive desc [formatted|extended] operation takes more than 100 seconds 
> after upgrading from Hive 1.2.1 to 2.3.4. This is mainly caused by showing 
> stats for partitioned tables, introduced by HIVE-16098, when the partitioned 
> tables have a large number of partitions. In our case, the number of 
> partitions is 187221.
> {code:java}
> hive> desc bus.kafka_data;
> OK
> id           string
> ...
> d            map
> stat_date    string
> log_id       string
> # Partition Information
> # col_name   data_type   comment
> stat_date    string
> log_id       string
> Time taken: 115.342 seconds, Fetched: 42 row(s)
> {code}
> The same operation executed in Hive 1.2.1 costs only 2 seconds.
> {code:java}
> hive> desc bus.kafka_data;
> OK
> id           string
> ...
> d            map
> stat_date    string
> log_id       string
> # Partition Information
> # col_name   data_type   comment
> stat_date    string
> log_id       string
> Time taken: 2.037 seconds, Fetched: 42 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-06-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23688:
---
   Attachment: HIVE-23688.patch
Fix Version/s: 3.0.0
   4.0.0
   Status: Patch Available  (was: Open)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 

[jira] [Commented] (HIVE-13875) Beeline ignore where clause when it is the last line of file and missing a EOL hence give wrong query result

2020-06-16 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136538#comment-17136538
 ] 

Zhihua Deng commented on HIVE-13875:


The issue has been resolved in  
[HIVE-10541|https://issues.apache.org/jira/browse/HIVE-10541].

> Beeline ignore where clause when it is the last line of file and missing a 
> EOL hence give wrong query result
> 
>
> Key: HIVE-13875
> URL: https://issues.apache.org/jira/browse/HIVE-13875
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Lu Ji
>Priority: Minor
>
> Steps to reproduce:
> Say we have a simple table:
> {code}
> select * from lji.lu_test;
> +---------------+------------------+--+
> | lu_test.name  | lu_test.country  |
> +---------------+------------------+--+
> | john          | us               |
> | hong          | cn               |
> +---------------+------------------+--+
> 2 rows selected (0.04 seconds)
> {code}
> We have a simple query in a file. But note this file is missing the final EOL.
> {code}
> cat -A test.hql
> use lji;$
> select * from lu_test$
> where country='us';[lji@~]$
> {code}
> Then if we execute the file using both the Hive CLI and Beeline + HS2, we get 
> different results.
> {code}
> [lji@~]$ hive -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Logging initialized using configuration in 
> file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
> OK
> Time taken: 1.624 seconds
> OK
> johnus
> Time taken: 1.482 seconds, Fetched: 1 row(s)
> [lji@~]$ beeline -u "jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX" 
> -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Connecting to jdbc:hive2://XXXl:1/default;principal=hive/_HOST@XXX
> Connected to: Apache Hive (version 1.2.1.2.3.4.7-4)
> Driver: Hive JDBC (version 1.2.1.2.3.4.7-4)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://XXX> use lji;
> No rows affected (0.06 seconds)
> 0: jdbc:hive2://XXX> select * from lu_test
> 0: jdbc:hive2://XXX> where 
> country='us';
> +---------------+------------------+--+
> | lu_test.name  | lu_test.country  |
> +---------------+------------------+--+
> | john          | us               |
> | hong          | cn               |
> +---------------+------------------+--+
> 2 rows selected (0.073 seconds)
> 0: jdbc:hive2://XXX>
> Closing: 0: jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX
> {code}
> Obviously, beeline gave the wrong result. It ignores the where clause on the 
> last line.
> I know it is quite weird for a file to be missing the final EOL, but for 
> whatever reason, we have ended up with quite a few files in this state. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-13875) Beeline ignore where clause when it is the last line of file and missing a EOL hence give wrong query result

2020-06-16 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136538#comment-17136538
 ] 

Zhihua Deng edited comment on HIVE-13875 at 6/16/20, 10:50 AM:
---

The issue seems to have been resolved in HIVE-10541.


was (Author: dengzh):
The issue has been resolved in  
[HIVE-10541|https://issues.apache.org/jira/browse/HIVE-10541].

> Beeline ignore where clause when it is the last line of file and missing a 
> EOL hence give wrong query result
> 
>
> Key: HIVE-13875
> URL: https://issues.apache.org/jira/browse/HIVE-13875
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Lu Ji
>Priority: Minor
>
> Steps to reproduce:
> Say we have a simple table:
> {code}
> select * from lji.lu_test;
> +---------------+------------------+--+
> | lu_test.name  | lu_test.country  |
> +---------------+------------------+--+
> | john          | us               |
> | hong          | cn               |
> +---------------+------------------+--+
> 2 rows selected (0.04 seconds)
> {code}
> We have a simple query in a file. But note this file is missing the final EOL.
> {code}
> cat -A test.hql
> use lji;$
> select * from lu_test$
> where country='us';[lji@~]$
> {code}
> Then if we execute the file using both the Hive CLI and Beeline + HS2, we get 
> different results.
> {code}
> [lji@~]$ hive -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Logging initialized using configuration in 
> file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
> OK
> Time taken: 1.624 seconds
> OK
> johnus
> Time taken: 1.482 seconds, Fetched: 1 row(s)
> [lji@~]$ beeline -u "jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX" 
> -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Connecting to jdbc:hive2://XXXl:1/default;principal=hive/_HOST@XXX
> Connected to: Apache Hive (version 1.2.1.2.3.4.7-4)
> Driver: Hive JDBC (version 1.2.1.2.3.4.7-4)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://XXX> use lji;
> No rows affected (0.06 seconds)
> 0: jdbc:hive2://XXX> select * from lu_test
> 0: jdbc:hive2://XXX> where 
> country='us';+---+--+--+
> | lu_test.name  | lu_test.country  |
> +---+--+--+
> | john  | us   |
> | hong  | cn   |
> +---+--+--+
> 2 rows selected (0.073 seconds)
> 0: jdbc:hive2://XXX>
> Closing: 0: jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX
> {code}
> Obviously, Beeline gave the wrong result. It ignores the where clause in the 
> last line.
> I know it is quite weird for a file to be missing the last EOL, but for 
> whatever reason, we have quite a few files in this state. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23612) Option for HiveStrictManagedMigration to impersonate a user for FS operations

2020-06-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136472#comment-17136472
 ] 

Ádám Szita commented on HIVE-23612:
---

Yes, that absolutely makes sense [~jdere]. I've amended the change - see [GitHub 
Pull Request #1121|https://github.com/apache/hive/pull/1121]

> Option for HiveStrictManagedMigration to impersonate a user for FS operations
> -
>
> Key: HIVE-23612
> URL: https://issues.apache.org/jira/browse/HIVE-23612
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23612.0.patch, HIVE-23612.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveStrictManagedMigration tool can be used to move HDFS paths and to change 
> ownership on said paths. It may be beneficial to do such file system 
> operations as a different user than the one the tool itself is run as.
> Moreover, while creating the external DB directory, the tool will chown the 
> new directory to the user set as DB owner in HMS. If this is unset, no chown 
> command is used. In this case we should make the 'hive' user the directory 
> owner.
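
A minimal sketch of what such impersonation could look like with Hadoop's
proxy-user API, assuming proxy-user privileges are configured; the method and
parameter names here are illustrative, not actual tool options.
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: run the FS operations as fsOwnerUser instead of the launching user.
static void chownAsProxy(Configuration conf, Path dbDir, String fsOwnerUser)
    throws Exception {
  UserGroupInformation proxy = UserGroupInformation.createProxyUser(
      fsOwnerUser, UserGroupInformation.getLoginUser());
  proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
    FileSystem fs = FileSystem.get(conf);
    // Fall back to the 'hive' user when no DB owner is set in HMS.
    fs.setOwner(dbDir, "hive", null);
    return null;
  });
}
{code}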



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23683) Add queue time to compaction

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23683?focusedWorklogId=446400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446400
 ]

ASF GitHub Bot logged work on HIVE-23683:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:22
Start Date: 16/Jun/20 10:22
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1101:
URL: https://github.com/apache/hive/pull/1101#issuecomment-644675940


   Thanks @klcopp and @laszlopinter86 for the reviews!
   
   > One more: have you considered displaying a human-readable timestamp (maybe 
specifying UTC) in show compactions output? 
   
   I decided against it, because every timestamp is now printed in epoch time 
in the show compactions output, and I think changing this behavior might break 
some user scripts. Maybe this would be beneficial in another jira with its own 
config.
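   
   For reference, a one-line sketch of the kind of rendering a follow-up jira
could apply; `enqueueTimeMs` is an illustrative epoch-millis value:
   
   ```java
   // Render an epoch-millis timestamp as a human-readable UTC instant.
   String utc = java.time.Instant.ofEpochMilli(enqueueTimeMs).toString();
   // e.g. "2020-06-16T10:22:37.123Z" instead of "1592302957123"
   ```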
   
   > Also I recommend running the metastore upgrade tests if you haven't yet.
   
   That was a big help in identifying issues (happily not in this script) - see: 
https://issues.apache.org/jira/browse/HIVE-23697



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446400)
Time Spent: 1h  (was: 50m)

> Add queue time to compaction
> 
>
> Key: HIVE-23683
> URL: https://issues.apache.org/jira/browse/HIVE-23683
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It would be good to report to the user when the transaction is initiated. 
> This info can be used when considering the health status of the compaction 
> system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23683) Add queue time to compaction

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23683?focusedWorklogId=446401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446401
 ]

ASF GitHub Bot logged work on HIVE-23683:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:23
Start Date: 16/Jun/20 10:23
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1101:
URL: https://github.com/apache/hive/pull/1101


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446401)
Time Spent: 1h 10m  (was: 1h)

> Add queue time to compaction
> 
>
> Key: HIVE-23683
> URL: https://issues.apache.org/jira/browse/HIVE-23683
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It would be good to report to the user when the transaction is initiated. 
> This info can be used when considering the health status of the compaction 
> system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23683) Add enqueue time to compaction

2020-06-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-23683:
--
Summary: Add enqueue time to compaction  (was: Add queue time to compaction)

> Add enqueue time to compaction
> --
>
> Key: HIVE-23683
> URL: https://issues.apache.org/jira/browse/HIVE-23683
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It would be good to report to the user when the transaction is initiated. 
> This info can be used when considering the health status of the compaction 
> system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23702) Add metastore metrics to show age of the oldest initiated compaction

2020-06-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-23702:
-


> Add metastore metrics to show age of the oldest initiated compaction
> 
>
> Key: HIVE-23702
> URL: https://issues.apache.org/jira/browse/HIVE-23702
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> It would be good to have a metric which will show the age of the oldest 
> initiated compaction
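
A sketch of what such a metric could look like with the Dropwizard API;
'registry' and 'oldestEnqueueTime()' are hypothetical names for illustration.
{code:java}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

// Sketch: age (in seconds) of the oldest compaction still in 'initiated'
// state; oldestEnqueueTime() is a hypothetical accessor (epoch millis).
static void registerOldestInitiatedAge(MetricRegistry registry) {
  registry.register("compaction_oldest_initiated_age_in_sec",
      (Gauge<Long>) () ->
          (System.currentTimeMillis() - oldestEnqueueTime()) / 1000L);
}
{code}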



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23602) Use Java Concurrent Package for Operation Handle Set

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23602?focusedWorklogId=446515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446515
 ]

ASF GitHub Bot logged work on HIVE-23602:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:37
Start Date: 16/Jun/20 13:37
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1055:
URL: https://github.com/apache/hive/pull/1055


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446515)
Time Spent: 40m  (was: 0.5h)

> Use Java Concurrent Package for Operation Handle Set
> 
>
> Key: HIVE-23602
> URL: https://issues.apache.org/jira/browse/HIVE-23602
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
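
A minimal sketch of the kind of change the title suggests (illustrative, not
the actual patch):
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Instead of guarding a HashSet with synchronized blocks, use a concurrent
// set view; add/remove/contains are then thread-safe without extra locking.
private final Set<OperationHandle> opHandleSet = ConcurrentHashMap.newKeySet();
{code}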




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23026) support add a yarn application name for tez on hiveserver2

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23026?focusedWorklogId=446525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446525
 ]

ASF GitHub Bot logged work on HIVE-23026:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:47
Start Date: 16/Jun/20 13:47
Worklog Time Spent: 10m 
  Work Description: xiejiajun opened a new pull request #1082:
URL: https://github.com/apache/hive/pull/1082


   ### What is this PR for?
   - add a configuration item to support setting tez job name
   
   ### What type of PR is it?
   - feature
   
   ### What is the Jira issue?
   - https://issues.apache.org/jira/browse/HIVE-23026
   
   ### How to use?
   - We can use 'set tez.job.name=<customized_job_name>;' or '--hiveconf 
tez.job.name=<customized_job_name>' to customize the Yarn application name 
when we run a SQL query.
   - We recommend setting the 'customized_job_name' to a Java String.format() 
string that can accept the Hive session ID as a single parameter.
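   - An illustrative session, assuming the value is passed through Java
String.format() with the session ID (names are examples only):
   
   ```sql
   -- name the Yarn application per workload; %s receives the Hive session ID
   set tez.job.name=etl-nightly-%s;
   select count(*) from my_table;
   ```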



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446525)
Time Spent: 4h 40m  (was: 4.5h)

> support add a yarn application name for tez on hiveserver2
> --
>
> Key: HIVE-23026
> URL: https://issues.apache.org/jira/browse/HIVE-23026
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Jake Xie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 3.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently Tez on HiveServer2 cannot specify the Yarn application name, which 
> is not very convenient for locating the problem SQL, so I added a 
> configuration item to support setting the Tez job name



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23697) Fix errors in the metastore upgrade script

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23697?focusedWorklogId=446533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446533
 ]

ASF GitHub Bot logged work on HIVE-23697:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 14:04
Start Date: 16/Jun/20 14:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1113:
URL: https://github.com/apache/hive/pull/1113#discussion_r440877189



##
File path: 
standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-3.2.0-to-4.0.0.mysql.sql
##
@@ -88,9 +88,9 @@ PREPARE stmt FROM @s;
 EXECUTE stmt;
 DEALLOCATE PREPARE stmt;
 RENAME TABLE NEXT_TXN_ID TO TXN_LOCK_TBL;
-ALTER TABLE TXN_LOCK_TBL RENAME COLUMN NTXN_NEXT TO TXN_LOCK;
+ALTER TABLE TXN_LOCK_TBL CHANGE NTXN_NEXT TXN_LOCK bigint;

Review comment:
   My guess is that this was caused by the difference between MySQL and 
MariaDB. If I remember correctly the test uses MariaDB. Could you please check 
manually with MySQL 5 and MySQL 8?
   Thanks,
   Peter





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446533)
Time Spent: 20m  (was: 10m)

> Fix errors in the metastore upgrade script
> --
>
> Key: HIVE-23697
> URL: https://issues.apache.org/jira/browse/HIVE-23697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23697.01.patch, HIVE-23697.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix a missing column separator in oracle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23697) Fix errors in the metastore upgrade script

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23697?focusedWorklogId=446552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446552
 ]

ASF GitHub Bot logged work on HIVE-23697:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 14:40
Start Date: 16/Jun/20 14:40
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1113:
URL: https://github.com/apache/hive/pull/1113#discussion_r440903770



##
File path: 
standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-3.2.0-to-4.0.0.mysql.sql
##
@@ -88,9 +88,9 @@ PREPARE stmt FROM @s;
 EXECUTE stmt;
 DEALLOCATE PREPARE stmt;
 RENAME TABLE NEXT_TXN_ID TO TXN_LOCK_TBL;
-ALTER TABLE TXN_LOCK_TBL RENAME COLUMN NTXN_NEXT TO TXN_LOCK;
+ALTER TABLE TXN_LOCK_TBL CHANGE NTXN_NEXT TXN_LOCK bigint;

Review comment:
   In MySQL 8 both syntaxes work fine, but in MySQL 5.7 RENAME COLUMN fails.
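   
   For reference, the two syntaxes under discussion (as in the diff above):
   
   ```sql
   -- MySQL 8.0+ syntax:
   ALTER TABLE TXN_LOCK_TBL RENAME COLUMN NTXN_NEXT TO TXN_LOCK;
   -- Portable to MySQL 5.7 as well; CHANGE restates the column type:
   ALTER TABLE TXN_LOCK_TBL CHANGE NTXN_NEXT TXN_LOCK bigint;
   ```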





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446552)
Time Spent: 40m  (was: 0.5h)

> Fix errors in the metastore upgrade script
> --
>
> Key: HIVE-23697
> URL: https://issues.apache.org/jira/browse/HIVE-23697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23697.01.patch, HIVE-23697.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix a missing column separator in oracle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446568
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:02
Start Date: 16/Jun/20 15:02
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440921202



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -1539,15 +1539,14 @@ public void 
testCheckPointingWithSourceTableDataInserted() throws Throwable {
 .run("insert into t2 values (24)")
 .run("insert into t1 values (4)")
 .dump(primaryDbName, dumpClause);
-
+assertEquals(modifiedTimeTable1CopyFile, 
fs.listStatus(tablet1Path)[0].getModificationTime());

Review comment:
   Because the data is moved now, we are checking this later.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446568)
Time Spent: 2h  (was: 1h 50m)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446569
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:02
Start Date: 16/Jun/20 15:02
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440921354



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -1596,6 +1595,10 @@ public void testCheckPointingWithNewTablesAdded() throws 
Throwable {
 .run("insert into t3 values (3)")
 .dump(primaryDbName, dumpClause);
 
+assertEquals(modifiedTimeTable1, 
fs.getFileStatus(tablet1Path).getModificationTime());

Review comment:
   Same as above





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446569)
Time Spent: 2h 10m  (was: 2h)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446567
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:01
Start Date: 16/Jun/20 15:01
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440920857



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExclusiveReplica.java
##
@@ -179,6 +179,63 @@ public void externalTableReplicationWithLocalStaging() 
throws Throwable {
 .verifyResult("800");
   }
 
+  @Test

Review comment:
   Comment not clear





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446567)
Time Spent: 1h 50m  (was: 1h 40m)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446576
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:13
Start Date: 16/Jun/20 15:13
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440929491



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -243,29 +246,34 @@ private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc
   : LoadFileType.OVERWRITE_EXISTING);
   stagingDir = 
PathUtils.getExternalTmpPath(replicaWarehousePartitionLocation, 
context.pathInfo);
 }
-
-Task copyTask = ReplCopyTask.getLoadCopyTask(
-event.replicationSpec(),
-new Path(event.dataPath() + Path.SEPARATOR + 
getPartitionName(sourceWarehousePartitionLocation)),
-stagingDir,
-context.hiveConf, false, false
-);
-
+Path partDataSrc = new Path(event.dataPath() + File.separator + 
getPartitionName(sourceWarehousePartitionLocation));
+Path moveSource = performOnlyMove ? partDataSrc : stagingDir;
 Task movePartitionTask = null;
 if (loadFileType != LoadFileType.IGNORE) {
   // no need to create move task, if file is moved directly to target 
location.
-  movePartitionTask = movePartitionTask(table, partSpec, stagingDir, 
loadFileType);
+  movePartitionTask = movePartitionTask(table, partSpec, moveSource, 
loadFileType);
 }
-
-if (ptnRootTask == null) {
-  ptnRootTask = copyTask;
+if (performOnlyMove) {
+  if (ptnRootTask == null) {
+ptnRootTask = addPartTask;

Review comment:
   Not changing anything with respect to addPartTask. Just the copy+move is 
getting replaced with a single move. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446576)
Time Spent: 2.5h  (was: 2h 20m)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=446577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446577
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:13
Start Date: 16/Jun/20 15:13
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1124:
URL: https://github.com/apache/hive/pull/1124#discussion_r440929783



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.JaninoRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveDefaultTezModelRelMetadataProvider;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Rule to trigger {@link HiveCardinalityPreservingJoinOptimization} on top of 
the plan.
+ */
+public class HiveCardinalityPreservingJoinRule extends HiveFieldTrimmerRule {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveCardinalityPreservingJoinRule.class);
+
+  private final double factor;
+
+  public HiveCardinalityPreservingJoinRule(double factor) {
+super(false, "HiveCardinalityPreservingJoinRule");
+this.factor = Math.max(factor, 0.0);
+  }
+
+  @Override
+  protected RelNode trim(RelOptRuleCall call, RelNode node) {
+RelNode optimized = new 
HiveCardinalityPreservingJoinOptimization().trim(call.builder(), node);
+if (optimized == node) {
+  return node;
+}
+
+JaninoRelMetadataProvider original = 
RelMetadataQuery.THREAD_PROVIDERS.get();

Review comment:
   Can we move this to the top of the method and put the 
`RelMetadataQuery.THREAD_PROVIDERS.set(original);` in a finally block? If there 
is an exception while getting the stats (for instance, if stats are not 
available), we should make sure that we are setting it back to the original one.
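   
   Concretely, something along these lines (a sketch of the suggestion, using
the names from the patch):
   
   ```java
   JaninoRelMetadataProvider original = RelMetadataQuery.THREAD_PROVIDERS.get();
   try {
     RelMetadataQuery.THREAD_PROVIDERS.set(getJaninoRelMetadataProvider());
     // ... compute and compare the cumulative plan costs ...
   } finally {
     // Restore the provider even if cost/stats computation throws.
     RelMetadataQuery.THREAD_PROVIDERS.set(original);
   }
   ```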





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446577)
Time Spent: 1h 10m  (was: 1h)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Queries with a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> Unique Key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query, tpc-ds query4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>

[jira] [Assigned] (HIVE-19546) Enable TestStats#partitionedTableDeprecatedCalls, TestStats#partitionedTableInHiveCatalog, and TestStats#partitionedTableOtherCatalog

2020-06-16 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman reassigned HIVE-19546:
---

Assignee: John Sherman

> Enable TestStats#partitionedTableDeprecatedCalls, 
> TestStats#partitionedTableInHiveCatalog, and 
> TestStats#partitionedTableOtherCatalog
> -
>
> Key: HIVE-19546
> URL: https://issues.apache.org/jira/browse/HIVE-19546
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: John Sherman
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23697) Fix errors in the metastore upgrade script

2020-06-16 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23697:
---
Attachment: HIVE-23697.02.patch
Status: Patch Available  (was: In Progress)

> Fix errors in the metastore upgrade script
> --
>
> Key: HIVE-23697
> URL: https://issues.apache.org/jira/browse/HIVE-23697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23697.01.patch, HIVE-23697.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fix a missing column separator in oracle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23673) Maven Standard Directories for accumulo-handler

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23673?focusedWorklogId=446520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446520
 ]

ASF GitHub Bot logged work on HIVE-23673:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:39
Start Date: 16/Jun/20 13:39
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1088:
URL: https://github.com/apache/hive/pull/1088


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446520)
Time Spent: 40m  (was: 0.5h)

> Maven Standard Directories for accumulo-handler
> ---
>
> Key: HIVE-23673
> URL: https://issues.apache.org/jira/browse/HIVE-23673
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23602) Use Java Concurrent Package for Operation Handle Set

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23602?focusedWorklogId=446518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446518
 ]

ASF GitHub Bot logged work on HIVE-23602:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:39
Start Date: 16/Jun/20 13:39
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1055:
URL: https://github.com/apache/hive/pull/1055


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446518)
Time Spent: 50m  (was: 40m)

> Use Java Concurrent Package for Operation Handle Set
> 
>
> Key: HIVE-23602
> URL: https://issues.apache.org/jira/browse/HIVE-23602
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23697) Fix errors in the metastore upgrade script

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23697?focusedWorklogId=446536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446536
 ]

ASF GitHub Bot logged work on HIVE-23697:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 14:07
Start Date: 16/Jun/20 14:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1113:
URL: https://github.com/apache/hive/pull/1113#issuecomment-644788973


   Thanks for all your work around this!
   Could you please check if the MySQL stuff works on MySQL 5 and 8 too? At 
least manually, or maybe with specific tests?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446536)
Time Spent: 0.5h  (was: 20m)

> Fix errors in the metastore upgrade script
> --
>
> Key: HIVE-23697
> URL: https://issues.apache.org/jira/browse/HIVE-23697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23697.01.patch, HIVE-23697.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fix a missing column separator in oracle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446571
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:03
Start Date: 16/Jun/20 15:03
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440922085



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -320,7 +334,7 @@ private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc
   private String getPartitionName(Path partitionMetadataFullPath) {
 //Get partition name by removing the metadata base path.
 //Needed for getting the data path
-return 
partitionMetadataFullPath.toString().substring(event.metadataPath().toString().length());
+return 
partitionMetadataFullPath.toString().substring(event.metadataPath().toString().length()
 + 1);

Review comment:
   Existing method; just changing the index.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446571)
Time Spent: 2h 20m  (was: 2h 10m)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file

2020-06-16 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-23703:



> Major QB compaction with multiple FileSinkOperators results in data loss and 
> one original file
> --
>
> Key: HIVE-23703
> URL: https://issues.apache.org/jira/browse/HIVE-23703
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Critical
>  Labels: compaction
>
> h4. Problems
> Example:
> {code:java}
> drop table if exists tbl2;
> create transactional table tbl2 (a int, b int) clustered by (a) into 4 
> buckets stored as ORC 
> TBLPROPERTIES('transactional'='true','transactional_properties'='default');
> insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
> insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
> insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code}
> E.g. in the example above, bucketId=0 when a=2 and a=6.
> 1. Data loss 
>  In non-acid tables, an operator's temp files are named with their task id. 
> Because of this snippet, temp files in the FileSinkOperator for compaction 
> tables are identified by their bucket_id.
> {code:java}
> if (conf.isCompactionTable()) {
>  fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + 
> String.format(AcidUtils.BUCKET_DIGITS, bucketId),
>  isNativeTable(), isSkewedStoredAsSubDirectories);
>  } else {
>  fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), 
> isSkewedStoredAsSubDirectories);
>  }
> {code}
> So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and 
> not 00_0 and 00_1 as they would normally.
> In FileSinkOperator.commit, when the file holding a=2 data (named bucket_0) 
> is moved from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the file 
> already there holding a=6 data, because that file too is named bucket_0. You 
> can see in the logs:
> {code:java}
>  WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target 
> path 
> file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0
>  with a size 610 exists. Trying to delete it.
> {code}
> 2. Results in one original file
>  OrcFileMergeOperator merges the results of the FSOp into 1 file named 
> 00_0.
> h4. Fix
> 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0
> 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named 
> 00_0, will merge all files named bucket_0 into one file named bucket_0, 
> and so on.
> 3. MoveTask will get rid of the taskId directories if present and only move 
> the bucket files in them, in case OrcMergeFileOp is not run.
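
An illustrative staging layout under the proposed fix (task ids and file names
are schematic, following the naming in the description):
{code}
_tmp.-ext-10002/
  0_0/bucket_0      <- a=2 rows from one task
  0_1/bucket_0      <- a=6 rows from another task (no longer clobbered)
after OrcFileMergeOperator / MoveTask:
  bucket_0          <- merged from all */bucket_0 files
{code}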



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=446575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446575
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:10
Start Date: 16/Jun/20 15:10
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1124:
URL: https://github.com/apache/hive/pull/1124#discussion_r440927072



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.JaninoRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveDefaultTezModelRelMetadataProvider;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Rule to trigger {@link HiveCardinalityPreservingJoinOptimization} on top of 
the plan.
+ */
+public class HiveCardinalityPreservingJoinRule extends HiveFieldTrimmerRule {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveCardinalityPreservingJoinRule.class);
+
+  private final double factor;
+
+  public HiveCardinalityPreservingJoinRule(double factor) {
+super(false, "HiveCardinalityPreservingJoinRule");
+this.factor = Math.max(factor, 0.0);
+  }
+
+  @Override
+  protected RelNode trim(RelOptRuleCall call, RelNode node) {
+RelNode optimized = new 
HiveCardinalityPreservingJoinOptimization().trim(call.builder(), node);
+if (optimized == node) {
+  return node;
+}
+
+JaninoRelMetadataProvider original = 
RelMetadataQuery.THREAD_PROVIDERS.get();
+RelMetadataQuery.THREAD_PROVIDERS.set(getJaninoRelMetadataProvider());
+RelMetadataQuery metadataQuery = RelMetadataQuery.instance();
+
+RelOptCost optimizedCost = metadataQuery.getCumulativeCost(optimized);
+RelOptCost originalCost = metadataQuery.getCumulativeCost(node);
+originalCost = originalCost.multiplyBy(factor);
+LOG.debug("Original plan cost {} Optimized plan cost {}", originalCost, 
optimizedCost);

Review comment:
   nit. `Original plan cost: {} vs Optimized plan cost: {}` ? Or something 
like that so we can read it more clearly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446575)
Time Spent: 1h  (was: 50m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Queries with a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> Unique Key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is 

[jira] [Resolved] (HIVE-23592) Routine "makeIntPair" is Not Correct

2020-06-16 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23592.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged into master.  Thanks [~mgergely] for the review!

> Routine "makeIntPair" is Not Correct
> 
>
> Key: HIVE-23592
> URL: https://issues.apache.org/jira/browse/HIVE-23592
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code:java|title=BuddyAllocator.java}
>   // Utility methods used to store pairs of ints as long.
>   private static long makeIntPair(int first, int second) {
> return ((long)first) << 32 | second;
>   }
>   private static int getFirstInt(long result) {
> return (int) (result >>> 32);
>   }
>   private static int getSecondInt(long result) {
> return (int) (result & ((1L << 32) - 1));
>   }
> {code}
> {code:java}
> long result = LLAP.makeIntPair(Integer.MIN_VALUE, Integer.MIN_VALUE);
> if (LLAP.getFirstInt(result) != Integer.MIN_VALUE) {
>throw new Exception();
> }
> if (LLAP.getSecondInt(result) != Integer.MIN_VALUE) {
>   throw new Exception();
>}
> /*
>  * Exception in thread "main" java.lang.Exception
>  *at org.test.TestMe.main(TestMe.java:19)
>  */
> {code}
> [https://github.com/apache/hive/blob/4b670877c280b37c5776046f66d766079489b2a8/llap-server/src/java/org/apache/hadoop/hive/llap/cache/BuddyAllocator.java#L1677]
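
A sign-safe variant (a sketch of one possible fix, not necessarily the
committed one): masking the second int keeps its sign bit from being smeared
across the high 32 bits that hold the first int.
{code:java}
private static long makeIntPair(int first, int second) {
  // Mask 'second' so a negative value does not sign-extend over 'first'.
  return ((long) first) << 32 | (second & 0xFFFFFFFFL);
}
{code}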



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23673) Maven Standard Directories for accumulo-handler

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23673?focusedWorklogId=446522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446522
 ]

ASF GitHub Bot logged work on HIVE-23673:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:40
Start Date: 16/Jun/20 13:40
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1088:
URL: https://github.com/apache/hive/pull/1088


   
https://github.com/apache/hive/blob/4ead9d35eadc997b65ceeb64f1fa33c71e47070d/accumulo-handler/pom.xml#L177-L179
   
   I'm not sure why. It should be `/src/main/java` and `/src/test/java` by 
Maven convention.
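   
   For reference, these are Maven's conventional defaults, so the overrides
can usually just be removed:
   
   ```xml
   <build>
     <!-- Maven's defaults; overriding them is rarely needed -->
     <sourceDirectory>src/main/java</sourceDirectory>
     <testSourceDirectory>src/test/java</testSourceDirectory>
   </build>
   ```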



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446522)
Time Spent: 50m  (was: 40m)

> Maven Standard Directories for accumulo-handler
> ---
>
> Key: HIVE-23673
> URL: https://issues.apache.org/jira/browse/HIVE-23673
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?focusedWorklogId=446528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446528
 ]

ASF GitHub Bot logged work on HIVE-23244:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:54
Start Date: 16/Jun/20 13:54
Worklog Time Spent: 10m 
  Work Description: miklosgergely opened a new pull request #1125:
URL: https://github.com/apache/hive/pull/1125


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446528)
Remaining Estimate: 0h
Time Spent: 10m

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=446527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446527
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:53
Start Date: 16/Jun/20 13:53
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1084:
URL: https://github.com/apache/hive/pull/1084#discussion_r440855768



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -1539,15 +1539,14 @@ public void 
testCheckPointingWithSourceTableDataInserted() throws Throwable {
 .run("insert into t2 values (24)")
 .run("insert into t1 values (4)")
 .dump(primaryDbName, dumpClause);
-
+assertEquals(modifiedTimeTable1CopyFile, 
fs.listStatus(tablet1Path)[0].getModificationTime());

Review comment:
   Why is this changed?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -243,29 +246,34 @@ private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc
   : LoadFileType.OVERWRITE_EXISTING);
   stagingDir = 
PathUtils.getExternalTmpPath(replicaWarehousePartitionLocation, 
context.pathInfo);
 }
-
-Task copyTask = ReplCopyTask.getLoadCopyTask(
-event.replicationSpec(),
-new Path(event.dataPath() + Path.SEPARATOR + 
getPartitionName(sourceWarehousePartitionLocation)),
-stagingDir,
-context.hiveConf, false, false
-);
-
+Path partDataSrc = new Path(event.dataPath() + File.separator + 
getPartitionName(sourceWarehousePartitionLocation));
+Path moveSource = performOnlyMove ? partDataSrc : stagingDir;
 Task movePartitionTask = null;
 if (loadFileType != LoadFileType.IGNORE) {
   // no need to create move task, if file is moved directly to target 
location.
-  movePartitionTask = movePartitionTask(table, partSpec, stagingDir, 
loadFileType);
+  movePartitionTask = movePartitionTask(table, partSpec, moveSource, 
loadFileType);
 }
-
-if (ptnRootTask == null) {
-  ptnRootTask = copyTask;
+if (performOnlyMove) {
+  if (ptnRootTask == null) {
+ptnRootTask = addPartTask;

Review comment:
   How was addPartTask added before your change?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExclusiveReplica.java
##
@@ -179,6 +179,63 @@ public void externalTableReplicationWithLocalStaging() 
throws Throwable {
 .verifyResult("800");
   }
 
+  @Test
+  public void testHdfsMoveOptimizationOnTargetStaging() throws Throwable {

Review comment:
   Check for empty staging as the data is moved.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -1596,6 +1595,10 @@ public void testCheckPointingWithNewTablesAdded() throws 
Throwable {
 .run("insert into t3 values (3)")
 .dump(primaryDbName, dumpClause);
 
+assertEquals(modifiedTimeTable1, 
fs.getFileStatus(tablet1Path).getModificationTime());

Review comment:
   Why is this changed?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -320,7 +334,7 @@ private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc
   private String getPartitionName(Path partitionMetadataFullPath) {
 //Get partition name by removing the metadata base path.
 //Needed for getting the data path
-return 
partitionMetadataFullPath.toString().substring(event.metadataPath().toString().length());
+return 
partitionMetadataFullPath.toString().substring(event.metadataPath().toString().length()
 + 1);

Review comment:
   Can we do a split at '/' instead? This may be error-prone.
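   
   A sketch of that suggestion (illustrative only; assumes a single-level
partition directory):
   
   ```java
   // Take the last path segment instead of substring index arithmetic.
   String[] segments = partitionMetadataFullPath.toString().split(Path.SEPARATOR);
   String partitionName = segments[segments.length - 1];
   ```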

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
##
@@ -686,9 +700,25 @@ private static ImportTableDesc 
getBaseCreateTableDescFromTable(String dbName,
 loadTableWork.setInheritTableSpecs(false);
 moveWork.setLoadTableWork(loadTableWork);
   }
+  Task loadPartTask = TaskFactory.get(moveWork, x.getConf());
+  if (performOnlyMove) {
+if (addPartTask != null) {
+  addPartTask.addDependentTask(loadPartTask);
+}
+x.getTasks().add(loadPartTask);
+return addPartTask == null ? loadPartTask : addPartTask;
+  }
+
+  Task copyTask = null;
+  if (replicationSpec.isInReplicationScope()) {

Review comment:
   This check is already done before. It can be simplified.

##
File path: 

[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=446581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446581
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:20
Start Date: 16/Jun/20 15:20
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1124:
URL: https://github.com/apache/hive/pull/1124#discussion_r440936180



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.JaninoRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveDefaultTezModelRelMetadataProvider;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Rule to trigger {@link HiveCardinalityPreservingJoinOptimization} on top of 
the plan.
+ */
+public class HiveCardinalityPreservingJoinRule extends HiveFieldTrimmerRule {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveCardinalityPreservingJoinRule.class);
+
+  private final double factor;
+
+  public HiveCardinalityPreservingJoinRule(double factor) {
+super(false, "HiveCardinalityPreservingJoinRule");
+this.factor = Math.max(factor, 0.0);

Review comment:
   `HiveCost` doesn't allow negative numbers:
   
https://github.com/apache/hive/blob/e74029d4fd5c4bfc50d33a8f1155ffacc151ba8f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java#L78
   
   But there is a check in `CalcitePlanner`: the rule is not added if the 
factor is 0 or negative
   ```
   if (factor > 0.0) {
   generatePartialProgram(program, false, HepMatchOrder.TOP_DOWN,
   new HiveCardinalityPreservingJoinRule(factor));
 }
   ``` 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446581)
Time Spent: 1h 20m  (was: 1h 10m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Queries with a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> Unique Key) columns and join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of the 
> joined-back tables that is broadcast/shuffled throughout the DAG processing.
> Example query, tpc-ds query4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>

[jira] [Work logged] (HIVE-23592) Routine "makeIntPair" is Not Correct

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23592?focusedWorklogId=446506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446506
 ]

ASF GitHub Bot logged work on HIVE-23592:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:23
Start Date: 16/Jun/20 13:23
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1052:
URL: https://github.com/apache/hive/pull/1052


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446506)
Time Spent: 1h 40m  (was: 1.5h)

> Routine "makeIntPair" is Not Correct
> 
>
> Key: HIVE-23592
> URL: https://issues.apache.org/jira/browse/HIVE-23592
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code:java|title=BuddyAllocator.java}
>   // Utility methods used to store pairs of ints as long.
>   private static long makeIntPair(int first, int second) {
> return ((long)first) << 32 | second;
>   }
>   private static int getFirstInt(long result) {
> return (int) (result >>> 32);
>   }
>   private static int getSecondInt(long result) {
> return (int) (result & ((1L << 32) - 1));
>   }
> {code}
> {code:java}
> long result = LLAP.makeIntPair(Integer.MIN_VALUE, Integer.MIN_VALUE);
> if (LLAP.getFirstInt(result) != Integer.MIN_VALUE) {
>   throw new Exception();
> }
> if (LLAP.getSecondInt(result) != Integer.MIN_VALUE) {
>   throw new Exception();
> }
> /*
>  * Exception in thread "main" java.lang.Exception
>  *at org.test.TestMe.main(TestMe.java:19)
>  */
> {code}
> [https://github.com/apache/hive/blob/4b670877c280b37c5776046f66d766079489b2a8/llap-server/src/java/org/apache/hadoop/hive/llap/cache/BuddyAllocator.java#L1677]
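> A minimal sketch of the sign-extension fix (the committed patch may differ): 
> masking the second argument to its low 32 bits stops a negative value from 
> sign-extending into, and overwriting, the high word.
> {code:java}
> // (second & 0xFFFFFFFFL) keeps only the low 32 bits, so a negative "second"
> // can no longer sign-extend across the high word that stores "first".
> private static long makeIntPair(int first, int second) {
>   return ((long) first) << 32 | (second & 0xFFFFFFFFL);
> }
> {code}
> With this version both getFirstInt and getSecondInt recover Integer.MIN_VALUE 
> in the test above.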



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23697) Fix errors in the metastore upgrade script

2020-06-16 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23697:
---
Status: In Progress  (was: Patch Available)

> Fix errors in the metastore upgrade script
> --
>
> Key: HIVE-23697
> URL: https://issues.apache.org/jira/browse/HIVE-23697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23697.01.patch, HIVE-23697.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fix a missing column separator in the Oracle upgrade script.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23026) support add a yarn application name for tez on hiveserver2

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23026?focusedWorklogId=446523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446523
 ]

ASF GitHub Bot logged work on HIVE-23026:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:46
Start Date: 16/Jun/20 13:46
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1082:
URL: https://github.com/apache/hive/pull/1082


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446523)
Time Spent: 4.5h  (was: 4h 20m)

> support add a yarn application name for tez on hiveserver2
> --
>
> Key: HIVE-23026
> URL: https://issues.apache.org/jira/browse/HIVE-23026
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Jake Xie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 3.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently Tez on HiveServer2 cannot specify a YARN application name, which 
> makes it inconvenient to locate a problematic SQL statement, so I added a 
> configuration item to support setting the Tez job name.
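> A hedged sketch of how such a configuration item is typically consumed; the 
> property name below is hypothetical, not necessarily the one introduced by 
> this patch:
> {code:java}
> import java.util.Properties;
>
> public class TezAppNameSketch {
>   public static void main(String[] args) {
>     Properties conf = new Properties();
>     conf.setProperty("hive.tez.yarn.app.name", "etl-nightly-load"); // hypothetical key
>     // Fall back to a generated name when the user has not set one, matching
>     // the pre-patch behaviour for existing deployments.
>     String appName = conf.getProperty("hive.tez.yarn.app.name",
>         "HIVE-" + System.currentTimeMillis());
>     System.out.println("submitting Tez DAG as YARN application: " + appName);
>   }
> }
> {code}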



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23244:
--
Labels: pull-request-available  (was: )

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.
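> A simplified, self-contained sketch of that delegation (names are 
> illustrative; the real Hive analyzers have a much richer API):
> {code:java}
> public class CreateViewAnalyzerSketch {
>
>   // Stands in for SemanticAnalyzer: handles plain queries.
>   static class QueryAnalyzer {
>     void analyze(String queryText) {
>       System.out.println("analyzing query: " + queryText);
>     }
>   }
>
>   // The command analyzer owns the view-specific parts (name, properties, ...)
>   // and delegates the embedded SELECT to the query analyzer.
>   static class CreateViewAnalyzer {
>     private final QueryAnalyzer queryAnalyzer = new QueryAnalyzer();
>
>     void analyze(String commandText) {
>       // Assumes the command contains " AS " separating the view header
>       // from the embedded query (happy path only).
>       String embeddedQuery = commandText.substring(commandText.indexOf(" AS ") + 4);
>       System.out.println("analyzing view definition: " + commandText);
>       queryAnalyzer.analyze(embeddedQuery);
>     }
>   }
>
>   public static void main(String[] args) {
>     new CreateViewAnalyzer().analyze("CREATE VIEW v AS SELECT c1 FROM t");
>   }
> }
> {code}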



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=446513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446513
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 13:35
Start Date: 16/Jun/20 13:35
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1124:
URL: https://github.com/apache/hive/pull/1124


   Testing done:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests \
     -Dtest=TestTezPerfConstraintsCliDriver \
     -Dqfile=cbo_query4.q,cbo_query11.q,cbo_query74.q,query4.q,query11.q,query74.q \
     -pl itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446513)
Time Spent: 40m  (was: 0.5h)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Queries with a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and to join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of 
> the joined-back tables that is broadcast/shuffled throughout DAG processing.
> Example query, tpc-ds query4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2)
>  ) year_total
>,'c' sale_type
>  from customer
>  ,catalog_sales
>  ,date_dim
>  where c_customer_sk = cs_bill_customer_sk
>and cs_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
> union all
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
>  ) year_total
>,'w' sale_type
>  from customer
>  ,web_sales
>  ,date_dim
>  where c_customer_sk = ws_bill_customer_sk
>and ws_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  ,c_last_name
>  ,c_preferred_cust_flag
>  ,c_birth_country
>  ,c_login
>  ,c_email_address
>  ,d_year
>  )
>   select  
>   t_s_secyear.customer_id
>  

[jira] [Work logged] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23493?focusedWorklogId=446573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446573
 ]

ASF GitHub Bot logged work on HIVE-23493:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 15:08
Start Date: 16/Jun/20 15:08
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1124:
URL: https://github.com/apache/hive/pull/1124#discussion_r440925660



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.JaninoRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveDefaultTezModelRelMetadataProvider;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Rule to trigger {@link HiveCardinalityPreservingJoinOptimization} on top of the plan.
+ */
+public class HiveCardinalityPreservingJoinRule extends HiveFieldTrimmerRule {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveCardinalityPreservingJoinRule.class);
+
+  private final double factor;
+
+  public HiveCardinalityPreservingJoinRule(double factor) {
+    super(false, "HiveCardinalityPreservingJoinRule");
+    this.factor = Math.max(factor, 0.0);

Review comment:
   Maybe you should allow negative numbers (e.g., -1) to disable the 
optimization completely.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446573)
Time Spent: 50m  (was: 40m)

> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> --
>
> Key: HIVE-23493
> URL: https://issues.apache.org/jira/browse/HIVE-23493
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23493.1.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Queries with a pattern where one or more tables join with a fact table in a 
> CTE. Many columns are projected out of those tables and then grouped in the 
> CTE. The main query joins multiple instances of the CTE and may project a 
> subset of these columns.
> The optimization is to rewrite the CTE to include only key (PK, non-null 
> unique key) columns and to join the tables back to the result set of the main 
> query to fetch the rest of the wide columns. This reduces the data size of 
> the joined-back tables that is broadcast/shuffled throughout DAG processing.
> Example query, tpc-ds query4
> {code}
> with year_total as (
>  select c_customer_id customer_id
>,c_first_name customer_first_name
>,c_last_name customer_last_name
>,c_preferred_cust_flag customer_preferred_cust_flag
>,c_birth_country customer_birth_country
>,c_login customer_login
>,c_email_address customer_email_address
>,d_year dyear
>
> ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
>  year_total
>,'s' sale_type
>  from customer
>  ,store_sales
>  ,date_dim
>  where c_customer_sk = ss_customer_sk
>and ss_sold_date_sk = d_date_sk
>  group by c_customer_id
>  ,c_first_name
>  
