[jira] [Work logged] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24210?focusedWorklogId=492791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492791
 ]

ASF GitHub Bot logged work on HIVE-24210:
-

Author: ASF GitHub Bot
Created on: 30/Sep/20 05:40
Start Date: 30/Sep/20 05:40
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #1536:
URL: https://github.com/apache/hive/pull/1536


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492791)
Remaining Estimate: 0h
Time Spent: 10m

> PartitionManagementTask fails if one of tables dropped after fetching 
> TableMeta
> ---
>
> Key: HIVE-24210
> URL: https://issues.apache.org/jira/browse/HIVE-24210
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: 
> metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - 
> Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
> hive.default.test_table table not found
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>  
>  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
>  at java.lang.reflect.Method.invoke(Method.java:498) 
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  
>  at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>  
>  at 
> org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}
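The failure mode above is a check-then-act race: the task lists TableMeta first and only afterwards fetches each Table, so a concurrent DROP TABLE makes the per-table fetch throw. A minimal sketch of the skip-and-continue pattern a fix would take (the exception class and store below are stand-ins for illustration, not Hive's actual metastore API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PartitionDiscoverySketch {
    // Stand-in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException.
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String msg) { super(msg); }
    }

    // Hypothetical metastore view: table name -> metadata.
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    static String getTable(String name) throws NoSuchObjectException {
        String meta = STORE.get(name);
        if (meta == null) {
            throw new NoSuchObjectException(name + " table not found");
        }
        return meta;
    }

    /** Process every table from a previously fetched listing, skipping
     *  tables dropped between the listing and the per-table fetch instead
     *  of failing the whole run. */
    static List<String> discover(List<String> listedTables) {
        List<String> processed = new ArrayList<>();
        for (String name : listedTables) {
            try {
                processed.add(getTable(name));
            } catch (NoSuchObjectException e) {
                // Table vanished after TableMeta was fetched: log and move on.
                System.err.println("Skipping dropped table: " + name);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        STORE.put("t1", "meta1");
        STORE.put("t2", "meta2");
        // "t3" was listed but dropped concurrently.
        System.out.println(discover(Arrays.asList("t1", "t3", "t2"))); // prints [meta1, meta2]
    }
}
```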



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24210:
--
Labels: pull-request-available  (was: )

> PartitionManagementTask fails if one of tables dropped after fetching 
> TableMeta
> ---
>
> Key: HIVE-24210
> URL: https://issues.apache.org/jira/browse/HIVE-24210
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: 
> metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - 
> Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
> hive.default.test_table table not found
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>  
>  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
>  at java.lang.reflect.Method.invoke(Method.java:498) 
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  
>  at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>  
>  at 
> org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204472#comment-17204472
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~abstractdog] Could you please review the PR?

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911. But now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362), we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend the feature to accommodate upcoming work such as 
> vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain the feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~gopalv] [~prasanth_j] Gentle reminder :))

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911. But now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362), we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend the feature to accommodate upcoming work such as 
> vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain the feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~gopalv] [~rajesh.balamohan] ping for review request!)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911. But now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362), we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend the feature to accommodate upcoming work such as 
> vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain the feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetch TableMeta

2020-09-29 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-24210:
-


> PartitionManagementTask fails if one of tables dropped after fetch TableMeta
> 
>
> Key: HIVE-24210
> URL: https://issues.apache.org/jira/browse/HIVE-24210
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
>  
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: 
> metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - 
> Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
> hive.default.test_table table not found
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>  
>  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
>  at java.lang.reflect.Method.invoke(Method.java:498) 
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  
>  at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>  
>  at 
> org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta

2020-09-29 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24210:
--
Summary: PartitionManagementTask fails if one of tables dropped after 
fetching TableMeta  (was: PartitionManagementTask fails if one of tables 
dropped after fetch TableMeta)

> PartitionManagementTask fails if one of tables dropped after fetching 
> TableMeta
> ---
>
> Key: HIVE-24210
> URL: https://issues.apache.org/jira/browse/HIVE-24210
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
>  
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: 
> metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - 
> Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
> hive.default.test_table table not found
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>  
>  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
>  at java.lang.reflect.Method.invoke(Method.java:498) 
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  
>  at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>  
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>  
>  at 
> org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204451#comment-17204451
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~jcamachorodriguez] Could you please review the PR?

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with a 'distribute by' 
> clause. The following snippet query reproduces this issue
> *(non-vectorized, non-llap mode)*:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed the Distribute 
> By clause or used the Cluster By clause instead.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys, yet 
> FileSinkOperator removes the previous fsp, which might be re-used when we 
> use Distribute By:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
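The fsp reuse problem can be illustrated without Hive internals. In this hypothetical sketch, the unsafe variant assumes each partition key arrives as one contiguous run (so the previous per-partition state can be dropped on a key change), an ordering that DISTRIBUTE BY alone does not guarantee:

```java
import java.util.HashMap;
import java.util.Map;

public class FspReuseSketch {
    // Stand-in for FileSinkOperator's per-partition state ("fsp").
    static class Fsp {
        final StringBuilder rows = new StringBuilder();
        void write(String row) { rows.append(row).append('\n'); }
    }

    static final Map<String, Fsp> openFsps = new HashMap<>();

    /** Unsafe pattern: assumes a key never reappears after its fsp was
     *  closed and removed, so the lookup is dereferenced unchecked.
     *  With DISTRIBUTE BY, a key can reappear and this throws NPE. */
    static void writeAssumingContiguousKeys(String key, String row) {
        openFsps.get(key).write(row);  // NPE when the fsp was already removed
    }

    /** Defensive pattern: re-create the fsp when a key reappears. */
    static void writeDefensively(String key, String row) {
        openFsps.computeIfAbsent(key, k -> new Fsp()).write(row);
    }

    public static void main(String[] args) {
        writeDefensively("datekey=1", "ROW1");
        openFsps.remove("datekey=1");          // fsp dropped on a key change
        writeDefensively("datekey=1", "ROW3"); // key reappears: recreated, no NPE
    }
}
```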
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}

[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Summary: Incorrect search argument conversion for NOT BETWEEN operation 
when vectorization is enabled  (was: Search argument conversion is incorrect 
for NOT BETWEEN operation when vectorization is enabled)

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for the 
> NOT BETWEEN operation when vectorization is enabled, because of the 
> improvement done as part of HIVE-15884. But this is not handled during the 
> conversion of the filter expression to a search argument, so an incorrect 
> predicate gets pushed down to the storage layer, which leads to incorrect 
> split generation and incorrect results. 
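For illustration, the semantics the search argument must preserve can be stated directly; `notBetweenSarg` below is a hypothetical name for this sketch, not a Hive API:

```java
public class NotBetweenSargSketch {
    /** col BETWEEN lo AND hi. */
    static boolean between(long v, long lo, long hi) {
        return v >= lo && v <= hi;
    }

    /** The predicate a NOT BETWEEN must convert to: the complement interval,
     *  v < lo OR v > hi. Dropping the negation (the bug) would push the
     *  BETWEEN interval itself and select the wrong splits. */
    static boolean notBetweenSarg(long v, long lo, long hi) {
        return v < lo || v > hi;
    }

    public static void main(String[] args) {
        // The complement must agree with !between for every value.
        for (long v = -2; v <= 12; v++) {
            if (notBetweenSarg(v, 0, 10) != !between(v, 0, 10)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println("ok");
    }
}
```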



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA

2020-09-29 Thread lithiumlee-_- (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204413#comment-17204413
 ] 

lithiumlee-_- edited comment on HIVE-16220 at 9/30/20, 3:05 AM:


Same problem in Hive 2.3.4

Too many instances of "java.util.Hashtable$Entry", 
"java.util.concurrent.ConcurrentHashMap$Node"


was (Author: lithiumlee-_-):
Same problem in Hive 2.3.4

Too many instances of "java.util.Hashtable$Entry", 
"java.util.concurrent.ConcurrentHashMap$Node"

> Memory leak when creating a table using location and NameNode in HA
> ---
>
> Key: HIVE-16220
> URL: https://issues.apache.org/jira/browse/HIVE-16220
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 3.0.0
> Environment: HDP-2.4.0.0
> HDP-3.1.0.0
>Reporter: Angel Alvarez Pascua
>Priority: Major
>
> The following simple DDL
> CREATE TABLE `test`(`field` varchar(1)) LOCATION 
> 'hdfs://benderHA/apps/hive/warehouse/test'
> ends up generating a huge memory leak in the HiveServer2 service.
> After two weeks without a restart, the service stops suddenly because of 
> OutOfMemory errors.
> This only happens in an environment in which the NameNode is in HA; 
> otherwise, nothing (strangely) happens. If the location clause is not 
> present, everything is also fine.
> It seems multiple instances of Hadoop Configuration are created when we're 
> in an HA environment:
> 
> 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 350.263.816 (81,66%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""
> 
> 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 699.901.416 (87,32%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""
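One plausible shape of a fix is to reuse a single resolved configuration per HA nameservice instead of allocating a fresh one on every `CREATE TABLE ... LOCATION` DDL. A sketch of that caching pattern, with `java.util.Properties` standing in for Hadoop's `Configuration` so the example stays self-contained:

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class ConfCacheSketch {
    // Hypothetical cache keyed by HA nameservice (e.g. "benderHA").
    static final Map<String, Properties> CACHE = new ConcurrentHashMap<>();

    /** Return one shared config per nameservice; computeIfAbsent builds it
     *  at most once, so repeated DDLs stop retaining thousands of copies. */
    static Properties confFor(String nameservice) {
        return CACHE.computeIfAbsent(nameservice, ns -> {
            Properties conf = new Properties();  // stand-in for Configuration
            conf.setProperty("dfs.nameservices", ns);
            return conf;
        });
    }

    public static void main(String[] args) {
        // Repeated lookups return the same cached instance.
        System.out.println(confFor("benderHA") == confFor("benderHA")); // true
    }
}
```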



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA

2020-09-29 Thread lithiumlee-_- (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204413#comment-17204413
 ] 

lithiumlee-_- edited comment on HIVE-16220 at 9/30/20, 3:01 AM:


Same problem in Hive 2.3.4

Too many instances of "java.util.Hashtable$Entry", 
"java.util.concurrent.ConcurrentHashMap$Node"


was (Author: lithiumlee-_-):
Same problem in Hive 2.3.4

!image-2020-09-30-10-59-42-324.png!

> Memory leak when creating a table using location and NameNode in HA
> ---
>
> Key: HIVE-16220
> URL: https://issues.apache.org/jira/browse/HIVE-16220
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 3.0.0
> Environment: HDP-2.4.0.0
> HDP-3.1.0.0
>Reporter: Angel Alvarez Pascua
>Priority: Major
>
> The following simple DDL
> CREATE TABLE `test`(`field` varchar(1)) LOCATION 
> 'hdfs://benderHA/apps/hive/warehouse/test'
> ends up generating a huge memory leak in the HiveServer2 service.
> After two weeks without a restart, the service stops suddenly because of 
> OutOfMemory errors.
> This only happens in an environment in which the NameNode is in HA; 
> otherwise, nothing (strangely) happens. If the location clause is not 
> present, everything is also fine.
> It seems multiple instances of Hadoop Configuration are created when we're 
> in an HA environment:
> 
> 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 350.263.816 (81,66%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""
> 
> 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 699.901.416 (87,32%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA

2020-09-29 Thread lithiumlee-_- (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204413#comment-17204413
 ] 

lithiumlee-_- commented on HIVE-16220:
--

Same problem in Hive 2.3.4

!image-2020-09-30-10-59-42-324.png!

> Memory leak when creating a table using location and NameNode in HA
> ---
>
> Key: HIVE-16220
> URL: https://issues.apache.org/jira/browse/HIVE-16220
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 3.0.0
> Environment: HDP-2.4.0.0
> HDP-3.1.0.0
>Reporter: Angel Alvarez Pascua
>Priority: Major
>
> The following simple DDL
> CREATE TABLE `test`(`field` varchar(1)) LOCATION 
> 'hdfs://benderHA/apps/hive/warehouse/test'
> ends up generating a huge memory leak in the HiveServer2 service.
> After two weeks without a restart, the service stops suddenly because of 
> OutOfMemory errors.
> This only happens in an environment in which the NameNode is in HA; 
> otherwise, nothing (strangely) happens. If the location clause is not 
> present, everything is also fine.
> It seems multiple instances of Hadoop Configuration are created when we're 
> in an HA environment:
> 
> 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 350.263.816 (81,66%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""
> 
> 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 699.901.416 (87,32%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-09-29 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24069:
---
Issue Type: Bug  (was: Improvement)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the Executor skips 
> marking the task return code and calling endTask. This may leave the 
> history log incomplete for such tasks.
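The fix direction described (always record the return code and end event, even on abnormal exit) can be sketched with a try/finally. This is illustrative only, not Hive's actual Executor or HiveHistory code:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskHistorySketch {
    static final List<String> historyLog = new ArrayList<>();

    /** Record start, return code, and end for every task, including ones
     *  that exit abnormally (exitVal != 0): the finally block runs on
     *  every path, so no task leaves the history log half-written. */
    static int runTask(String taskId, int exitVal) {
        historyLog.add("START " + taskId);
        try {
            return exitVal;  // the task's (possibly abnormal) exit value
        } finally {
            historyLog.add("RETURN_CODE " + taskId + "=" + exitVal);
            historyLog.add("END " + taskId);
        }
    }

    public static void main(String[] args) {
        runTask("Stage-1", 0);
        runTask("Stage-2", 9);   // abnormal exit still reaches the log
        System.out.println(historyLog);
    }
}
```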



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-09-29 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24069:
---
Priority: Minor  (was: Major)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the Executor skips 
> marking the task return code and calling endTask. This may leave the 
> history log incomplete for such tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23893?focusedWorklogId=492754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492754
 ]

ASF GitHub Bot logged work on HIVE-23893:
-

Author: ASF GitHub Bot
Created on: 30/Sep/20 00:49
Start Date: 30/Sep/20 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-701095278


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492754)
Time Spent: 2h  (was: 1h 50m)

> Extract deterministic conditions for pdd when the predicate contains 
> non-deterministic function
> ---
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: zhishui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Take the following query as an example, and assume unix_timestamp is 
> non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 
> 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
> The predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down 
> past the join op, making the optimizer unable to prune partitions, which may 
> result in a full scan of the tables gamesdk_userprofile and game_info_all.
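The proposed improvement amounts to partitioning an AND-ed predicate into deterministic conjuncts (safe to push below the join, enabling partition pruning) and non-deterministic ones (which must stay put). A hypothetical sketch of that split, not Hive's actual optimizer code:

```java
import java.util.ArrayList;
import java.util.List;

public class PpdSplitSketch {
    static final class Conjunct {
        final String expr;
        final boolean deterministic;
        Conjunct(String expr, boolean deterministic) {
            this.expr = expr;
            this.deterministic = deterministic;
        }
    }

    /** From an AND-ed predicate, extract only the deterministic conjuncts;
     *  pushing just these below the join is safe because their value does
     *  not depend on evaluation time or row order. */
    static List<String> pushableConjuncts(List<Conjunct> conjuncts) {
        List<String> pushable = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (c.deterministic) pushable.add(c.expr);
        }
        return pushable;
    }

    public static void main(String[] args) {
        List<Conjunct> where = new ArrayList<>();
        where.add(new Conjunct("a.date = 20200704", true));
        where.add(new Conjunct("from_unixtime(unix_timestamp(a.first_dt)) = 20200704", false));
        where.add(new Conjunct("b.date = 20200704", true));
        System.out.println(pushableConjuncts(where));
        // [a.date = 20200704, b.date = 20200704]
    }
}
```

With this split, the two date predicates reach the table scans for partition pruning while the non-deterministic conjunct remains in the post-join filter.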



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23900) Replace Base64 in exec Package

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23900?focusedWorklogId=492755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492755
 ]

ASF GitHub Bot logged work on HIVE-23900:
-

Author: ASF GitHub Bot
Created on: 30/Sep/20 00:49
Start Date: 30/Sep/20 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1297:
URL: https://github.com/apache/hive/pull/1297


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492755)
Time Spent: 1.5h  (was: 1h 20m)

> Replace Base64 in exec Package
> --
>
> Key: HIVE-23900
> URL: https://issues.apache.org/jira/browse/HIVE-23900
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-13875) Beeline ignore where clause when it is the last line of file and missing a EOL hence give wrong query result

2020-09-29 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-13875:
--

Assignee: Zhihua Deng

> Beeline ignore where clause when it is the last line of file and missing a 
> EOL hence give wrong query result
> 
>
> Key: HIVE-13875
> URL: https://issues.apache.org/jira/browse/HIVE-13875
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Lu Ji
>Assignee: Zhihua Deng
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
>
> Steps to reproduce:
> Say we have a simple table:
> {code}
> select * from lji.lu_test;
> +---+--+--+
> | lu_test.name  | lu_test.country  |
> +---+--+--+
> | john  | us   |
> | hong  | cn   |
> +---+--+--+
> 2 rows selected (0.04 seconds)
> {code}
> We have a simple query in a file. But note this file missing the last EOL.
> {code}
> cat -A test.hql
> use lji;$
> select * from lu_test$
> where country='us';[lji@~]$
> {code}
> Then if we execute file using both hive CLI and beeline + HS2, we have 
> different result.
> {code}
> [lji@~]$ hive -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Logging initialized using configuration in 
> file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
> OK
> Time taken: 1.624 seconds
> OK
> johnus
> Time taken: 1.482 seconds, Fetched: 1 row(s)
> [lji@~]$ beeline -u "jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX" 
> -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Connecting to jdbc:hive2://XXXl:1/default;principal=hive/_HOST@XXX
> Connected to: Apache Hive (version 1.2.1.2.3.4.7-4)
> Driver: Hive JDBC (version 1.2.1.2.3.4.7-4)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://XXX> use lji;
> No rows affected (0.06 seconds)
> 0: jdbc:hive2://XXX> select * from lu_test
> 0: jdbc:hive2://XXX> where 
> country='us';+---+--+--+
> | lu_test.name  | lu_test.country  |
> +---+--+--+
> | john  | us   |
> | hong  | cn   |
> +---+--+--+
> 2 rows selected (0.073 seconds)
> 0: jdbc:hive2://XXX>
> Closing: 0: jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX
> {code}
> Obviously, beeline gave the wrong result: it ignored the where clause on the 
> last line.
> I know it is quite unusual for a file to be missing its final EOL, but for 
> whatever reason we have quite a few files in this state. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23815) output statistics of underlying datastore

2020-09-29 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-23815:
--

Assignee: Zhihua Deng  (was: Rossetti Wong)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This patch provides a way to get statistics from the metastore's underlying 
> datastore, such as MySQL or Oracle: the number of datastore reads and writes, 
> the average transaction execution time, the total number of active 
> connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=492679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492679
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 21:06
Start Date: 29/Sep/20 21:06
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r497059455



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2921,6 +2920,77 @@ public Object process(Node nd, Stack stack, 
NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator joins the output of select with the output of 
UDTF.

Review comment:
   Could you provide a bit more details about what the rule does? Most of 
the other rules in this class give a general overview of the cost model they 
implement.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492679)
Time Spent: 0.5h  (was: 20m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation when the UDTF in the LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.
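As a hedged illustration of the cardinality model at stake (the actual rule may use column statistics rather than a flat fan-out factor):

```python
def lateral_view_join_rows(base_rows, avg_udtf_fanout):
    # Hedged estimate: each base row is joined with the rows its UDTF
    # emits, so output cardinality scales with the average fan-out.
    return int(base_rows * avg_udtf_fanout)

# Without a rule, the operator may keep base_rows as its estimate, an
# underestimation whenever explode() emits several elements per row.
assert lateral_view_join_rows(1000, 3.5) == 3500
```

An underestimated row count here propagates upward and can mislead join ordering and memory sizing for the rest of the plan.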



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24206) DefaultGraphWalker IdentityHashMap causes thread idling

2020-09-29 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204287#comment-17204287
 ] 

Stamatis Zampetakis commented on HIVE-24206:


Do you have some profiler snapshots to attach to this issue? Have you tried 
reproducing the problem with current apache master? 

It seems a bit related to HIVE-24031.

> DefaultGraphWalker IdentityHashMap causes thread idling
> ---
>
> Key: HIVE-24206
> URL: https://issues.apache.org/jira/browse/HIVE-24206
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.3.4
>Reporter: Zhou
>Priority: Major
>
> The SQL is very long, about 3 MB in size, and parsing it with the Hive parser 
> takes more than one hour to finish. The reason is that the parser uses an 
> IdentityHashMap, which is not thread-safe; under concurrent use it can spend 
> over an hour spinning in the "while(true)" loop.
>  
> my code is:
> {code:java}
> ParseDriver pd = new ParseDriver();
> ASTNode tree = pd.parse(query, context);
> while ((tree.getToken() == null) && (tree.getChildCount() > 0)) {
>   tree = (ASTNode) tree.getChild(0);
> }
> Map<Rule, NodeProcessor> rules = new LinkedHashMap<>();
> Dispatcher disp = new DefaultRuleDispatcher(this, rules, null);
> GraphWalker ogw = new DefaultGraphWalker(disp);
> List<Node> topNodes = new ArrayList<>();
> topNodes.add(tree);
> ogw.startWalking(topNodes, null);
> {code}
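The reported hang matches the known failure mode of sharing an unsynchronized HashMap-family map (IdentityHashMap included) across threads: concurrent structural modification can corrupt the table and leave a lookup spinning. A minimal sketch of the usual remedy, per-thread state instead of a shared map (Python stand-in, illustrative only):

```python
import threading

def walk(nodes):
    # Each walker gets its own visited-map (identity-keyed, like Java's
    # IdentityHashMap): no sharing across threads, so no concurrent
    # structural modification and no corruption.
    visited = {}
    for n in nodes:
        visited[id(n)] = True
    return len(visited)

nodes = [object() for _ in range(1000)]
results = []
threads = [threading.Thread(target=lambda: results.append(walk(nodes)))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results == [1000, 1000, 1000, 1000]
```

If the caller must share one GraphWalker (and thus one map) across threads, external synchronization is required; constructing a walker per thread, as above, sidesteps the problem entirely.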



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20137) Truncate for Transactional tables should use base_x

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20137?focusedWorklogId=492651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492651
 ]

ASF GitHub Bot logged work on HIVE-20137:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 20:16
Start Date: 29/Sep/20 20:16
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1532:
URL: https://github.com/apache/hive/pull/1532#issuecomment-700962064


   @pvary @kuczoram could you review this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492651)
Time Spent: 20m  (was: 10m)

> Truncate for Transactional tables should use base_x
> ---
>
> Key: HIVE-20137
> URL: https://issues.apache.org/jira/browse/HIVE-20137
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow up to HIVE-19387.
> Once we have a lock that blocks writers but not readers (HIVE-19369), it 
> would make sense to make truncate create a new base_x, where x is a writeId 
> in the current txn - the same as Insert Overwrite does.
> This would mean it can work w/o interfering with existing writers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204104#comment-17204104
 ] 

Ganesha Shreedhara commented on HIVE-24209:
---

[~pxiong], [~ashutoshc] Please review the PR. 

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate (missing 
> the NOT) is pushed down to the storage layer, leading to incorrect split 
> generation and incorrect results. 
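A hedged sketch of the fix being described, using a toy expression encoding rather than Hive's real classes: the NOT flag carried in the first child of the BETWEEN node must survive the search-argument conversion, or the pushed-down predicate selects exactly the wrong rows:

```python
def to_sarg(expr):
    # Hypothetical mini-converter. expr = ("between", invert, col, lo, hi),
    # mirroring how Hive encodes NOT BETWEEN once vectorization drops the
    # explicit GenericUDFOPNot: the first child is the invert flag.
    op, invert, col, lo, hi = expr
    sarg = ("between", col, lo, hi)
    return ("not", sarg) if invert else sarg  # the step the bug omitted

def eval_sarg(sarg, row):
    if sarg[0] == "not":
        return not eval_sarg(sarg[1], row)
    _, col, lo, hi = sarg
    return lo <= row[col] <= hi

rows = [{"x": 5}, {"x": 15}]
pred = ("between", True, "x", 0, 10)  # i.e. x NOT BETWEEN 0 AND 10
kept = [r for r in rows if eval_sarg(to_sarg(pred), r)]
assert kept == [{"x": 15}]
```

Dropping the `invert` check would keep `{"x": 5}` instead, which is the wrong-splits, wrong-results behavior the issue describes.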



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: (was: HIVE-24209.patch)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate (missing 
> the NOT) is pushed down to the storage layer, leading to incorrect split 
> generation and incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: (was: orc_test_ppd)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate (missing 
> the NOT) is pushed down to the storage layer, leading to incorrect split 
> generation and incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?focusedWorklogId=492535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492535
 ]

ASF GitHub Bot logged work on HIVE-24209:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 16:12
Start Date: 29/Sep/20 16:12
Worklog Time Spent: 10m 
  Work Description: ganeshashree opened a new pull request #1535:
URL: https://github.com/apache/hive/pull/1535


   
   
   ### What changes were proposed in this pull request?
   
   We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
BETWEEN operations when vectorization is enabled, as part of the improvement 
done in HIVE-15884. This change handles that case during the conversion of the 
filter expression to a search argument.
   Refer: HIVE-24209
   
   
   ### Why are the changes needed?
   
   
   Ignoring the boolean value set in the first child of GenericUDFBetween while 
converting the filter expression to a search argument leads to an incorrect 
predicate (without the NOT operation) being pushed down to the storage layer 
when a query has a NOT BETWEEN operator in its filter expression and 
vectorization is enabled.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Added dataset and test to reproduce and verify that we get the correct 
result when a query has NOT BETWEEN operator in filter expression and 
vectorization is enabled.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492535)
Remaining Estimate: 0h
Time Spent: 10m

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch, orc_test_ppd
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate (missing 
> the NOT) is pushed down to the storage layer, leading to incorrect split 
> generation and incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24209:
--
Labels: pull-request-available  (was: )

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch, orc_test_ppd
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate (missing 
> the NOT) is pushed down to the storage layer, leading to incorrect split 
> generation and incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24157?focusedWorklogId=492499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492499
 ]

ASF GitHub Bot logged work on HIVE-24157:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 15:04
Start Date: 29/Sep/20 15:04
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1497:
URL: https://github.com/apache/hive/pull/1497#discussion_r496775522



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf;
+
+import java.lang.reflect.Method;
+import java.util.List;
+
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFMethodResolver;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping;
+import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+public class TimestampCastRestrictorResolver implements UDFMethodResolver {

Review comment:
   Could we add a comment to the class?

##
File path: ql/src/test/queries/clientnegative/strict_numeric_to_timestamp.q
##
@@ -0,0 +1,2 @@
+set hive.strict.timestamp.conversion=true;
+select cast(123 as timestamp);

Review comment:
   Can we add a couple of negative tests on column ref instead of constant? 
And/or with some complex expressions?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492499)
Time Spent: 0.5h  (was: 20m)

> Strict mode to fail on CAST timestamp <-> numeric
> -
>
> Key: HIVE-24157
> URL: https://issues.apache.org/jira/browse/HIVE-24157
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jesus Camacho Rodriguez
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is some interest in disallowing CAST numeric <\-> timestamp to avoid 
> confusion among users, e.g., the SQL standard does not allow numeric <\-> 
> timestamp casting, the timestamp type is timezone agnostic, etc.
> We should introduce a strict config for timestamp (similar to others before): 
> If the config is true, we shall fail while compiling the query with a 
> meaningful message.
> To provide similar behavior, Hive has multiple functions that provide clearer 
> semantics for numeric to timestamp conversion (and vice versa):
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
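A minimal sketch of the proposed check, assuming a compile-time type-check hook and simplified type names (this is not Hive's actual resolver API, which the PR implements via a UDFMethodResolver):

```python
def check_cast(src_type, dst_type, strict=True):
    # Hedged model of the strict config: reject numeric <-> timestamp
    # casts at compile time, pointing users at the explicit functions.
    numeric = {"tinyint", "smallint", "int", "bigint", "float", "double",
               "decimal"}
    crosses = ({src_type, dst_type} & numeric) and \
              "timestamp" in (src_type, dst_type)
    if strict and crosses:
        raise ValueError(
            "CAST between numeric and timestamp is disabled in strict mode; "
            "use from_unixtime()/unix_timestamp() instead")

check_cast("int", "string")   # unaffected casts still pass
try:
    check_cast("int", "timestamp")
    raised = False
except ValueError:
    raised = True
assert raised
```

The error message models the "meaningful message" requirement: rather than just failing, it names the sanctioned alternatives with explicit semantics.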



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-09-29 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204008#comment-17204008
 ] 

Karen Coppage commented on HIVE-23832:
--

To clarify – "blocking compaction" means that instead of running:

{code:java}
ALTER TABLE table_name COMPACT 'major';
{code}
this is run:

{code:java}
ALTER TABLE table_name COMPACT 'major' AND WAIT;
{code}

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the 
> presence of an open txn (held by the `ALTER TABLE`) below the Compactor's 
> txn.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}
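A rough model of the visibility check that makes the Cleaner skip the new base; the function name and arguments below are simplifications of AcidUtils.isDirUsable, not its real signature:

```python
def is_dir_usable(visibility_txn_id, open_txns, high_watermark):
    # Hedged model: a compacted base_x_vN directory is readable only once
    # its visibility txn is at or below the snapshot's high watermark and
    # not among the snapshot's open txns.
    return (visibility_txn_id <= high_watermark
            and visibility_txn_id not in open_txns)

# The blocking ALTER TABLE ... AND WAIT holds an open txn below the
# compactor's, pinning the watermark low, so the Cleaner's snapshot
# rejects the freshly written base (e.g. base_002_v035 above).
assert not is_dir_usable(35, open_txns={12}, high_watermark=11)
# Once that txn commits and the watermark advances, the base is usable.
assert is_dir_usable(35, open_txns=set(), high_watermark=40)
```

The txn ids here are arbitrary; the point is only the ordering relationship between the blocked watermark and the compactor's visibility txn.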



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=492493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492493
 ]

ASF GitHub Bot logged work on HIVE-24154:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 14:40
Start Date: 29/Sep/20 14:40
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1492:
URL: https://github.com/apache/hive/pull/1492#issuecomment-700751767


   @kgyrtkirk , addressed your last comment, let me know what you think of 
latest version. Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492493)
Time Spent: 3h  (was: 2h 50m)

> Missing simplification opportunity with IN and EQUALS clauses
> -
>
> Key: HIVE-24154
> URL: https://issues.apache.org/jira/browse/HIVE-24154
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> For instance, in perf driver CBO query 74, there are several filters that 
> could be simplified further:
> {code}
> HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))])
> {code}
> This may lead to incorrect estimates and unnecessary execution time.
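The missed rewrite can be stated in a few lines; a sketch with a toy predicate representation (not Calcite's RexNode classes):

```python
def simplify_and(equals_val, in_vals):
    # Sketch of the missed simplification: AND(=(c, v), IN(c, vs)) folds
    # to =(c, v) when v is in vs, and to FALSE otherwise, because the
    # equality already pins the column to a single value.
    if equals_val in in_vals:
        return ("=", equals_val)
    return ("false",)

# The perf-driver query 74 case: AND(=($1, 1999), IN($1, 1998, 1999))
assert simplify_and(1999, [1998, 1999]) == ("=", 1999)
# The contradictory case folds all the way to FALSE.
assert simplify_and(1997, [1998, 1999]) == ("false",)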



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23879) Data has been lost after table location was altered

2020-09-29 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23879 started by Ashish Sharma.

> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Assignee: Ashish Sharma
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I no longer see the old data when querying through HS2, but I can still find 
> it in MapR-FS at the old table location.
> Steps to reproduce:
> {code:sql}
> 1. connect to hs2 by beeline"
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23879) Data has been lost after table location was altered

2020-09-29 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-23879:


Assignee: Ashish Sharma

> Data has been lost after table location was altered
> ---
>
> Key: HIVE-23879
> URL: https://issues.apache.org/jira/browse/HIVE-23879
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Demyd
>Assignee: Ashish Sharma
>Priority: Major
>
> When I alter the location of a non-empty table and then insert data into it, 
> I no longer see the old data when querying through HS2, but I can still find 
> it in MapR-FS at the old table location.
> Steps to reproduce:
> {code:sql}
> 1. connect to hs2 by beeline"
>  hive --service beeline -u "jdbc:hive2://:1/;"
> 2. create test db:
>  create database dbtest1 location 'hdfs:///dbtest1.db';
> 3. create test table:
>  create table dbtest1.t1 (id int);
> 4. insert data to table:
>  insert into dbtest1.t1 (id) values (1);
> 5. set new table location:
>  alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';
> 6. insert data to table:
>  insert into dbtest1.t1 (id) values (2);
> {code}
> Actual result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> +--------+
> 1 row selected (0.097 seconds)
> {code}
> Expected result:
> {code:sql}
> jdbc:hive2://:> select * from dbtest1.t1;
> +--------+
> | t1.id  |
> +--------+
> | 2      |
> | 1      |
> +--------+
> 2 rows selected (0.097 seconds)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24124) NPE occurs when bucket_version different bucket tables are joined

2020-09-29 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24124 started by Ashish Sharma.

> NPE occurs when bucket_version different bucket tables are joined
> -
>
> Key: HIVE-24124
> URL: https://issues.apache.org/jira/browse/HIVE-24124
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: GuangMing Lu
>Assignee: Ashish Sharma
>Priority: Major
>
> {code:java}
> create table z_tab_1(
>     task_id  string,    
>     data_date  string,  
>     accno  string,  
>     curr_type  string,  
>     ifrs9_pd12_value  double,
>     ifrs9_ccf_value  double,
>     ifrs9_lgd_value  double
> )partitioned by(pt_dt string)
> STORED AS ORCFILE
> TBLPROPERTIES ('bucketing_version'='1');
> alter table z_tab_1 add partition(pt_dt = '2020-7-31');
> insert into z_tab_1 partition(pt_dt = '2020-7-31') values
> ('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
> ('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
> ('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
> ('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
> ('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
> ('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
> ('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
> ('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
> ('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
> ('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
> drop table if exists z_tab_2;
> CREATE TABLE z_tab_2(  
>     task_id  string,    
>     data_date  string,  
>     accno  string,  
>     curr_type  string,  
>     ifrs9_pd12_value  double,   
>     ifrs9_ccf_value  double,    
>     ifrs9_lgd_value  double
> ) 
> CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, 
> ACCNO, CURR_TYPE) INTO 2000 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS ORCFILE;
> set hive.enforce.bucketing=true;
> INSERT OVERWRITE TABLE z_tab_2
> SELECT  DCCR.TASK_ID
>    ,DCCR.DATA_DATE
>    ,DCCR.ACCNO
>    ,DCCR.CURR_TYPE
>    ,DCCR.IFRS9_PD12_VALUE
>    ,DCCR.IFRS9_CCF_VALUE
>    ,DCCR.IFRS9_LGD_VALUE 
> FROM z_tab_1 DCCR
> WHERE pt_dt = '2020-7-31';
> {code}
> {noformat}
> Caused by: java.lang.NullPointerException  
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)
>   
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)
>   
> at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) 
>  
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237) 
>  
> ... 7 more{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24124) NPE occurs when bucket_version different bucket tables are joined

2020-09-29 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-24124:


Assignee: Ashish Sharma

> NPE occurs when bucket_version different bucket tables are joined
> -
>
> Key: HIVE-24124
> URL: https://issues.apache.org/jira/browse/HIVE-24124
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: GuangMing Lu
>Assignee: Ashish Sharma
>Priority: Major
>
> {code:java}
> create table z_tab_1(
>     task_id  string,    
>     data_date  string,  
>     accno  string,  
>     curr_type  string,  
>     ifrs9_pd12_value  double,
>     ifrs9_ccf_value  double,
>     ifrs9_lgd_value  double
> )partitioned by(pt_dt string)
> STORED AS ORCFILE
> TBLPROPERTIES ('bucketing_version'='1');
> alter table z_tab_1 add partition(pt_dt = '2020-7-31');
> insert into z_tab_1 partition(pt_dt = '2020-7-31') values
> ('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
> ('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
> ('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
> ('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
> ('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
> ('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
> ('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
> ('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
> ('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
> ('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
> drop table if exists z_tab_2;
> CREATE TABLE z_tab_2(  
>     task_id  string,    
>     data_date  string,  
>     accno  string,  
>     curr_type  string,  
>     ifrs9_pd12_value  double,   
>     ifrs9_ccf_value  double,    
>     ifrs9_lgd_value  double
> ) 
> CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, 
> ACCNO, CURR_TYPE) INTO 2000 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS ORCFILE;
> set hive.enforce.bucketing=true;
> INSERT OVERWRITE TABLE z_tab_2
> SELECT  DCCR.TASK_ID
>    ,DCCR.DATA_DATE
>    ,DCCR.ACCNO
>    ,DCCR.CURR_TYPE
>    ,DCCR.IFRS9_PD12_VALUE
>    ,DCCR.IFRS9_CCF_VALUE
>    ,DCCR.IFRS9_LGD_VALUE 
> FROM z_tab_1 DCCR
> WHERE pt_dt = '2020-7-31';
> {code}
> {noformat}
> Caused by: java.lang.NullPointerException  
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)
>   
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)
>   
> at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) 
>  
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237) 
>  
> ... 7 more{noformat}
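The NPE above can be understood as a bucket-routing mismatch: the two tables hash rows with different bucketing versions (Hive v1 is hashCode-based, v2 is Murmur3-based), so a row can be routed to a bucket number for which FileSinkOperator never opened a writer. A minimal standalone sketch of that failure mode, using simplified stand-in hash functions rather than Hive's real ones:

```java
public class BucketMismatchSketch {
    // Simplified stand-ins for the two bucketing versions. The exact
    // functions do not matter for the illustration; only that they are
    // different, so the same key can land in different buckets.
    static int bucketV1(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    static int bucketV2(String key, int numBuckets) {
        // Hypothetical alternative hash: NOT Hive's Murmur3, just distinct from v1.
        int h = 1;
        for (char ch : key.toCharArray()) h = h * 131 + ch;
        return (h & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        String key = "123|2020-7-31|accno-|curr_type-x";
        // Writers are opened for the bucket the plan expects (one version)...
        int planned = bucketV1(key, numBuckets);
        // ...but rows may be routed with the other table's bucketing version.
        int routed = bucketV2(key, numBuckets);
        System.out.println("planned bucket=" + planned + ", routed bucket=" + routed);
        // Whenever the two disagree, findWriterOffset looks up a writer slot
        // that was never created, which surfaces as the NullPointerException
        // in the stack trace above.
    }
}
```

The hashes here are purely illustrative; the point is only that two different hash functions can disagree on the bucket assignment of the same key.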



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: orc_test_ppd

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch, orc_test_ppd
>
>
> We skip adding the GenericUDFOPNot UDF to the filter expression for a NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> made in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate gets pushed 
> down to the storage layer, leading to incorrect split generation and 
> incorrect results.
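To see why losing the negation matters, here is a self-contained sketch (not Hive code) in which the pushed-down predicate is the complement of the query's NOT BETWEEN, so exactly the wrong rows survive filtering:

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NotBetweenSketch {
    // The planner drops GenericUDFOPNot from the vectorized operator tree
    // (HIVE-15884) and marks the BETWEEN as negated instead. If the
    // search-argument converter loses that negation flag, the predicate pushed
    // to the storage layer is the complement of what the query asked for.
    static List<Integer> filter(List<Integer> rows, IntPredicate pushedDown) {
        return rows.stream().filter(pushedDown::test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        IntPredicate between = v -> v >= 3 && v <= 7;  // col BETWEEN 3 AND 7
        IntPredicate notBetween = between.negate();    // what the query wants
        // Correct conversion keeps the NOT; the buggy one pushes plain BETWEEN.
        System.out.println("correct  : " + filter(rows, notBetween)); // [1, 2, 8, 9, 10]
        System.out.println("incorrect: " + filter(rows, between));    // [3, 4, 5, 6, 7]
    }
}
```

Because the pushed-down predicate also drives split elimination in ORC, the complemented predicate can skip the very stripes that hold the matching rows, not just return extra ones.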



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Status: Patch Available  (was: Open)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch
>
>
> We skip adding the GenericUDFOPNot UDF to the filter expression for a NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> made in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate gets pushed 
> down to the storage layer, leading to incorrect split generation and 
> incorrect results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: HIVE-24209.patch

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch
>
>
> We skip adding the GenericUDFOPNot UDF to the filter expression for a NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> made in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate gets pushed 
> down to the storage layer, leading to incorrect split generation and 
> incorrect results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-24209:
-


> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>
> We skip adding the GenericUDFOPNot UDF to the filter expression for a NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> made in HIVE-15884. However, this is not handled during the conversion of the 
> filter expression to a search argument, so an incorrect predicate gets pushed 
> down to the storage layer, leading to incorrect split generation and 
> incorrect results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24208) LLAP: query job stuck due to race conditions

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24208:
--
Labels: pull-request-available  (was: )

> LLAP: query job stuck due to race conditions
> 
>
> Key: HIVE-24208
> URL: https://issues.apache.org/jira/browse/HIVE-24208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When issuing an LLAP query, the Tez job on the LLAP server sometimes never 
> ends and never returns the data reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24208) LLAP: query job stuck due to race conditions

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24208?focusedWorklogId=492393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492393
 ]

ASF GitHub Bot logged work on HIVE-24208:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 11:36
Start Date: 29/Sep/20 11:36
Worklog Time Spent: 10m 
  Work Description: bymm opened a new pull request #1534:
URL: https://github.com/apache/hive/pull/1534


   
   
   ### What changes were proposed in this pull request?
   
This PR fixes a bug in which an LLAP client query sometimes never receives a 
result while the related Tez job hangs on the LLAP server. 
   
   ### Why are the changes needed?
   
   It fixes a bug that, in some circumstances, caused the Tez job serving an 
LLAP query to never end.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   It was tested on AWS EMR 5.23 with Hive 2.3.4 and LLAP installed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492393)
Remaining Estimate: 0h
Time Spent: 10m

> LLAP: query job stuck due to race conditions
> 
>
> Key: HIVE-24208
> URL: https://issues.apache.org/jira/browse/HIVE-24208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When issuing an LLAP query, the Tez job on the LLAP server sometimes never 
> ends and never returns the data reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2020-09-29 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-24207:
---

Assignee: László Bodor

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to make use of the ObjectCache to retain the number of 
> records generated, so that LimitOperator/VectorLimitOperator can bail out for 
> later tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
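A minimal sketch of the proposal, using a plain ConcurrentHashMap as a stand-in for Hive's per-daemon ObjectCache (class and method names here are illustrative, not Hive's):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class LimitBailOutSketch {
    // Stand-in for the per-process ObjectCache: a map keyed by query/operator
    // id, shared by all tasks of that query running in the same JVM.
    static final ConcurrentHashMap<String, AtomicLong> CACHE = new ConcurrentHashMap<>();

    final String key;
    final long limit;
    boolean done;  // set when this task can skip all work

    LimitBailOutSketch(String key, long limit) {
        this.key = key;
        this.limit = limit;
    }

    // Mirrors the idea for LimitOperator's init phase: if earlier tasks have
    // already produced `limit` rows, later tasks bail out before reading data.
    void init() {
        done = CACHE.computeIfAbsent(key, k -> new AtomicLong()).get() >= limit;
    }

    // Called per row; records progress in the shared cache.
    boolean process() {
        if (done) return false;
        long produced = CACHE.get(key).incrementAndGet();
        if (produced >= limit) done = true;
        return produced <= limit;
    }

    public static void main(String[] args) {
        LimitBailOutSketch task1 = new LimitBailOutSketch("query-1:LIM_1", 3);
        task1.init();
        for (int i = 0; i < 5; i++) task1.process();   // emits 3 rows, then stops
        LimitBailOutSketch task2 = new LimitBailOutSketch("query-1:LIM_1", 3);
        task2.init();                                   // sees 3 rows already produced
        System.out.println("task2 bails out in init: " + task2.done); // true
    }
}
```

In a real implementation the counter would only be shared among tasks of the same query in the same daemon, so this is a per-JVM optimization, not a global one.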



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24208) LLAP: query job stuck due to race conditions

2020-09-29 Thread Yuriy Baltovskyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuriy Baltovskyy reassigned HIVE-24208:
---


> LLAP: query job stuck due to race conditions
> 
>
> Key: HIVE-24208
> URL: https://issues.apache.org/jira/browse/HIVE-24208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Major
>
> When issuing an LLAP query, the Tez job on the LLAP server sometimes never 
> ends and never returns the data reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?focusedWorklogId=492356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492356
 ]

ASF GitHub Bot logged work on HIVE-24069:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 09:59
Start Date: 29/Sep/20 09:59
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1429:
URL: https://github.com/apache/hive/pull/1429#issuecomment-700599684


   @kgyrtkirk Could you please take a quick look at the changes? Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492356)
Time Spent: 0.5h  (was: 20m)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the Executor skips 
> recording the task's return code and calling endTask. This can leave the 
> history log incomplete for such tasks.
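One way to guarantee that the history entry is closed on both success and failure is to log in a finally block; the sketch below is illustrative and does not use Hive's actual HiveHistory API:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskHistorySketch {
    final List<String> log = new ArrayList<>();

    void startTask(String id) { log.add("START " + id); }
    void setTaskProperty(String id, String key, String value) { log.add(id + " " + key + "=" + value); }
    void endTask(String id) { log.add("END " + id); }

    // Before the fix, the return code and endTask were only recorded on the
    // success path, leaving history incomplete for failed tasks. Logging in a
    // finally block covers both paths.
    int runTask(String id, int exitVal) {
        startTask(id);
        try {
            return exitVal;  // stand-in for actually executing the task
        } finally {
            setTaskProperty(id, "RET_CODE", String.valueOf(exitVal));
            endTask(id);
        }
    }

    public static void main(String[] args) {
        TaskHistorySketch h = new TaskHistorySketch();
        h.runTask("Stage-1", 0);
        h.runTask("Stage-2", 1);   // abnormal exit is still logged
        h.log.forEach(System.out::println);
    }
}
```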



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=492311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492311
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 07:44
Start Date: 29/Sep/20 07:44
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1271:
URL: https://github.com/apache/hive/pull/1271#issuecomment-700514324


   @kgyrtkirk So the new approach is:
   
   - We have introduced a new flag 
**metastore.decode.filter.expression.tostring** which deserializes the expr 
byte array to a string when deserializing it to an ExprNode fails.
   
   - So when MSCK with a partition filter tries to drop partitions, it first 
tries to deserialize the expr byte array to an ExprNode; when that fails, it 
falls back to deserializing it to a string.
   
   Does this approach make sense?
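The fallback described above can be sketched as a try/catch around the binary deserialization; the names and the flag handling below are illustrative, not the actual metastore code:

```java
import java.nio.charset.StandardCharsets;

public class ExprDecodeFallbackSketch {
    // Stand-in for Kryo deserialization of an ExprNode; here it always fails,
    // as it does when the bytes were produced by a client that serialized the
    // filter as a plain string rather than a Kryo-encoded ExprNode.
    static Object deserializeExprNode(byte[] expr) {
        throw new IllegalStateException("not a Kryo-encoded ExprNode");
    }

    // The proposed behavior (guarded by a flag like
    // metastore.decode.filter.expression.tostring): on failure, fall back to
    // decoding the bytes as a plain filter string.
    static Object decodeFilter(byte[] expr, boolean decodeToStringEnabled) {
        try {
            return deserializeExprNode(expr);
        } catch (RuntimeException e) {
            if (!decodeToStringEnabled) throw e;
            return new String(expr, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] expr = "(month = '07' AND day = '15')".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeFilter(expr, true)); // falls back to the filter string
    }
}
```

With the flag off, the original exception still propagates, so existing behavior is preserved for callers that always send Kryo-encoded expressions.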



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492311)
Time Spent: 4.5h  (was: 4h 20m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create an external table
> # Run the msck command to sync all partitions with the metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), While dropping partition 
[jira] [Commented] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-09-29 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203681#comment-17203681
 ] 

okumin commented on HIVE-24203:
---

All tests have passed.

[https://github.com/apache/hive/pull/1531]

 

[~jcamachorodriguez] Could you please review the PR when you have a chance? 
Sorry if you are not the right person.

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rule to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation when the UDTF in a LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.
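The intended estimate can be sketched as scaling the base row count by the UDTF's expansion factor; this is a simplified illustration of the kind of rule proposed, not the actual StatsRulesProcFactory code:

```java
public class LateralViewStatsSketch {
    // Without a dedicated rule, the lateral-view join's output estimate falls
    // back to the base row count, underestimating whenever the UDTF emits
    // more than one row per input row.
    static long naiveEstimate(long baseRows) {
        return baseRows;
    }

    // A rule in the spirit of HIVE-24203: scale the base rows by the UDTF's
    // expansion factor (average rows emitted per input row), the quantity
    // HIVE-20262 already estimates for UDTF operators.
    static long lateralViewJoinEstimate(long baseRows, double udtfExpansionFactor) {
        return (long) Math.ceil(baseRows * udtfExpansionFactor);
    }

    public static void main(String[] args) {
        long baseRows = 1_000;
        double expansion = 8.0;  // e.g. explode() over arrays averaging 8 elements
        System.out.println("naive    : " + naiveEstimate(baseRows));                      // 1000
        System.out.println("with rule: " + lateralViewJoinEstimate(baseRows, expansion)); // 8000
    }
}
```

An 8x underestimate at this operator propagates upward and can push the planner toward map joins or parallelism choices sized for the smaller, wrong row count.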



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=492282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-492282
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 29/Sep/20 06:15
Start Date: 29/Sep/20 06:15
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r496441090



##
File path: ql/src/test/queries/clientpositive/annotate_stats_lateral_view_join.q
##
@@ -0,0 +1,38 @@
+set hive.fetch.task.conversion=none;

Review comment:
   To make EXPLAIN show Statistics. I'm thinking of creating another ticket 
to add this line to the other `annotate_stats_*.q` files.
   
   e.g. 
https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/llap/annotate_stats_select.q.out





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 492282)
Time Spent: 20m  (was: 10m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rule to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation when the UDTF in a LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)