[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may be unable to filter rows if input is string

2021-01-20 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269040#comment-17269040
 ] 

Zhihua Deng commented on HIVE-24666:


Thanks [~gopalv] for pointing it out. I've updated the code to fix the wrong 
adaptor usage; could you please take another look? 

> Vectorized UDFToBoolean may be unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use a cast to boolean in where conditions to filter rows, the filter is 
> unable to filter rows in vectorized execution. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> cast type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-20 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24675:
---
Attachment: HIVE-24675.01.patch

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?focusedWorklogId=538858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538858
 ]

ASF GitHub Bot logged work on HIVE-24675:
-

Author: ASF GitHub Bot
Created on: 21/Jan/21 04:55
Start Date: 21/Jan/21 04:55
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1898:
URL: https://github.com/apache/hive/pull/1898


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538858)
Remaining Estimate: 0h
Time Spent: 10m

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
> Attachments: HIVE-24675.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24675:
--
Labels: pull-request-available  (was: )

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-20 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24675:
--


> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24674) Set repl.source.for property in the db if db is under replication

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24674?focusedWorklogId=538852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538852
 ]

ASF GitHub Bot logged work on HIVE-24674:
-

Author: ASF GitHub Bot
Created on: 21/Jan/21 04:27
Start Date: 21/Jan/21 04:27
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #1897:
URL: https://github.com/apache/hive/pull/1897


   https://issues.apache.org/jira/browse/HIVE-24674



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538852)
Remaining Estimate: 0h
Time Spent: 10m

> Set repl.source.for property in the db if db is under replication
> -
>
> Key: HIVE-24674
> URL: https://issues.apache.org/jira/browse/HIVE-24674
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add the repl.source.for property to the database, if it is not already set, 
> when the database is under replication.
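>
> A minimal sketch of the intended behaviour, assuming it is done through the 
> metastore client API (the helper class, client handle and policy string below 
> are illustrative assumptions, not taken from the patch):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> import org.apache.hadoop.hive.metastore.IMetaStoreClient;
> import org.apache.hadoop.hive.metastore.api.Database;
>
> // Hypothetical helper, not the actual patch: idempotently tag a database
> // that is under replication with the repl.source.for property.
> public class ReplSourceTagger {
>   public static void tagIfMissing(IMetaStoreClient client, String dbName,
>       String replPolicy) throws Exception {
>     Database db = client.getDatabase(dbName);
>     Map<String, String> params =
>         db.getParameters() != null ? db.getParameters() : new HashMap<>();
>     if (!params.containsKey("repl.source.for")) {
>       params.put("repl.source.for", replPolicy);
>       db.setParameters(params);
>       // Persist the new property back to the metastore.
>       client.alterDatabase(dbName, db);
>     }
>   }
> }
> {code}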



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24674) Set repl.source.for property in the db if db is under replication

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24674:
--
Labels: pull-request-available  (was: )

> Set repl.source.for property in the db if db is under replication
> -
>
> Key: HIVE-24674
> URL: https://issues.apache.org/jira/browse/HIVE-24674
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add the repl.source.for property to the database, if it is not already set, 
> when the database is under replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24674) Set repl.source.for property in the db if db is under replication

2021-01-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-24674:
---


> Set repl.source.for property in the db if db is under replication
> -
>
> Key: HIVE-24674
> URL: https://issues.apache.org/jira/browse/HIVE-24674
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Add the repl.source.for property to the database, if it is not already set, 
> when the database is under replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24636) Memory leak due to stacking UDFClassLoader in Apache Commons LogFactory

2021-01-20 Thread dohongdayi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dohongdayi updated HIVE-24636:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Memory leak due to stacking UDFClassLoader in Apache Commons LogFactory
> ---
>
> Key: HIVE-24636
> URL: https://issues.apache.org/jira/browse/HIVE-24636
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0
>Reporter: dohongdayi
>Assignee: dohongdayi
>Priority: Major
> Attachments: HIVE-24636.1.patch.txt
>
>
> Much the same as [HIVE-7563|https://issues.apache.org/jira/browse/HIVE-7563]: 
> after a ClassLoader is closed in JavaUtils, it should also be released from 
> the Apache Commons LogFactory, or the ClassLoader can't be garbage collected, 
> which leads to a memory leak. This is exactly what we hit in our PROD 
> environment.
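>
> A minimal sketch of the idea only (an assumed shape, not the committed patch): 
> release the commons-logging cache entry for the ClassLoader before closing it, 
> so nothing keeps the loader reachable.
> {code:java}
> import java.io.Closeable;
> import java.io.IOException;
>
> import org.apache.commons.logging.LogFactory;
>
> // Hypothetical helper illustrating the fix idea next to the close in
> // JavaUtils: drop the LogFactory cache entry keyed by this ClassLoader,
> // then close the loader, so it can be garbage collected.
> public class ClassLoaderCleanup {
>   public static void closeAndRelease(ClassLoader loader) throws IOException {
>     LogFactory.release(loader);
>     if (loader instanceof Closeable) {
>       ((Closeable) loader).close();
>     }
>   }
> }
> {code}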



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24660) Remove Commons Logger from jdbc-handler Package

2021-01-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24660.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master! Thanks [~mgergely] for the review!

> Remove Commons Logger from jdbc-handler Package
> ---
>
> Key: HIVE-24660
> URL: https://issues.apache.org/jira/browse/HIVE-24660
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All Hive libraries should be using SLF4J (or slf4j-log4j in server 
> applications).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24660) Remove Commons Logger from jdbc-handler Package

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24660?focusedWorklogId=538769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538769
 ]

ASF GitHub Bot logged work on HIVE-24660:
-

Author: ASF GitHub Bot
Created on: 21/Jan/21 00:04
Start Date: 21/Jan/21 00:04
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1888:
URL: https://github.com/apache/hive/pull/1888


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538769)
Time Spent: 20m  (was: 10m)

> Remove Commons Logger from jdbc-handler Package
> ---
>
> Key: HIVE-24660
> URL: https://issues.apache.org/jira/browse/HIVE-24660
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All Hive libraries should be using SLF4J (or slf4j-log4j in server 
> applications).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24673) Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24673:
---


> Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap
> -
>
> Key: HIVE-24673
> URL: https://issues.apache.org/jira/browse/HIVE-24673
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> These test drivers should run on llap. Otherwise we can run into situations 
> where certain queries correctly fail on MapReduce but not on Tez.
> Also, it is better if the negative cli drivers do not mask "Caused by" lines 
> in the test output. Otherwise, a query may start to fail for a reason other 
> than the expected one and we would not realize it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24672) compute_stats_long.q fails for wrong reasons

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24672:
--
Labels: pull-request-available  (was: )

> compute_stats_long.q fails for wrong reasons
> 
>
> Key: HIVE-24672
> URL: https://issues.apache.org/jira/browse/HIVE-24672
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestNegativeCliDriver[compute_stats_long] intends to test that fmsketch has a 
> hard limit on the number of bit vectors (1024). However, the test fails for 
> the following wrong reason.
> {code:java}
> Caused by: java.lang.RuntimeException: Can not recognize 1Caused by: 
> java.lang.RuntimeException: Can not recognize 1 at 
> org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getEmptyNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:71)
> {code}
> Instead it should fail with 
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: The maximum 
> allowed value for number of bit vectors  is 1024, but was passed 1 bit 
> vectorsCaused by: org.apache.hadoop.hive.ql.metadata.HiveException: The 
> maximum allowed value for number of bit vectors  is 1024, but was passed 
> 1 bit vectors at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeBitVectorFMSketch$NumericStatsEvaluator.iterate(GenericUDAFComputeBitVectorFMSketch.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}
> Since this function is superseded by compute_bit_vector_fm, it is best if we 
> add the same test for compute_bit_vector_fm too.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24672) compute_stats_long.q fails for wrong reasons

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24672?focusedWorklogId=538758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538758
 ]

ASF GitHub Bot logged work on HIVE-24672:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 23:13
Start Date: 20/Jan/21 23:13
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1896:
URL: https://github.com/apache/hive/pull/1896


   Change-Id: I61d941dcbf86fb2dd45772fc658b3dc887325bd0
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538758)
Remaining Estimate: 0h
Time Spent: 10m

> compute_stats_long.q fails for wrong reasons
> 
>
> Key: HIVE-24672
> URL: https://issues.apache.org/jira/browse/HIVE-24672
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestNegativeCliDriver[compute_stats_long] intends to test that fmsketch has a 
> hard limit on the number of bit vectors (1024). However, the test fails for 
> the following wrong reason.
> {code:java}
> Caused by: java.lang.RuntimeException: Can not recognize 1Caused by: 
> java.lang.RuntimeException: Can not recognize 1 at 
> org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getEmptyNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:71)
> {code}
> Instead it should fail with 
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: The maximum 
> allowed value for number of bit vectors  is 1024, but was passed 1 bit 
> vectorsCaused by: org.apache.hadoop.hive.ql.metadata.HiveException: The 
> maximum allowed value for number of bit vectors  is 1024, but was passed 
> 1 bit vectors at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeBitVectorFMSketch$NumericStatsEvaluator.iterate(GenericUDAFComputeBitVectorFMSketch.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}
> Since this function is superseded by compute_bit_vector_fm, it is best if we 
> add the same test for compute_bit_vector_fm too.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24672) compute_stats_long.q fails for wrong reasons

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24672:
---


> compute_stats_long.q fails for wrong reasons
> 
>
> Key: HIVE-24672
> URL: https://issues.apache.org/jira/browse/HIVE-24672
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> TestNegativeCliDriver[compute_stats_long] intends to test that fmsketch has a 
> hard limit on the number of bit vectors (1024). However, the test fails for 
> the following wrong reason.
> {code:java}
> Caused by: java.lang.RuntimeException: Can not recognize 1Caused by: 
> java.lang.RuntimeException: Can not recognize 1 at 
> org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getEmptyNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:71)
> {code}
> Instead it should fail with 
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: The maximum 
> allowed value for number of bit vectors  is 1024, but was passed 1 bit 
> vectorsCaused by: org.apache.hadoop.hive.ql.metadata.HiveException: The 
> maximum allowed value for number of bit vectors  is 1024, but was passed 
> 1 bit vectors at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeBitVectorFMSketch$NumericStatsEvaluator.iterate(GenericUDAFComputeBitVectorFMSketch.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}
> Since this function is superseded by compute_bit_vector_fm, it is best if we 
> add the same test for compute_bit_vector_fm too.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24645) UDF configure not called when fetch task conversion occurs

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24645?focusedWorklogId=538742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538742
 ]

ASF GitHub Bot logged work on HIVE-24645:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 22:46
Start Date: 20/Jan/21 22:46
Worklog Time Spent: 10m 
  Work Description: jfsii commented on pull request #1876:
URL: https://github.com/apache/hive/pull/1876#issuecomment-764002826


   Rebased - also ran tests locally and it seemed to pass. Going to hope it was 
an infra issue prior.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538742)
Time Spent: 20m  (was: 10m)

> UDF configure not called when fetch task conversion occurs
> --
>
> Key: HIVE-24645
> URL: https://issues.apache.org/jira/browse/HIVE-24645
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When hive.fetch.task.conversion kicks in, UDF configure() is not called.
> This is likely because MapredContext is not available when this conversion 
> occurs.
> The approach I suggest is to create a dummy MapredContext and provide it with 
> the current configuration from ExprNodeGenericFuncEvaluator.
> It is slightly unfortunate that the UDF API relies on MapredContext, since 
> some aspects of the context do not apply to the variety of engines and 
> invocation paths for UDFs. This makes it difficult to build a fully formed 
> dummy object, for example the Reporter objects and the boolean indicating 
> whether it is a map context.
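>
> A rough sketch of how such a dummy context might be wired up (the helper class 
> is hypothetical and the init() arguments are an assumption, not the merged 
> change):
> {code:java}
> import org.apache.hadoop.hive.ql.exec.MapredContext;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
> import org.apache.hadoop.mapred.JobConf;
>
> // Hypothetical helper: if no MapredContext exists (e.g. after fetch task
> // conversion), create one from the current configuration so that the
> // UDF's configure() hook still runs.
> public class UdfConfigureHelper {
>   public static void configureUdf(GenericUDF udf, JobConf jobConf) {
>     MapredContext context = MapredContext.get();
>     if (context == null) {
>       // init() registers a context for this thread; "true" (map-side) is an
>       // arbitrary choice here, since a fetch task is neither map nor reduce.
>       context = MapredContext.init(true, jobConf);
>     }
>     udf.configure(context);
>   }
> }
> {code}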



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24432) Delete Notification Events in Batches

2021-01-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24432.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master! Thanks [~ngangam] and [~aasha] for the reviews!

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (which reduces memory pressure on 
> the HMS), but all of the deletes happen in a single transaction and, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.
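>
> The batching idea in a stand-alone sketch (raw JDBC and a MySQL-style LIMIT 
> are used purely for illustration; the actual change goes through the 
> metastore's persistence layer):
> {code:java}
> import java.sql.Connection;
> import java.sql.PreparedStatement;
> import java.sql.SQLException;
>
> // Illustrative only: delete old notification events a bounded chunk at a
> // time, committing each chunk in its own transaction so no single
> // transaction has to remove millions of rows at once.
> public class BatchedEventCleaner {
>   public static void deleteOlderThan(Connection conn, long maxEventId,
>       int batchSize) throws SQLException {
>     conn.setAutoCommit(false);
>     String sql =
>         "DELETE FROM NOTIFICATION_LOG WHERE EVENT_ID < ? LIMIT " + batchSize;
>     int deleted;
>     do {
>       try (PreparedStatement ps = conn.prepareStatement(sql)) {
>         ps.setLong(1, maxEventId);
>         deleted = ps.executeUpdate();  // removes at most batchSize rows
>       }
>       conn.commit();                   // one transaction per chunk
>     } while (deleted == batchSize);
>   }
> }
> {code}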



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may be unable to filter rows if input is string

2021-01-20 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268817#comment-17268817
 ] 

Gopal Vijayaraghavan commented on HIVE-24666:
-

This looks like the wrong fix for the problem (i.e. switching to the 
UDFAdaptor instead of fixing the vectorized filter).

> Vectorized UDFToBoolean may be unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use a cast to boolean in where conditions to filter rows, the filter is 
> unable to filter rows in vectorized execution. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> cast type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.

2021-01-20 Thread Rajkumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh resolved HIVE-24491.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> setting custom job name is ineffective if the tez session pool is configured 
> or in case of session reuse.
> -
>
> Key: HIVE-24491
> URL: https://issues.apache.org/jira/browse/HIVE-24491
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-23026 added the capability to set tez.job.name, but it's not effective 
> if the tez session pool manager is configured or tez sessions are reused.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.

2021-01-20 Thread Rajkumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268710#comment-17268710
 ] 

Rajkumar Singh commented on HIVE-24491:
---

Thanks [~ashutoshc] for the review, the PR is merged into master.

> setting custom job name is ineffective if the tez session pool is configured 
> or in case of session reuse.
> -
>
> Key: HIVE-24491
> URL: https://issues.apache.org/jira/browse/HIVE-24491
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-23026 added the capability to set tez.job.name, but it's not effective 
> if the tez session pool manager is configured or tez sessions are reused.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24659) Remove Commons Logger from serde Package

2021-01-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24659.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~mgergely] for the review (yet again)!!!

> Remove Commons Logger from serde Package
> 
>
> Key: HIVE-24659
> URL: https://issues.apache.org/jira/browse/HIVE-24659
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All Hive libraries should be using SLF4J (or slf4j-log4j in server 
> applications).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24659) Remove Commons Logger from serde Package

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24659?focusedWorklogId=538553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538553
 ]

ASF GitHub Bot logged work on HIVE-24659:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 17:07
Start Date: 20/Jan/21 17:07
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1887:
URL: https://github.com/apache/hive/pull/1887


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538553)
Time Spent: 20m  (was: 10m)

> Remove Commons Logger from serde Package
> 
>
> Key: HIVE-24659
> URL: https://issues.apache.org/jira/browse/HIVE-24659
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All Hive libraries should be using SLF4J (or slf4j-log4j in server 
> applications).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24491?focusedWorklogId=538548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538548
 ]

ASF GitHub Bot logged work on HIVE-24491:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 17:01
Start Date: 20/Jan/21 17:01
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1746:
URL: https://github.com/apache/hive/pull/1746


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538548)
Time Spent: 0.5h  (was: 20m)

> setting custom job name is ineffective if the tez session pool is configured 
> or in case of session reuse.
> -
>
> Key: HIVE-24491
> URL: https://issues.apache.org/jira/browse/HIVE-24491
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-23026 added the capability to set tez.job.name, but it's not effective 
> if the tez session pool manager is configured or tez sessions are reused.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24432) Delete Notification Events in Batches

2021-01-20 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268691#comment-17268691
 ] 

Pravin Sinha commented on HIVE-24432:
-

[~belugabehr] This patch has already been merged to master. Could we resolve 
this jira?

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (which reduces memory pressure on 
> the HMS), but all of the deletes happen in a single transaction and, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538524
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:58
Start Date: 20/Jan/21 15:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561077658



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   gotcha!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538524)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side (see the 
> sketch below)
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation
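>
> A minimal sketch of the first point, assuming the standard Hadoop FileSystem 
> API (the helper class, filter handling and variable names are illustrative):
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.LocatedFileStatus;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.PathFilter;
> import org.apache.hadoop.fs.RemoteIterator;
>
> // Illustrative only: let the FileSystem do the recursion in a single call
> // instead of issuing one listing per directory on the Hive side, and keep
> // the FileStatus objects so they can be reused (e.g. for quickstats).
> public class RecursiveListing {
>   public static List<LocatedFileStatus> listFiles(FileSystem fs, Path root,
>       PathFilter filter) throws IOException {
>     List<LocatedFileStatus> result = new ArrayList<>();
>     RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
>     while (it.hasNext()) {
>       LocatedFileStatus status = it.next();
>       if (filter == null || filter.accept(status.getPath())) {
>         result.add(status);
>       }
>     }
>     return result;
>   }
> }
> {code}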



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538522
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:56
Start Date: 20/Jan/21 15:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561076023



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
 }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a 
given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the 
delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+private final Path basePath;
+
+public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+  super(writeId, stmtId);
+  this.basePath = basePath;
+}
+@Override
+public boolean accept(Path path) {
+  do {
+if (super.accept(path)) {
+  return true;
+}
+path = path.getParent();
+  } while (!path.equals(basePath));

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538522)
Time Spent: 2h 10m  (was: 2h)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24473) Make Hive buildable with HBase 2.x GA versions

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24473?focusedWorklogId=538513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538513
 ]

ASF GitHub Bot logged work on HIVE-24473:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:40
Start Date: 20/Jan/21 15:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1729:
URL: https://github.com/apache/hive/pull/1729


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538513)
Time Spent: 20m  (was: 10m)

> Make Hive buildable with HBase 2.x GA versions
> --
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Unfortunately the HBase project doesn't provide public maven artifacts that 
> are binary compatible with Hadoop 3.1, so unless we add a build step that 
> recompiles HBase with Hadoop 3, we cannot update to a GA HBase 2 version.
> We should at least make sure that Hive can be built with GA HBase 2 releases.
> -Update HBase to more recent version.-
> -We cannot use anything later than 2.2.4 because of HBASE-22394-
> -So the options are 2.1.10 and 2.2.4-
> -I suggest 2.1.10 because it's a chronologically later release, and it 
> maximises compatibility with HBase server deployments.-
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24473) Make Hive buildable with HBase 2.x GA versions

2021-01-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24473:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~stoty]!

> Make Hive buildable with HBase 2.x GA versions
> --
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Unfortunately the HBase project doesn't provide public maven artifacts that 
> are binary compatible with Hadoop 3.1, so unless we add a build step that 
> recompiles HBase with Hadoop 3, we cannot update to a GA HBase 2 version.
> We should at least make sure that Hive can be built with GA HBase 2 releases.
> -Update HBase to more recent version.-
> -We cannot use anything later than 2.2.4 because of HBASE-22394-
> -So the options are 2.1.10 and 2.2.4-
> -I suggest 2.1.10 because it's a chronologically later release, and it 
> maximises compatibility with HBase server deployments.-
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24483) Bump protobuf version to 3.12.0

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24483:
--
Labels: pull-request-available  (was: )

> Bump protobuf version to 3.12.0
> ---
>
> Key: HIVE-24483
> URL: https://issues.apache.org/jira/browse/HIVE-24483
> Project: Hive
>  Issue Type: Improvement
>Reporter: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The protoc version used in Hive is very old, i.e. 2.5.0 
> [https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/]. 
> v2.5.0 does not have AArch64 support; AArch64 support started from 
> v3.5.0 onwards in Google's protobuf releases. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24483) Bump protobuf version to 3.12.0

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24483?focusedWorklogId=538509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538509
 ]

ASF GitHub Bot logged work on HIVE-24483:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:35
Start Date: 20/Jan/21 15:35
Worklog Time Spent: 10m 
  Work Description: dataproc-metastore opened a new pull request #1895:
URL: https://github.com/apache/hive/pull/1895


   
   
   ### What changes were proposed in this pull request?
   Upgrade the protobuf library version in Hive to 3.12.0
   
   
   
   ### Why are the changes needed?
   Using protobuf 3 will allow us to use more gRPC features further down the line
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No, just a different protobuf compiler (protoc) will be used.
   
   
   
   ### How was this patch tested?
   Existing unit tests, building/running manually
   No additional tests were added since this is just a library version change.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538509)
Remaining Estimate: 0h
Time Spent: 10m

> Bump protobuf version to 3.12.0
> ---
>
> Key: HIVE-24483
> URL: https://issues.apache.org/jira/browse/HIVE-24483
> Project: Hive
>  Issue Type: Improvement
>Reporter: Cameron Moberg
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The protoc version used in Hive is very old, i.e. 2.5.0 
> [https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/]. 
> v2.5.0 does not have AArch64 support; AArch64 support started from 
> v3.5.0 onwards in Google's protobuf releases. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538500
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:19
Start Date: 20/Jan/21 15:19
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561046305



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
 }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a 
given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the 
delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+private final Path basePath;
+
+public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+  super(writeId, stmtId);
+  this.basePath = basePath;
+}
+@Override
+public boolean accept(Path path) {
+  do {
+if (super.accept(path)) {
+  return true;
+}
+path = path.getParent();
+  } while (!path.equals(basePath));

Review comment:
   It will fail with a NullPointerException if you call it with an unrelated 
path; I will add the null check.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538500)
Time Spent: 2h  (was: 1h 50m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538498
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:18
Start Date: 20/Jan/21 15:18
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561044735



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   No, we need the FileStatuses so we can reuse them later for populating 
the quickstats. In a follow-up patch I will try to do it the other way around 
and collect FileStatus everywhere in loadPartition instead of Path, because 
writeNotification could also benefit from having the FileStatuses ready. But 
it is not that trivial a change in the non-direct-insert cases.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538498)
Time Spent: 1h 50m  (was: 1h 40m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538492
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:12
Start Date: 20/Jan/21 15:12
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561040097



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   It was used in the original insertOverWrite patch, but later it was 
removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538492)
Time Spent: 1h 40m  (was: 1.5h)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538483
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:02
Start Date: 20/Jan/21 15:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   Can we return List<Path> to avoid the extra map(FileStatus::getPath) and 
collect in the method usages?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538483)
Time Spent: 1.5h  (was: 1h 20m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538482
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 15:01
Start Date: 20/Jan/21 15:01
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   Can we return List<Path> to avoid the extra map(FileStatus::getPath) in 
the method usages?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538482)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538481
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:59
Start Date: 20/Jan/21 14:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   Can we return List<String> to avoid the extra map(FileStatus::getPath) 
in the method usages?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538481)
Time Spent: 1h 10m  (was: 1h)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538480=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538480
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:59
Start Date: 20/Jan/21 14:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   Can we return List to avoid extra map(FileStatus::getPath) in 
the method usages?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538480)
Time Spent: 1h  (was: 50m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538479
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:56
Start Date: 20/Jan/21 14:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561023487



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path 
acidDir, FileSystem srcFs,
 }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int 
stmtId, boolean isInsertOverwrite,
-  List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long 
writeId, int stmtId) throws HiveException {

Review comment:
   insertOverwrite flag was never used? :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538479)
Time Spent: 50m  (was: 40m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24671) Semijoinremoval should not run into an NPE in case the SJ filter contains an UDF

2021-01-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24671:
---


> Semijoinremoval should not run into an NPE in case the SJ filter contains an 
> UDF
> 
>
> Key: HIVE-24671
> URL: https://issues.apache.org/jira/browse/HIVE-24671
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> set hive.optimize.index.filter=true;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.vectorized.execution.enabled=true;
> drop table if exists t1;
> drop table if exists t2;
> create table t1 (
> v1 string
> );
> create table t2 (
> v2 string
> );
> insert into t1 values ('e123456789'),('x123456789');
> insert into t2 values
> ('123'),
>  ('e123456789');
> -- alter table t1 update statistics set 
> ('numRows'='9348843574','rawDataSize'='0');
> alter table t1 update statistics set 
> ('numRows'='934884357','rawDataSize'='0');
> alter table t2 update statistics set ('numRows'='9348','rawDataSize'='0');
> alter table t1 update statistics for column v1 set 
> ('numNulls'='0','numDVs'='15541355','avgColLen'='10.0','maxColLen'='10');
> alter table t2 update statistics for column v2 set 
> ('numNulls'='0','numDVs'='155','avgColLen'='5.0','maxColLen'='10');
> -- alter table t2 update statistics for column k set 
> ('numNulls'='0','numDVs'='13876472','avgColLen'='15.9836','maxColLen'='16');
> explain
> select v1,v2 from t1 join t2 on (substr(v1,1,3) = v2);
> {code}
> results in:
> {code}
>  java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1944)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:544)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:240)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12467)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12672)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?focusedWorklogId=538476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538476
 ]

ASF GitHub Bot logged work on HIVE-24670:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:49
Start Date: 20/Jan/21 14:49
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1894:
URL: https://github.com/apache/hive/pull/1894


   If delete delta caching is turned off, the plain record reader inside 
DeleteReaderValue allocates a batch with a schema that is equivalent to that of 
an insert delta.
   
   This is unnecessary as the struct part in a delete delta file is always 
empty. In cases where we have many delete delta files (e.g. due to compaction 
failures) and a wide table definition (e.g. 200+ cols) this puts a significant 
amount of memory pressure on the executor, while these empty structures will 
never be filled or otherwise utilized.
   
   I propose we specify an ACID schema with an empty struct part to this record 
reader to counter this.
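   For illustration only, a minimal sketch of what such a reader schema could look 
like, built with the ORC TypeDescription API. The field names follow the usual ACID 
layout; the class and method names below are made up for the example and are not 
part of the patch:
   
{code:java}
import org.apache.orc.TypeDescription;

public class EmptyRowAcidSchemaSketch {

  // The ACID wrapper columns are kept, but the "row" struct has no children,
  // so no per-column vectors need to be allocated for the payload columns
  // of a delete delta.
  public static TypeDescription emptyRowAcidSchema() {
    return TypeDescription.createStruct()
        .addField("operation", TypeDescription.createInt())
        .addField("originalTransaction", TypeDescription.createLong())
        .addField("bucket", TypeDescription.createInt())
        .addField("rowId", TypeDescription.createLong())
        .addField("currentTransaction", TypeDescription.createLong())
        .addField("row", TypeDescription.createStruct()); // empty struct
  }

  public static void main(String[] args) {
    // Prints something like:
    // struct<operation:int,originalTransaction:bigint,bucket:int,
    //        rowId:bigint,currentTransaction:bigint,row:struct<>>
    System.out.println(emptyRowAcidSchema());
  }
}
{code}
   Whether an empty struct round-trips cleanly through the record reader is exactly 
what the patch has to verify; this only shows the shape of the idea.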
   
   Options



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538476)
Remaining Estimate: 0h
Time Spent: 10m

> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we specify an ACID schema with an empty struct part to this record 
> reader to counter this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24670:
--
Labels: pull-request-available  (was: )

> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we specify an ACID schema with an empty struct part to this record 
> reader to counter this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538473
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:44
Start Date: 20/Jan/21 14:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
 }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a 
given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the 
delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+private final Path basePath;
+
+public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+  super(writeId, stmtId);
+  this.basePath = basePath;
+}
+@Override
+public boolean accept(Path path) {
+  do {
+if (super.accept(path)) {
+  return true;
+}
+path = path.getParent();
+  } while (!path.equals(basePath));

Review comment:
   Is it possible that `path.equals(basePath)` won't ever be true? or path 
becomes null?
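   For reference, a self-contained sketch of one defensive shape the loop could 
take (this is not the patch itself; the helper name and the Predicate parameter are 
invented for the example):
   
{code:java}
import java.util.function.Predicate;

import org.apache.hadoop.fs.Path;

public class GuardedPathWalkSketch {

  // Walk from `path` up towards `basePath`; stop when basePath is reached or
  // when getParent() runs out of ancestors, so a path that is not under
  // basePath cannot loop forever or dereference a null parent.
  static boolean acceptUpTo(Path path, Path basePath, Predicate<Path> accept) {
    while (path != null && !path.equals(basePath)) {
      if (accept.test(path)) {
        return true;
      }
      path = path.getParent();
    }
    return false;
  }

  public static void main(String[] args) {
    Path base = new Path("/warehouse/tbl");
    Path bucket = new Path("/warehouse/tbl/delta_0000005_0000005/bucket_00000");
    // true: the delta directory encountered on the way up matches the predicate
    System.out.println(acceptUpTo(bucket, base, p -> p.getName().startsWith("delta_")));
  }
}
{code}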





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538473)
Time Spent: 40m  (was: 0.5h)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538472
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:41
Start Date: 20/Jan/21 14:41
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
 }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a 
given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the 
delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+private final Path basePath;
+
+public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+  super(writeId, stmtId);
+  this.basePath = basePath;
+}
+@Override
+public boolean accept(Path path) {
+  do {
+if (super.accept(path)) {
+  return true;
+}
+path = path.getParent();
+  } while (!path.equals(basePath));

Review comment:
   Is it possible that `path.equals(basePath)` won't ever be true?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538472)
Time Spent: 0.5h  (was: 20m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538468
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:39
Start Date: 20/Jan/21 14:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
 }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a 
given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the 
delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+private final Path basePath;
+
+public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+  super(writeId, stmtId);
+  this.basePath = basePath;
+}
+@Override
+public boolean accept(Path path) {
+  do {
+if (super.accept(path)) {
+  return true;
+}
+path = path.getParent();
+  } while (!path.equals(basePath));

Review comment:
   Should we have null check for path here so we won't get an infinite loop?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538468)
Time Spent: 20m  (was: 10m)

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita reassigned HIVE-24670:
-


> DeleteReaderValue should not allocate empty vectors for delete delta files
> --
>
> Key: HIVE-24670
> URL: https://issues.apache.org/jira/browse/HIVE-24670
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> If delete delta caching is turned off, the plain record reader inside 
> DeleteReaderValue allocates a batch with a schema that is equivalent to that 
> of an insert delta.
> This is unnecessary as the struct part in a delete delta file is always 
> empty. In cases where we have many delete delta files (e.g. due to compaction 
> failures) and a wide table definition (e.g. 200+ cols) this puts a 
> significant amount of memory pressure on the executor, while these empty 
> structures will never be filled or otherwise utilized.
> I propose we specify an ACID schema with an empty struct part to this record 
> reader to counter this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread Peter Varga (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268605#comment-17268605
 ] 

Peter Varga commented on HIVE-24669:


Local performance measurement results:
Doing an insert-select from an unpartitioned table on S3 to a partitioned table 
on S3, creating 100 new dynamic partitions.

Avg loadPartitionInternal before patch: 2433 ms
Avg loadPartitionInternal after patch: 1279 ms
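For context, the measured statement had roughly this shape (table and column names 
here are made up, the actual benchmark tables were not posted):

{code:sql}
set hive.exec.dynamic.partition.mode=nonstrict;

create table src_unpart (id int, val string, part_col string);
create table dst_part (id int, val string) partitioned by (part_col string);

-- insert-select that creates ~100 dynamic partitions,
-- i.e. loadPartitionInternal runs once per new partition
insert into table dst_part partition (part_col)
select id, val, part_col from src_unpart;
{code}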

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24669:
--
Labels: pull-request-available  (was: )

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?focusedWorklogId=538458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538458
 ]

ASF GitHub Bot logged work on HIVE-24669:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 14:25
Start Date: 20/Jan/21 14:25
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1893:
URL: https://github.com/apache/hive/pull/1893


   
   ### What changes were proposed in this pull request?
   Improve FileSystem usage in Hive::loadPartitionInternal to improve 
performance on S3.
   
   
   ### Why are the changes needed?
   Performance improvement
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Current unit tests + local performance measurements.
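   As a rough illustration of the "native recursive listing" part only (a sketch 
under the assumption that a single listFiles(path, true) call replaces a 
per-directory walk; the class and filter here are invented for the example, not 
taken from the patch):
   
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

public class NativeRecursiveListingSketch {

  // One recursive listFiles() call lets the FileSystem implementation
  // (e.g. S3A) drive the recursion, instead of Hive issuing one listing
  // per directory while walking the tree itself.
  static List<FileStatus> listRecursively(FileSystem fs, Path root, PathFilter filter)
      throws IOException {
    List<FileStatus> result = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      if (filter.accept(status.getPath())) {
        result.add(status);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path root = new Path(args.length > 0 ? args[0] : ".");
    for (FileStatus s : listRecursively(fs, root, p -> true)) {
      System.out.println(s.getPath());
    }
  }
}
{code}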
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538458)
Remaining Estimate: 0h
Time Spent: 10m

> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24669:
--


> Improve Filesystem usage in Hive::loadPartitionInternal
> ---
>
> Key: HIVE-24669
> URL: https://issues.apache.org/jira/browse/HIVE-24669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> * Use native recursive listing instead of doing it on the Hive side
>  * Reuse the file list determined for writeNotificationlogs in quickstat 
> generation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24668) Improve FileSystem usage in dynamic partition handling

2021-01-20 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24668:
--


> Improve FileSystem usage in dynamic partition handling
> --
>
> Key: HIVE-24668
> URL: https://issues.apache.org/jira/browse/HIVE-24668
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Possible improvements:
>  * In the MoveTask process, both getFullDPSpecs and later 
> Hive::getValidPartitionsInPath do a listing for dynamic partitions in the 
> table; the result of the first can be reused
>  * Hive::listFilesCreatedByQuery does the recursive listing on the Hive side; 
> the native recursive listing should be used
>  * if we add a new partition we populate the quickstats, which does another 
> listing for the new partition; the files already collected for the 
> writeNotificationlogs can be reused



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24667) Truncate optimization to avoid unnecessary per partition DB get operations

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24667?focusedWorklogId=538446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538446
 ]

ASF GitHub Bot logged work on HIVE-24667:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 13:49
Start Date: 20/Jan/21 13:49
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1891:
URL: https://github.com/apache/hive/pull/1891#issuecomment-763617047


   LGTM +1 pending tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538446)
Time Spent: 20m  (was: 10m)

> Truncate optimization to avoid unnecessary per partition DB get operations
> --
>
> Key: HIVE-24667
> URL: https://issues.apache.org/jira/browse/HIVE-24667
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24664) Support column aliases in Values clause

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24664:
--
Labels: pull-request-available  (was: )

> Support column aliases in Values clause
> ---
>
> Key: HIVE-24664
> URL: https://issues.apache.org/jira/browse/HIVE-24664
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Enable explicitly specifying column aliases in the first row of the Values 
> clause. If not all the columns have an alias specified, generate one.
> {code:java}
> values(1, 2 b, 3 c),(4, 5, 6);
> {code}
> {code:java}
> _col1   b   c
>   1 2   3
>   4 5   6
> {code}
>  This is not a standard SQL feature but some database engines like Impala 
> support it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24664) Support column aliases in Values clause

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24664?focusedWorklogId=538440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538440
 ]

ASF GitHub Bot logged work on HIVE-24664:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 13:36
Start Date: 20/Jan/21 13:36
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1892:
URL: https://github.com/apache/hive/pull/1892


   ### What changes were proposed in this pull request?
   1. Declare a list in `IdentifiersParser` for storing aliases specified after 
expressions in the values clause first row. If no alias is specified generate 
one implicitly.
   2. Add methods to manipulate and query the list.
   3. Create a new `firstValueRowConstructor` from parser rule 
`valueRowConstructor` to parse the first and rest of the rows in a different 
way. Inline the brackets to these new rules like in `expressionsInParenthesis`
   4. Create `firstExpressionsWithAlias` and `moreExpressionsWithAlias` rules 
and call them from `firstValueRowConstructor`/`valueRowConstructor` rules based 
on the merge of existing rules: `expressionsNotInParenthesis` and 
`expressionPart`: since `isStruct` and `forceStruct` are always true when 
calling from `valueRowConstructor`, keep only the branch which is used when 
these two parameters are true in the new rules.
   5. Create parser rule `expressionWithAlias`. This rule handles expressions 
with and without alias and it stores the parsed/generated alias in a list 
mentioned in 1.
   6. Create parser rule `expressionWithStoredAlias` to parse a single 
expression without an alias explicitly specified but assign one from the stored 
ones.
   7. Rename the first parameter of parser rule `expressionPart`.
   
   ### Why are the changes needed?
   Support column aliases in Values clause. See jira or `values_alias.q` for 
example.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, but existing queries are not affected. With this patch users can define 
column aliases in the first row of Values clause. This type of alias definition 
is not supported in `INSERT INTO [] VALUES([actual 
values])` statements.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestValuesClause -pl parser
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=values.q,values_alias.q -pl 
itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538440)
Remaining Estimate: 0h
Time Spent: 10m

> Support column aliases in Values clause
> ---
>
> Key: HIVE-24664
> URL: https://issues.apache.org/jira/browse/HIVE-24664
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Enable explicitly specifying column aliases in the first row of the Values 
> clause. If not all the columns have an alias specified, generate one.
> {code:java}
> values(1, 2 b, 3 c),(4, 5, 6);
> {code}
> {code:java}
> _col1   b   c
>   1 2   3
>   4 5   6
> {code}
>  This is not a standard SQL feature but some database engines like Impala 
> support it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24667) Truncate optimization to avoid unnecessary per partition DB get operations

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24667?focusedWorklogId=538402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538402
 ]

ASF GitHub Bot logged work on HIVE-24667:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 12:51
Start Date: 20/Jan/21 12:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #1891:
URL: https://github.com/apache/hive/pull/1891


   
   
   ### What changes were proposed in this pull request?
   
   Moved DB fetch out of partition loop
   
   ### Why are the changes needed?
   
   Performance improvement
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests
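   To illustrate only the shape of the change described above ("moved DB fetch out 
of the partition loop"): the Database is fetched once before the loop instead of 
once per partition. The helper below and its names are invented for this sketch and 
are not taken from the patch.
   
{code:java}
import java.util.List;

import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.thrift.TException;

public class TruncateLoopSketch {

  // Fetch the Database once, then reuse the same object inside the loop,
  // avoiding one metastore round trip per partition being truncated.
  static void truncatePartitions(IMetaStoreClient client, String catName, String dbName,
      List<Partition> partitions) throws TException {
    Database db = client.getDatabase(catName, dbName); // single metastore call
    for (Partition partition : partitions) {
      truncateOnePartition(db, partition);             // no per-partition DB fetch
    }
  }

  // Placeholder for whatever per-partition work truncate actually does.
  private static void truncateOnePartition(Database db, Partition partition) {
    System.out.println("truncating " + partition.getValues() + " in " + db.getName());
  }
}
{code}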



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538402)
Remaining Estimate: 0h
Time Spent: 10m

> Truncate optimization to avoid unnecessary per partition DB get operations
> --
>
> Key: HIVE-24667
> URL: https://issues.apache.org/jira/browse/HIVE-24667
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24667) Truncate optimization to avoid unnecessary per partition DB get operations

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24667:
--
Labels: pull-request-available  (was: )

> Truncate optimization to avoid unnecessary per partition DB get operations
> --
>
> Key: HIVE-24667
> URL: https://issues.apache.org/jira/browse/HIVE-24667
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24656) CBO fails for queries with is null on map and array types

2021-01-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268554#comment-17268554
 ] 

Ádám Szita commented on HIVE-24656:
---

Ah nice! So that's why you saw that NPE - cool, I was actually looking for such 
an option to control CBO fallback, so I think HIVE-24601 will be very useful to 
have.

> CBO fails for queries with is null on map and array types
> -
>
> Key: HIVE-24656
> URL: https://issues.apache.org/jira/browse/HIVE-24656
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently CBO throws an NPE for queries that have is null or is not null as 
> the where clause:
> {code:java}
> CREATE EXTERNAL TABLE `oft2`(                      
>   `mi` int,                                        
>   `ms` string,                                     
>   `mst` struct<`a`:string, `b`:string>,            
>   `mm1` map,                           
>   `mm2` map>,  
>   `ma` array);{code}
> {code:java}
> select * from oft2 where ma is null;{code}
> {code:java}
> select * from oft2 where mm1 is null;{code}
> will cause NPE and skip CBO:
> {code:java}
> 2021-01-19T04:47:31,696 ERROR [0de7af8c-b9a5-4914-b967-8827e9ea09e4 
> HiveServer2-Handler-Pool: Thread-916] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:796)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:618)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexLiteral.accept(RexLiteral.java:1137) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:226)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1648)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:577)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12526)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>  [hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:267)
>  [hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> 

[jira] [Work logged] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24639?focusedWorklogId=538368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538368
 ]

ASF GitHub Bot logged work on HIVE-24639:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 12:01
Start Date: 20/Jan/21 12:01
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1871:
URL: https://github.com/apache/hive/pull/1871#issuecomment-763557258


   > Hey @dengzhhu653 are you going to open a new PR for this? Happy to take a 
look
   
   Thanks much @pgaref! Maybe we should fix the small HIVE-24666 first before 
we move on.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538368)
Time Spent: 40m  (was: 0.5h)

> Raises SemanticException other than ClassCastException when filter has 
> non-boolean expressions
> --
>
> Key: HIVE-24639
> URL: https://issues.apache.org/jira/browse/HIVE-24639
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Sometimes we see ClassCastException in filters when fetching some rows of a 
> table or executing the query.  The 
> GenericUDFOPOr/GenericUDFOPAnd/FilterOperator assume that the output of their 
> conditions should be a boolean, but there is no guarantee. For example: 
> _select * from ccn_table where src + 1;_ 
> will throw ClassCastException:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553)
> ...{code}
> We'd better validate the filter during analysis instead of at runtime and 
> bring more meaningful messages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24666?focusedWorklogId=538363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538363
 ]

ASF GitHub Bot logged work on HIVE-24666:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 11:57
Start Date: 20/Jan/21 11:57
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1890:
URL: https://github.com/apache/hive/pull/1890


   … string
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   mvn test -Dtest=TestVector*
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex='vector.*'
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538363)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use cast to boolean in where conditions to filter rows, in vectorized 
> execution the filter is unable to filter rows. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> casted type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24666:
--
Labels: pull-request-available  (was: )

> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use cast to boolean in where conditions to filter rows, in vectorized 
> execution the filter is unable to filter rows. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> casted type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24666:
--


> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>
> If we use cast to boolean in where conditions to filter rows, in vectorized 
> execution the filter is unable to filter rows. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> casted type is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24656) CBO fails for queries with is null on map and array types

2021-01-20 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268509#comment-17268509
 ] 

Stamatis Zampetakis commented on HIVE-24656:


Yes, this is one of the reasons for pushing HIVE-24601 forward. Hopefully, it 
will also tell us if we have more bugs like this one hidden somewhere.

> CBO fails for queries with is null on map and array types
> -
>
> Key: HIVE-24656
> URL: https://issues.apache.org/jira/browse/HIVE-24656
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently CBO throws an NPE for queries that have is null or is not null as 
> the where clause:
> {code:java}
> CREATE EXTERNAL TABLE `oft2`(                      
>   `mi` int,                                        
>   `ms` string,                                     
>   `mst` struct<`a`:string, `b`:string>,            
>   `mm1` map,                           
>   `mm2` map>,  
>   `ma` array);{code}
> {code:java}
> select * from oft2 where ma is null;{code}
> {code:java}
> select * from oft2 where mm1 is null;{code}
> will cause NPE and skip CBO:
> {code:java}
> 2021-01-19T04:47:31,696 ERROR [0de7af8c-b9a5-4914-b967-8827e9ea09e4 
> HiveServer2-Handler-Pool: Thread-916] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:796)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:618)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexLiteral.accept(RexLiteral.java:1137) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:226)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1648)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:577)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12526)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>  [hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:267)
>  [hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> 

[jira] [Commented] (HIVE-24656) CBO fails for queries with is null on map and array types

2021-01-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268499#comment-17268499
 ] 

Ádám Szita commented on HIVE-24656:
---

Actually that test should always produce the NPE when the query goes against 
either the map or the array column.

The reason this is not expected to be visible in the test results is that when 
CBO fails, Hive falls back to the plain old rule based optimizer. So seeing 
this actually coming up as a failure in the link above is a surprise to me - I 
was under the impression that this has been a rather sleeping bug.

I wonder how many more of these we have, though...

> CBO fails for queries with is null on map and array types
> -
>
> Key: HIVE-24656
> URL: https://issues.apache.org/jira/browse/HIVE-24656
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently CBO throws an NPE for queries that have is null or is not null as 
> the where clause:
> {code:java}
> CREATE EXTERNAL TABLE `oft2`(                      
>   `mi` int,                                        
>   `ms` string,                                     
>   `mst` struct<`a`:string, `b`:string>,            
>   `mm1` map,                           
>   `mm2` map>,  
>   `ma` array);{code}
> {code:java}
> select * from oft2 where ma is null;{code}
> {code:java}
> select * from oft2 where mm1 is null;{code}
> will cause NPE and skip CBO:
> {code:java}
> 2021-01-19T04:47:31,696 ERROR [0de7af8c-b9a5-4914-b967-8827e9ea09e4 
> HiveServer2-Handler-Pool: Thread-916] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:796)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:618)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexLiteral.accept(RexLiteral.java:1137) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:226)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1648)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:577)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12526)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> 

[jira] [Work logged] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24639?focusedWorklogId=538325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538325
 ]

ASF GitHub Bot logged work on HIVE-24639:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 10:20
Start Date: 20/Jan/21 10:20
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1871:
URL: https://github.com/apache/hive/pull/1871#issuecomment-763499730


   Hey @dengzhhu653  are you going to open a new PR for this? Happy to take a 
look



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538325)
Time Spent: 0.5h  (was: 20m)

> Raises SemanticException other than ClassCastException when filter has 
> non-boolean expressions
> --
>
> Key: HIVE-24639
> URL: https://issues.apache.org/jira/browse/HIVE-24639
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Sometimes we see ClassCastException in filters when fetching some rows of a 
> table or executing the query.  The 
> GenericUDFOPOr/GenericUDFOPAnd/FilterOperator assume that the output of their 
> conditions should be a boolean, but there is no guarantee. For example: 
> _select * from ccn_table where src + 1;_ 
> will throw ClassCastException:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Boolean
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553)
> ...{code}
> We'd better validate the filter during analysis instead of at runtime and 
> bring more meaningful messages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24656) CBO fails for queries with is null on map and array types

2021-01-20 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268494#comment-17268494
 ] 

Stamatis Zampetakis commented on HIVE-24656:


Interestingly we already have tests that fail 
([analyze_npe|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1875/3/tests])
 due to this under certain conditions.

> CBO fails for queries with is null on map and array types
> -
>
> Key: HIVE-24656
> URL: https://issues.apache.org/jira/browse/HIVE-24656
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently CBO throws an NPE for queries that have is null or is not null as 
> the where clause:
> {code:java}
> CREATE EXTERNAL TABLE `oft2`(                      
>   `mi` int,                                        
>   `ms` string,                                     
>   `mst` struct<`a`:string, `b`:string>,            
>   `mm1` map,                           
>   `mm2` map>,  
>   `ma` array);{code}
> {code:java}
> select * from oft2 where ma is null;{code}
> {code:java}
> select * from oft2 where mm1 is null;{code}
> will cause NPE and skip CBO:
> {code:java}
> 2021-01-19T04:47:31,696 ERROR [0de7af8c-b9a5-4914-b967-8827e9ea09e4 
> HiveServer2-Handler-Pool: Thread-916] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:796)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:618)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitLiteral(ASTConverter.java:547)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.calcite.rex.RexLiteral.accept(RexLiteral.java:1137) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:226)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1648)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:577)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12526)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>  [hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:267)
>  
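
For illustration only, and not the actual Hive fix: the trace above shows the common pattern of dereferencing the result of a lookup that can legitimately return null (here, converting an operator that has no registered AST translation). The sketch below, with invented class and method names, makes the unsupported case explicit so the planner can fall back with a clear message instead of logging a bare NullPointerException.

{code:java}
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a lookup-based converter that avoids the bare NPE
// pattern seen in the stack trace above. Not Hive or Calcite code.
public class SafeOperatorConverter {

    private final Map<String, String> operatorToAstFunction;

    public SafeOperatorConverter(Map<String, String> operatorToAstFunction) {
        this.operatorToAstFunction = operatorToAstFunction;
    }

    /** Returns the AST function name, or empty if the operator is unsupported. */
    public Optional<String> convert(String operatorName) {
        // A plain map.get(...) followed by a dereference is what produces the NPE;
        // wrapping the lookup makes the "unsupported operator" case explicit.
        return Optional.ofNullable(operatorToAstFunction.get(operatorName));
    }

    public String convertOrFail(String operatorName) {
        return convert(operatorName).orElseThrow(() ->
            new IllegalStateException("No AST conversion registered for operator: " + operatorName));
    }

    public static void main(String[] args) {
        SafeOperatorConverter c = new SafeOperatorConverter(Map.of("=", "EQUAL"));
        System.out.println(c.convert("IS NULL")); // Optional.empty, no NPE
        System.out.println(c.convertOrFail("="));  // EQUAL
    }
}
{code}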

[jira] [Resolved] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2021-01-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24428.
---
Resolution: Fixed

> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In case multiple clients are adding partitions to the same table and the 
> same partition is being added concurrently, there is a chance that the data dir 
> is removed after the other client has already written its data.
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958
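
A simplified illustration of one common mitigation for this kind of race, using invented names rather than the actual HiveMetaStore code: the cleanup path on failure only deletes a partition directory that this request itself created, so data already written by a concurrent client is left untouched. The atomic create-or-detect step is what distinguishes "I made this directory" from "someone else already did".

{code:java}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the "only delete what you created" pattern used to
// avoid the race described above. Not the actual HiveMetaStore implementation.
public class AddPartitionSketch {

    /**
     * Creates the partition directory and returns true only when this call
     * actually created it. Assumes the parent (table) directory already exists.
     */
    static boolean ensurePartitionDir(Path partitionDir) throws IOException {
        try {
            Files.createDirectory(partitionDir); // atomic: fails if it already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // a concurrent client (or an earlier run) created it
        }
    }

    static void addPartition(Path partitionDir) throws IOException {
        boolean madeDir = ensurePartitionDir(partitionDir);
        try {
            // ... register the partition in the metastore DB here ...
        } catch (RuntimeException e) {
            // Only roll back a directory we created ourselves; deleting an
            // existing directory could wipe data another client already wrote.
            // (The sketch assumes the directory is still empty at this point.)
            if (madeDir) {
                Files.deleteIfExists(partitionDir);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("tbl");
        addPartition(table.resolve("ds=2021-01-20"));
        System.out.println("partition dir created under " + table);
    }
}
{code}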



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2021-01-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268492#comment-17268492
 ] 

Denys Kuzmenko commented on HIVE-24428:
---

Thank you [~kgyrtkirk] for your patch.
Merged to master.

> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In case multiple clients are adding partitions to the same table and the 
> same partition is being added concurrently, there is a chance that the data dir 
> is removed after the other client has already written its data.
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24428?focusedWorklogId=538281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-538281
 ]

ASF GitHub Bot logged work on HIVE-24428:
-

Author: ASF GitHub Bot
Created on: 20/Jan/21 09:07
Start Date: 20/Jan/21 09:07
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1724:
URL: https://github.com/apache/hive/pull/1724


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 538281)
Time Spent: 6h  (was: 5h 50m)

> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In case multiple clients are adding partitions to the same table and the 
> same partition is being added concurrently, there is a chance that the data dir 
> is removed after the other client has already written its data.
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface

2021-01-20 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24665:
-


> Add commitAlterTable method to the HiveMetaHook interface
> -
>
> Key: HIVE-24665
> URL: https://issues.apache.org/jira/browse/HIVE-24665
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently we have pre and post hooks for create table and drop table 
> commands, but only a pre hook for alter table commands. We should add a post 
> hook as well (with a default implementation).
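
A sketch of how such a post-alter hook could be added with a default implementation. The interface below is a trimmed-down stand-in, not the real HiveMetaHook API (whose methods take metastore Table/EnvironmentContext objects); it only illustrates that a default method keeps existing implementers source-compatible.

{code:java}
// Illustrative only: a simplified lifecycle-hook interface showing how a
// post-alter callback can be added with a default no-op body, so existing
// implementations of the interface do not have to change.
public interface TableLifecycleHook {

    void preCreateTable(String tableName);

    void commitCreateTable(String tableName);

    void preDropTable(String tableName);

    void commitDropTable(String tableName, boolean deleteData);

    void preAlterTable(String tableName);

    /** New post-alter hook; the default no-op keeps the change backward compatible. */
    default void commitAlterTable(String tableName) {
        // no-op by default
    }
}
{code}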



--
This message was sent by Atlassian Jira
(v8.3.4#803005)