[jira] [Work logged] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?focusedWorklogId=505545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505545
 ]

ASF GitHub Bot logged work on HIVE-24316:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 04:43
Start Date: 28/Oct/20 04:43
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun opened a new pull request #1616:
URL: https://github.com/apache/hive/pull/1616


   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade Apache ORC from 1.5.6 to 1.5.8.
   
   ### Why are the changes needed?
   
   This will bring the latest bug fixes.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CI with the existing test cases.
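
   For reference, an upgrade like this is typically a one-line version bump of the ORC dependency in the root pom.xml; the property name below is an assumption about the build file, not quoted from the patch:

   ```xml
   <!-- Hypothetical excerpt from Hive's root pom.xml -->
   <properties>
     <orc.version>1.5.8</orc.version>
   </properties>
   ```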



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505545)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24316:
--
Labels: pull-request-available  (was: )

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-12679) Allow users to be able to specify an implementation of IMetaStoreClient via HiveConf

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12679?focusedWorklogId=505481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505481
 ]

ASF GitHub Bot logged work on HIVE-12679:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 01:01
Start Date: 28/Oct/20 01:01
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1402:
URL: https://github.com/apache/hive/pull/1402#issuecomment-717629984


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505481)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow users to be able to specify an implementation of IMetaStoreClient via 
> HiveConf
> 
>
> Key: HIVE-12679
> URL: https://issues.apache.org/jira/browse/HIVE-12679
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Metastore, Query Planning
>Reporter: Austin Lee
>Assignee: Noritaka Sekiyama
>Priority: Minor
>  Labels: metastore, pull-request-available
> Attachments: HIVE-12679.1.patch, HIVE-12679.2.patch, 
> HIVE-12679.branch-1.2.patch, HIVE-12679.branch-2.3.patch, HIVE-12679.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hi,
> I would like to propose a change that would make it possible for users to 
> choose an implementation of IMetaStoreClient via HiveConf, i.e. 
> hive-site.xml.  Currently, the choice is hard-coded in Hive to be 
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.  There 
> is no direct reference to SessionHiveMetaStoreClient other than the 
> hard-coded class name in Hive.java, and the QL component operates only on the 
> IMetaStoreClient interface, so the change would be minimal and quite similar 
> to how an implementation of RawStore is specified and loaded in 
> hive-metastore.  One use case this change would serve is one where a user 
> wishes to use an implementation of this interface without the dependency 
> on the Thrift server.
>   
> Thank you,
> Austin
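
A minimal sketch of the proposed mechanism, mirroring how RawStore implementations are loaded by class name; the factory and the config key below are hypothetical illustrations, not code from any attached patch:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;

public class MetaStoreClientFactory {
  // Hypothetical property name; the patch would define the real key in HiveConf.
  private static final String CLIENT_IMPL_KEY = "hive.metastore.client.class";

  public static IMetaStoreClient create(Configuration conf) throws Exception {
    String clsName = conf.get(CLIENT_IMPL_KEY,
        "org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient");
    Class<? extends IMetaStoreClient> cls =
        Class.forName(clsName).asSubclass(IMetaStoreClient.class);
    // Assumes a Configuration-argument constructor, as RawStore loading does.
    return cls.getConstructor(Configuration.class).newInstance(conf);
  }
}
{code}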



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24037) Parallelize hash table constructions in map joins

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24037?focusedWorklogId=505480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505480
 ]

ASF GitHub Bot logged work on HIVE-24037:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 01:01
Start Date: 28/Oct/20 01:01
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1401:
URL: https://github.com/apache/hive/pull/1401


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505480)
Time Spent: 0.5h  (was: 20m)

> Parallelize hash table constructions in map joins
> -
>
> Key: HIVE-24037
> URL: https://issues.apache.org/jira/browse/HIVE-24037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Parallelize hash table constructions in map joins
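
The ticket text carries no design notes, so purely as a generic illustration of the idea in the title (not Hive's actual map-join code), building several independent hash tables concurrently can be sketched as:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHashTableBuild {
  // Build one hash table per input partition concurrently instead of sequentially.
  static List<Map<Long, String>> build(List<List<Map.Entry<Long, String>>> parts)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(
        Math.max(1, Math.min(parts.size(), Runtime.getRuntime().availableProcessors())));
    try {
      List<Future<Map<Long, String>>> futures = new ArrayList<>();
      for (final List<Map.Entry<Long, String>> part : parts) {
        Callable<Map<Long, String>> task = () -> {
          Map<Long, String> table = new HashMap<>();
          for (Map.Entry<Long, String> row : part) {
            table.put(row.getKey(), row.getValue()); // join key -> row payload
          }
          return table;
        };
        futures.add(pool.submit(task));
      }
      List<Map<Long, String>> tables = new ArrayList<>();
      for (Future<Map<Long, String>> f : futures) {
        tables.add(f.get()); // wait for every builder, propagating failures
      }
      return tables;
    } finally {
      pool.shutdown();
    }
  }
}
{code}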



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24053) Pluggable HttpRequestInterceptor for Hive JDBC

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24053?focusedWorklogId=505479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505479
 ]

ASF GitHub Bot logged work on HIVE-24053:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 01:01
Start Date: 28/Oct/20 01:01
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1417:
URL: https://github.com/apache/hive/pull/1417


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505479)
Time Spent: 0.5h  (was: 20m)

> Pluggable HttpRequestInterceptor for Hive JDBC
> --
>
> Key: HIVE-24053
> URL: https://issues.apache.org/jira/browse/HIVE-24053
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Affects Versions: 3.1.2
>Reporter: Ying Wang
>Assignee: Ying Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Allows the client to pass in the name of a custom HttpRequestInterceptor; the 
> class is instantiated and added to the HttpClient.
> Example usage: We would like to pass in an HttpRequestInterceptor for OAuth 2.0 
> authentication. The HttpRequestInterceptor will acquire and/or refresh the 
> access token and add it as an authentication header each time HiveConnection 
> sends the HttpRequest.
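
A minimal sketch of such an interceptor against the Apache HttpClient 4.x API; the class name and the token supplier are illustrative assumptions, not code from the patch:

{code:java}
import java.io.IOException;
import java.util.function.Supplier;

import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.protocol.HttpContext;

public class OAuthTokenInterceptor implements HttpRequestInterceptor {
  private final Supplier<String> tokenSupplier; // acquires and/or refreshes the token

  public OAuthTokenInterceptor(Supplier<String> tokenSupplier) {
    this.tokenSupplier = tokenSupplier;
  }

  @Override
  public void process(HttpRequest request, HttpContext context)
      throws HttpException, IOException {
    // Attach a fresh access token to every outgoing request.
    request.setHeader("Authorization", "Bearer " + tokenSupplier.get());
  }
}
{code}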



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=505416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505416
 ]

ASF GitHub Bot logged work on HIVE-24222:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 21:18
Start Date: 27/Oct/20 21:18
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun opened a new pull request #1615:
URL: https://github.com/apache/hive/pull/1615


   ### What changes were proposed in this pull request?
   
   This is a backport of HIVE-24222 and this PR aims to upgrade Apache ORC from 
1.5.6 to 1.5.12.
   
   ### Why are the changes needed?
   
   This will bring the latest bug fixes.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CI with the existing test cases.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505416)
Time Spent: 2h 10m  (was: 2h)

> Upgrade ORC to 1.5.12
> -
>
> Key: HIVE-24222
> URL: https://issues.apache.org/jira/browse/HIVE-24222
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=505392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505392
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 19:58
Start Date: 27/Oct/20 19:58
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1590:
URL: https://github.com/apache/hive/pull/1590#discussion_r512992042



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -254,6 +276,9 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
 
     if (ss != null){
       ss.add_resource(ResourceType.JAR, testArchive.getAbsolutePath());
+      try {
+        testArchive.deleteOnExit();

Review comment:
   So each time a compile query is run, it creates a new source file and a new 
jar file that are added as resources. I was not certain at what point these 
resources get distributed to the cluster (is it when add jar is run, when you 
create a function using the resource, or when you run a query using the 
UDF?), so I thought deleting on exit is safe enough because the permissions 
on this jar are already very narrow.
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505392)
Time Spent: 50m  (was: 40m)

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=505389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505389
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 19:51
Start Date: 27/Oct/20 19:51
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1590:
URL: https://github.com/apache/hive/pull/1590#discussion_r512987958



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -241,6 +255,14 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
         out.closeArchiveEntry();
       }
       out.finish();
+      try {
+        Set<PosixFilePermission> perms = EnumSet.of(
+          PosixFilePermission.OWNER_READ,
+          PosixFilePermission.OWNER_WRITE);
+        Files.setPosixFilePermissions(Paths.get(testArchive.toURI()), perms);
+      } catch (IOException ioe) {
+        LOG.warn("Lockdown permissions could not be set for the jar archive. JAR file could be open to other users depending on default FS permissions");

Review comment:
   if there is an exception here, I am marking the file to be deleted when 
the JVM exits:
   testArchive.deleteOnExit();
   
   I assumed this jar would be used from its location in the case of the CLI 
fat client, so we cannot delete it until the CLI exits.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505389)
Time Spent: 40m  (was: 0.5h)

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=505387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505387
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 19:49
Start Date: 27/Oct/20 19:49
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1590:
URL: https://github.com/apache/hive/pull/1590#discussion_r512985518



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -241,6 +255,14 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
         out.closeArchiveEntry();
       }
       out.finish();
+      try {
+        Set<PosixFilePermission> perms = EnumSet.of(
+          PosixFilePermission.OWNER_READ,

Review comment:
   We are setting it at create time, right? Either closeArchiveEntry() or 
out.finish() flushes to disk, and we set the permissions right after.
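
   For comparison, the alternative raised in this thread - creating the jar with the restricted permission set atomically instead of tightening it after the flush - would look roughly like this (POSIX filesystems only; a sketch, not the committed change):

   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.nio.file.attribute.FileAttribute;
   import java.nio.file.attribute.PosixFilePermission;
   import java.nio.file.attribute.PosixFilePermissions;
   import java.util.EnumSet;
   import java.util.Set;

   public class OwnerOnlyFiles {
     // Create the file as owner-read/write from the start, so it is never
     // observable with wider permissions between creation and the chmod.
     public static Path create(String path) throws IOException {
       Set<PosixFilePermission> perms =
           EnumSet.of(PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE);
       FileAttribute<Set<PosixFilePermission>> attr =
           PosixFilePermissions.asFileAttribute(perms);
       return Files.createFile(Paths.get(path), attr);
     }
   }
   ```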





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505387)
Time Spent: 0.5h  (was: 20m)

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=505380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505380
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 19:21
Start Date: 27/Oct/20 19:21
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on a change in pull request #1590:
URL: https://github.com/apache/hive/pull/1590#discussion_r512948993



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -241,6 +255,14 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
         out.closeArchiveEntry();
       }
       out.finish();
+      try {
+        Set<PosixFilePermission> perms = EnumSet.of(
+          PosixFilePermission.OWNER_READ,

Review comment:
   Should the permissions be set at the file's create time? 

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -241,6 +255,14 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
         out.closeArchiveEntry();
       }
       out.finish();
+      try {
+        Set<PosixFilePermission> perms = EnumSet.of(
+          PosixFilePermission.OWNER_READ,
+          PosixFilePermission.OWNER_WRITE);
+        Files.setPosixFilePermissions(Paths.get(testArchive.toURI()), perms);
+      } catch (IOException ioe) {
+        LOG.warn("Lockdown permissions could not be set for the jar archive. JAR file could be open to other users depending on default FS permissions");

Review comment:
   If an IOException happens here, can you still delete the file later (do you 
have the permission)?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java
##
@@ -254,6 +276,9 @@ CommandProcessorResponse compile(SessionState ss) throws CommandProcessorException
 
     if (ss != null){
       ss.add_resource(ResourceType.JAR, testArchive.getAbsolutePath());
+      try {
+        testArchive.deleteOnExit();

Review comment:
   Will this jar file be added to the resources several times? If not, would it 
be possible to just delete it and use what is in the resources? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505380)
Time Spent: 20m  (was: 10m)

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL

2020-10-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24315:



> Improve validation and semantic analysis in HPL/SQL 
> 
>
> Key: HIVE-24315
> URL: https://issues.apache.org/jira/browse/HIVE-24315
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some known issues that need to be fixed. For example, it seems that 
> the arity of a function is not checked when calling it, and the same is true 
> for parameter types. Calling an undefined function is evaluated to null, and 
> sometimes it seems that incorrect syntax is silently ignored. 
> In cases like this a helpful error message would be expected, though we 
> should also consider how PL/SQL works and maintain compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24297:
--
Status: Patch Available  (was: In Progress)

> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However, during buffer collision in the putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24297 started by Ádám Szita.
-
> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However, during buffer collision in the putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24297?focusedWorklogId=505215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505215
 ]

ASF GitHub Bot logged work on HIVE-24297:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 13:30
Start Date: 27/Oct/20 13:30
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1614:
URL: https://github.com/apache/hive/pull/1614


   HIVE-23741 introduced an optimization so that CacheTags are not stored on 
buffer level, but rather on file level, as one cache tag can only relate to one 
file. With this change a buffer->filecache reference was introduced so that the 
buffer's tag can be calculated with an extra indirection i.e. 
buffer.filecache.tag.
   
   However, during buffer collision in the putFileData method, we don't set the 
filecache reference of the collided (new) buffer: 
https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311
   
   Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505215)
Remaining Estimate: 0h
Time Spent: 10m

> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However, during buffer collision in the putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24297:
--
Labels: pull-request-available  (was: )

> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However, during buffer collision in the putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221435#comment-17221435
 ] 

Ádám Szita commented on HIVE-24297:
---

Looking into the issue in more detail with [~asinkovits], we found that the root 
cause is actually the setting of declaredCacheLength on the new buffer. This 
causes the new buffer to be inserted into the cache policy too, which should 
not be happening: if a buffer with the same content is already in the cache, 
the new one should be discarded.

BTW this can also cause CacheContentsTracker to report negative buffer and byte 
counts, because when 2 buffers collide, they only report one cache event, but 
both will be evicted in the future, resulting in a negative balance (this 
happens without HIVE-23741 too, it just doesn't throw an error).

If we unset declaredCacheLength on the new buffer, then the unlockBuffer method 
(called after the put, in processCollisions) will immediately deallocate the 
new, unused buffer, rather than inserting it into the cache policy with top 
priority only to be evicted much later.

> LLAP buffer collision causes NPE
> 
>
> Key: HIVE-24297
> URL: https://issues.apache.org/jira/browse/HIVE-24297
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> HIVE-23741 introduced an optimization so that CacheTags are not stored on 
> buffer level, but rather on file level, as one cache tag can only relate to 
> one file. With this change a buffer->filecache reference was introduced so 
> that the buffer's tag can be calculated with an extra indirection i.e. 
> buffer.filecache.tag.
> However, during buffer collision in the putFileData method, we don't set the 
> filecache reference of the collided (new) buffer: 
> [https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]
> Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
> at 
> java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
> at 
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
> at 
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
> ... 16 more {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=505197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505197
 ]

ASF GitHub Bot logged work on HIVE-19253:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 12:26
Start Date: 27/Oct/20 12:26
Worklog Time Spent: 10m 
  Work Description: mliroz commented on pull request #1537:
URL: https://github.com/apache/hive/pull/1537#issuecomment-717208594


   Hello! With this change it does not seem possible to change a table from 
EXTERNAL_TABLE to MANAGED_TABLE using a DDL:
   
   `ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='FALSE')`
   
   as it was made possible in 
[HIVE-1329](https://issues.apache.org/jira/browse/HIVE-1329).
   
   Is there an alternative? Otherwise the solution might just be to let the 
property override the tableType when it is set. In other words (see the sketch 
below):
   
   - tableType=EXTERNAL_TABLE and EXTERNAL='FALSE' -> isTableExternal() = false
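
   A sketch of that alternative resolution (a plain illustration, not code from this PR):

   ```java
   import java.util.Map;

   public class TableTypeResolver {
     // Let an explicit EXTERNAL property override the declared tableType;
     // fall back to the declared type when the property is absent.
     static boolean isTableExternal(String tableType, Map<String, String> params) {
       String external = params.get("EXTERNAL");
       if (external != null) {
         // e.g. tableType=EXTERNAL_TABLE and EXTERNAL='FALSE' -> false
         return Boolean.parseBoolean(external);
       }
       return "EXTERNAL_TABLE".equals(tableType);
     }
   }
   ```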



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505197)
Time Spent: 1h 10m  (was: 1h)

> HMS ignores tableType property for external tables
> --
>
> Key: HIVE-19253
> URL: https://issues.apache.org/jira/browse/HIVE-19253
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: Alex Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, 
> HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, 
> HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, 
> HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, 
> HIVE-19253.11.patch, HIVE-19253.12.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When someone creates a table using the Thrift API, they may think that setting 
> tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their 
> table is gone later because HMS will silently change it to a managed table.
> Here is the offending code:
> {code:java}
>   private MTable convertToMTable(Table tbl) throws InvalidObjectException,
>   MetaException {
> ...
> // If the table has property EXTERNAL set, update table type
> // accordingly
> String tableType = tbl.getTableType();
> boolean isExternal = 
> Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
> if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>   if (isExternal) {
> tableType = TableType.EXTERNAL_TABLE.toString();
>   }
> }
> if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>   if (!isExternal) { // Here!
> tableType = TableType.MANAGED_TABLE.toString();
>   }
> }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed 
> even if it was external in the first place - which is wrong.
> Moreover, in some places the code looks at the table type to decide, and in 
> others it looks at the parameter. HMS should really make up its mind which 
> one to use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24314:
-
Description: If the Cleaner didn't remove any files, don't mark the 
compaction queue entry as "succeeded" but instead leave it in "ready for 
cleaning" state for later cleaning. If it removed at least one, then mark the 
compaction queue entry as "succeeded". This is a partial fix; HIVE-24291 is the 
complete fix.  (was: If the Cleaner didn't remove any files, don't mark the 
compaction queue entry as "succeeded" but instead leave it in "ready for 
cleaning" state for later cleaning. If it removed at least one, then mark the 
compaction queue entry as "succeeded". This is a partial fix, )

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry 
> as "succeeded" but instead leave it in "ready for cleaning" state for later 
> cleaning. If it removed at least one, then mark the compaction queue entry as 
> "succeeded". This is a partial fix; HIVE-24291 is the complete fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?focusedWorklogId=505165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505165
 ]

ASF GitHub Bot logged work on HIVE-24314:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 11:03
Start Date: 27/Oct/20 11:03
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1613:
URL: https://github.com/apache/hive/pull/1613


   ### What changes were proposed in this pull request?
   If the Cleaner didn't remove any files, don't mark the compaction queue 
entry as "succeeded" but instead leave it in "ready for cleaning" state for 
later cleaning. If it removed at least one, then mark the compaction queue 
entry as "succeeded". This is a partial fix.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505165)
Remaining Estimate: 0h
Time Spent: 10m

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry 
> as "succeeded" but instead leave it in "ready for cleaning" state for later 
> cleaning. If it removed at least one file, then mark the compaction queue 
> entry as "succeeded". This is a partial fix; HIVE-24291 is the complete fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24314:
--
Labels: pull-request-available  (was: )

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry 
> as "succeeded" but instead leave it in "ready for cleaning" state for later 
> cleaning. If it removed at least one file, then mark the compaction queue 
> entry as "succeeded". This is a partial fix; HIVE-24291 is the complete fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24314:
-
Description: If the Cleaner didn't remove any files, don't mark the 
compaction queue entry as "succeeded" but instead leave it in "ready for 
cleaning" state for later cleaning. If it removed at least one file, then mark 
the compaction queue entry as "succeeded". This is a partial fix; HIVE-24291 is 
the complete fix.  (was: If the Cleaner didn't remove any files, don't mark the 
compaction queue entry as "succeeded" but instead leave it in "ready for 
cleaning" state for later cleaning. If it removed at least one, then mark the 
compaction queue entry as "succeeded". This is a partial fix; HIVE-24291 is the 
complete fix.)

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry 
> as "succeeded" but instead leave it in "ready for cleaning" state for later 
> cleaning. If it removed at least one file, then mark the compaction queue 
> entry as "succeeded". This is a partial fix; HIVE-24291 is the complete fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24314:
-
Description: If the Cleaner didn't remove any files, don't mark the 
compaction queue entry as "succeeded" but instead leave it in "ready for 
cleaning" state for later cleaning. If it removed at least one, then mark the 
compaction queue entry as "succeeded". This is a partial fix, 

> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> If the Cleaner didn't remove any files, don't mark the compaction queue entry 
> as "succeeded" but instead leave it in "ready for cleaning" state for later 
> cleaning. If it removed at least one, then mark the compaction queue entry as 
> "succeeded". This is a partial fix, 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505163
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:59
Start Date: 27/Oct/20 10:59
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r512594710



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query92.q.out
##
@@ -164,7 +164,7 @@ Stage-0
 Select Operator [SEL_115] (rows=143966864 
width=119)
   Output:["_col0","_col1","_col2"]
   Filter Operator [FIL_113] 
(rows=143966864 width=119)
-predicate:(ws_sold_date_sk is not null 
and ws_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND 
DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(ws_item_sk, 
DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
   changes are gone





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505163)
Time Spent: 1h 20m  (was: 1h 10m)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505161
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:58
Start Date: 27/Oct/20 10:58
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r512594208



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query54.q.out
##
@@ -202,156 +202,154 @@ Stage-0
   predicate:(_col1 <= _col3)
   Merge Join Operator [MERGEJOIN_294] 
(rows=15218525 width=12)
 
Conds:(Inner),Output:["_col0","_col1","_col3"]
-  <-Reducer 15 [CUSTOM_SIMPLE_EDGE]
+  <-Reducer 20 [CUSTOM_SIMPLE_EDGE]
 PARTITION_ONLY_SHUFFLE [RS_99]
   Filter Operator [FIL_98] 
(rows=608741 width=12)
 predicate:(_col2 <= _col1)
 Merge Join Operator 
[MERGEJOIN_291] (rows=1826225 width=12)
   
Conds:(Inner),Output:["_col0","_col1","_col2"]
 <-Map 9 [CUSTOM_SIMPLE_EDGE] 
vectorized
-  PARTITION_ONLY_SHUFFLE 
[RS_327]

Review comment:
   this is highly unfortunate:
   the jsonexplain api "tells" the vertex about the outgoing edge type by 
calling [this 
method](https://github.com/apache/hive/blob/db895f374bf63b77b683574fdf678bfac91a5ac6/common/src/java/org/apache/hadoop/hive/common/jsonexplain/Vertex.java#L308) 
from 
[here](https://github.com/apache/hive/blob/db895f374bf63b77b683574fdf678bfac91a5ac6/common/src/java/org/apache/hadoop/hive/common/jsonexplain/Stage.java#L115)
   
   since a single vertex can have multiple outgoing edges, setting the type 
from just one of them is problematic - I think we may want to consider simply 
removing this tagging of vertices
   
   instead, we should consider renaming some of the edge types, like 
`CUSTOM_SIMPLE_EDGE` to `PARTITION_ONLY`
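
   A tiny illustration of the structural point (illustrative types, not the jsonexplain classes): if the type lives on the vertex, a vertex with two differently-typed outgoing edges cannot be represented, so the type naturally belongs on the edge:

   ```java
   // Illustrative structures, not the actual jsonexplain classes.
   class Vertex {
     String name;
     // No single "outgoing edge type" field fits here: one vertex may
     // feed several edges of different types.
   }

   class Edge {
     Vertex from, to;
     String type; // e.g. "CUSTOM_SIMPLE_EDGE" / "PARTITION_ONLY"
   }
   ```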
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505161)
Time Spent: 1h 10m  (was: 1h)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24314:



> compactor.Cleaner should not set state "mark cleaned" if it didn't remove any 
> files
> ---
>
> Key: HIVE-24314
> URL: https://issues.apache.org/jira/browse/HIVE-24314
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=505157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505157
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:53
Start Date: 27/Oct/20 10:53
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r512590211



##
File path: 
standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-3.2.0-to-4.0.0.mysql.sql
##
@@ -109,6 +109,20 @@ CREATE TABLE IF NOT EXISTS REPLICATION_METRICS (
 CREATE INDEX POLICY_IDX ON REPLICATION_METRICS (RM_POLICY);
 CREATE INDEX DUMP_IDX ON REPLICATION_METRICS (RM_DUMP_EXECUTION_ID);
 
+-- Create stored procedure tables
+CREATE TABLE STORED_PROCS (
+  `SP_ID` BIGINT(20) NOT NULL,
+  `CREATE_TIME` INT(11) NOT NULL,
+  `DB_ID` BIGINT(20) NOT NULL,
+  `NAME` VARCHAR(256) NOT NULL,
+  `OWNER_NAME` VARCHAR(128) NOT NULL,
+  `SOURCE` LONGTEXT NOT NULL,
+  PRIMARY KEY (`SP_ID`)
+);
+
+CREATE UNIQUE INDEX UNIQUESTOREDPROC ON STORED_PROCS (NAME, DB_ID);
+ALTER TABLE `STORED_PROCS` ADD CONSTRAINT `STOREDPROC_FK1` FOREIGN KEY 
(`DB_ID`) REFERENCES DBS (`DB_ID`);

Review comment:
   good catch, fixed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505157)
Time Spent: 4.5h  (was: 4h 20m)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=505158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505158
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:53
Start Date: 27/Oct/20 10:53
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r512590386



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MPosParam.java
##
@@ -0,0 +1,69 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  * http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing, software
+ *  * distributed under the License is distributed on an "AS IS" BASIS,
+ *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  * See the License for the specific language governing permissions and
+ *  * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hive.metastore.model;
+
+public class MPosParam {

Review comment:
   removed

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/functions/Function.java
##
@@ -1,780 +1,30 @@
 /*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
  *
- * http://www.apache.org/licenses/LICENSE-2.0
+ *  * Licensed to the Apache Software Foundation (ASF) under one

Review comment:
   fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505158)
Time Spent: 4h 40m  (was: 4.5h)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505156
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:52
Start Date: 27/Oct/20 10:52
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r512589390



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query1b.q.out
##
@@ -210,7 +210,7 @@ STAGE PLANS:
 Statistics: Num rows: 16855704 Data size: 
2008197920 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col2 (type: decimal(17,2))
   Filter Operator
-predicate: (sr_store_sk is not null and 
sr_returned_date_sk is not null and sr_store_sk BETWEEN 
DynamicValue(RS_40_store_s_store_sk_min) AND 
DynamicValue(RS_40_store_s_store_sk_max) and in_bloom_filter(sr_store_sk, 
DynamicValue(RS_40_store_s_store_sk_bloom_filter))) (type: boolean)
+predicate: (sr_store_sk is not null and 
sr_returned_date_sk is not null) (type: boolean)

Review comment:
   changes are gone in this file





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505156)
Time Spent: 1h  (was: 50m)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505155
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:51
Start Date: 27/Oct/20 10:51
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r512588759



##
File path: ql/src/test/results/clientpositive/llap/sharedwork_semi.q.out
##
@@ -541,7 +541,7 @@ STAGE PLANS:
 Map Operator Tree:
 TableScan
   alias: s
-  filterExpr: (ss_sold_date_sk is not null and 
((ss_sold_date_sk BETWEEN DynamicValue(RS_7_d_d_date_sk_min) AND 
DynamicValue(RS_7_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_7_d_d_date_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN 
DynamicValue(RS_21_d_d_date_sk_min) AND DynamicValue(RS_21_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter) (type: boolean)
+  filterExpr: (((ss_sold_date_sk BETWEEN 
DynamicValue(RS_7_d_d_date_sk_min) AND DynamicValue(RS_7_d_d_date_sk_max) and 
in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_7_d_d_date_sk_bloom_filter))) 
or (ss_sold_date_sk BETWEEN DynamicValue(RS_21_d_d_date_sk_min) AND 
DynamicValue(RS_21_d_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_21_d_d_date_sk_bloom_filter and ss_sold_date_sk is not 
null) (type: boolean)

Review comment:
   I've tried to retain the order, which has placed the bloom-related 
checks at the end.
   
   I recall that there was a ticket about ordering conditionals, but I can't 
find the related ticket; do I remember incorrectly?
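
   For readers outside the optimizer code, the ordering question matters 
because AND conjuncts are evaluated left to right with short-circuiting, so 
cheap checks (null tests, min/max bounds) should run before the expensive 
bloom-filter probe. A minimal editorial sketch of that idea; the predicate 
shapes below are simplified stand-ins, not Hive's actual filter classes.

{code:java}
import java.util.function.LongPredicate;

public class PredicateOrderingSketch {
  public static void main(String[] args) {
    LongPredicate notNull = key -> key != -1;                // cheap comparison
    LongPredicate inRange = key -> key >= 10 && key <= 99;   // cheap min/max bound
    LongPredicate bloomProbe = key -> {
      // Stand-in for in_bloom_filter(key, DynamicValue(...)): hashing plus a
      // lookup, which is the costly part of the conjunction.
      return Long.hashCode(key) % 7 != 0;
    };

    // With short-circuit AND, the bloom probe only runs for rows that survive
    // the cheap checks; putting it last is what the retained order achieves.
    LongPredicate filter =
        key -> notNull.test(key) && inRange.test(key) && bloomProbe.test(key);

    for (long key : new long[] {-1, 5, 42, 77}) {
      System.out.println(key + " -> " + filter.test(key));
    }
  }
}
{code}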





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505155)
Time Spent: 50m  (was: 40m)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505154&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505154
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:45
Start Date: 27/Oct/20 10:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r512584825



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query32.q.out
##
@@ -160,7 +160,7 @@ Stage-0
 Select Operator [SEL_115] (rows=286549727 
width=119)
   Output:["_col0","_col1","_col2"]
   Filter Operator [FIL_113] 
(rows=286549727 width=119)
-predicate:(cs_sold_date_sk is not null 
and cs_item_sk BETWEEN DynamicValue(RS_28_item_i_item_sk_min) AND 
DynamicValue(RS_28_item_i_item_sk_max) and in_bloom_filter(cs_item_sk, 
DynamicValue(RS_28_item_i_item_sk_bloom_filter)))

Review comment:
   The conditional was not reconstructed properly during filter creation; fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505154)
Time Spent: 40m  (was: 0.5h)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2020-10-27 Thread Manish Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221306#comment-17221306
 ] 

Manish Maheshwari commented on HIVE-24313:
--

Also, it would be good to persist the collected stats into HMS so that they 
can be reused by subsequent queries.

> Optimise stats collection for file sizes on cloud storage
> -
>
> Key: HIVE-24313
> URL: https://issues.apache.org/jira/browse/HIVE-24313
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>
> When stats information is not present (e.g. external tables), RelOptHiveTable 
> computes basic stats at runtime.
> Following is the codepath.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
> hiveTblMetadata, hiveNonPartitionCols, 
> nonPartColNamesThatRqrStats, colStatsCached,
> nonPartColNamesThatRqrStats, true);
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
> BasicStats basicStats = 
> basicStatsFactory.build(Partish.buildFor(table, p));
> partStats.add(basicStats);
>   }
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>  
> {code:java}
> try {
> ds = getFileSizeForPath(path);
>   } catch (IOException e) {
> ds = 0L;
>   }
>  {code}
>  
> For a table & query with a large number of partitions, this takes a long time 
> to compute statistics and increases compilation time. It would be good to fix 
> it with a "ForkJoinPool", e.g. 
> partList.getNotDeniedPartns().parallelStream().forEach((p) -> ...)
>  
>  
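
A minimal sketch of the suggested direction: parallelising the per-partition 
file-size lookups with a dedicated ForkJoinPool. The partition and stats types 
below are simplified stand-ins for Hive's Partition/BasicStats, not the actual 
classes; a real change would also need to keep BasicStats construction 
thread-safe.

{code:java}
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelBasicStatsSketch {

  /** Simplified stand-in for BasicStats; only the data size matters here. */
  static class BasicStats {
    final long dataSize;
    BasicStats(long dataSize) { this.dataSize = dataSize; }
  }

  /** Stand-in for basicStatsFactory.build(Partish.buildFor(table, p)). */
  static BasicStats buildStats(String partitionPath) {
    // In Hive this is where getFileSizeForPath() hits the filesystem, the
    // call that is slow against cloud storage.
    return new BasicStats(0L);
  }

  static List<BasicStats> collect(List<String> partitionPaths, int parallelism)
      throws Exception {
    // A dedicated pool bounds the concurrent file-size lookups instead of
    // borrowing the JVM-wide common ForkJoinPool from the compiler thread.
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      return pool.submit(() -> partitionPaths.parallelStream()
          .map(ParallelBasicStatsSketch::buildStats)
          .collect(Collectors.toList())).get();
    } finally {
      pool.shutdown();
    }
  }
}
{code}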



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=505140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505140
 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:14
Start Date: 27/Oct/20 10:14
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r512565056



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2836,14 +2836,32 @@ long getPartsFound() {
   @Override
   public SQLAllTableConstraints getAllTableConstraints(String catName, String 
dbName, String tblName)
   throws MetaException, NoSuchObjectException {
-SQLAllTableConstraints sqlAllTableConstraints = new 
SQLAllTableConstraints();
-sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, 
tblName));
-sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, 
dbName, tblName));
-sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, 
dbName, tblName));
-sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, 
dbName, tblName));
-return sqlAllTableConstraints;
+
+catName = StringUtils.normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the constraints is not yet loaded in cache
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+SQLAllTableConstraints constraints = 
sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+// If any of the constraint values is missing, partial constraints may 
have been stored in the cache.
+// So fall back to the raw store for correct values
+if (constraints != null && 
CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
   Good idea. I will check how we can implement a flag.
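
   The flag being discussed could look roughly like the sketch below: cache the 
constraints object together with an explicit completeness marker, so that an 
empty list means "no constraints of this kind" rather than "not yet loaded". 
The class and method names here are hypothetical, not the actual SharedCache 
API.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConstraintsCacheSketch<C> {

  /** A cached value plus a marker recording whether all constraint types
   *  were populated in one shot, as opposed to trickled in by events. */
  static final class Entry<C> {
    final C constraints;
    final boolean complete;
    Entry(C constraints, boolean complete) {
      this.constraints = constraints;
      this.complete = complete;
    }
  }

  private final ConcurrentMap<String, Entry<C>> cache = new ConcurrentHashMap<>();

  /** Written by the prewarm/refresh path that loads every constraint type at once. */
  public void putComplete(String tableKey, C constraints) {
    cache.put(tableKey, new Entry<>(constraints, true));
  }

  /** Returns the cached constraints only when they are known to be complete;
   *  otherwise the caller falls back to the RawStore. */
  public C getIfComplete(String tableKey) {
    Entry<C> e = cache.get(tableKey);
    return (e != null && e.complete) ? e.constraints : null;
  }
}
{code}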





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505140)
Time Spent: 40m  (was: 0.5h)

> [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1
> 
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Description -
> Currently, in order to get all constraints from the CachedStore, 6 different 
> calls are made to the store. Instead, combine those 6 calls into 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=505141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505141
 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:14
Start Date: 27/Oct/20 10:14
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r512564828



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2836,14 +2836,32 @@ long getPartsFound() {
   @Override
   public SQLAllTableConstraints getAllTableConstraints(String catName, String 
dbName, String tblName)
   throws MetaException, NoSuchObjectException {
-SQLAllTableConstraints sqlAllTableConstraints = new 
SQLAllTableConstraints();
-sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, 
tblName));
-sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, 
dbName, tblName));
-sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, 
dbName, tblName));
-sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, 
dbName, tblName));
-return sqlAllTableConstraints;
+
+catName = StringUtils.normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the constraints is not yet loaded in cache
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+SQLAllTableConstraints constraints = 
sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+// If any of the constraint values is missing, partial constraints may 
have been stored in the cache.
+// So fall back to the raw store for correct values
+if (constraints != null && 
CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
   Also, what if the table just has primary keys and no other constraints?
   
   nonEmpty(otherConstraints) will return false and we will never use the cached value?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505141)
Time Spent: 50m  (was: 40m)

> [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1
> 
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Description -
> Currently, in order to get all constraints from the CachedStore, 6 different 
> calls are made to the store. Instead, combine those 6 calls into 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24259) [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24259?focusedWorklogId=505139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505139
 ]

ASF GitHub Bot logged work on HIVE-24259:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 10:14
Start Date: 27/Oct/20 10:14
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1610:
URL: https://github.com/apache/hive/pull/1610#discussion_r512562376



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2836,14 +2836,32 @@ long getPartsFound() {
   @Override
   public SQLAllTableConstraints getAllTableConstraints(String catName, String 
dbName, String tblName)
   throws MetaException, NoSuchObjectException {
-SQLAllTableConstraints sqlAllTableConstraints = new 
SQLAllTableConstraints();
-sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, 
tblName));
-sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, 
dbName, tblName));
-sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, 
dbName, tblName));
-sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, 
dbName, tblName));
-return sqlAllTableConstraints;
+
+catName = StringUtils.normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the constraints is not yet loaded in cache
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+SQLAllTableConstraints constraints = 
sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+// If any of the constraint values is missing, partial constraints may 
have been stored in the cache.
+// So fall back to the raw store for correct values
+if (constraints != null && 
CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
   +1.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2836,14 +2836,32 @@ long getPartsFound() {
   @Override
   public SQLAllTableConstraints getAllTableConstraints(String catName, String 
dbName, String tblName)
   throws MetaException, NoSuchObjectException {
-SQLAllTableConstraints sqlAllTableConstraints = new 
SQLAllTableConstraints();
-sqlAllTableConstraints.setPrimaryKeys(getPrimaryKeys(catName, dbName, 
tblName));
-sqlAllTableConstraints.setForeignKeys(getForeignKeys(catName, null, null, 
dbName, tblName));
-sqlAllTableConstraints.setUniqueConstraints(getUniqueConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setDefaultConstraints(getDefaultConstraints(catName, 
dbName, tblName));
-sqlAllTableConstraints.setCheckConstraints(getCheckConstraints(catName, 
dbName, tblName));
-
sqlAllTableConstraints.setNotNullConstraints(getNotNullConstraints(catName, 
dbName, tblName));
-return sqlAllTableConstraints;
+
+catName = StringUtils.normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the constraints is not yet loaded in cache
+  return rawStore.getAllTableConstraints(catName, dbName, tblName);
+}
+SQLAllTableConstraints constraints = 
sharedCache.listCachedAllTableConstraints(catName, dbName, tblName);
+
+// If any of the constraint values is missing, partial constraints may 
have been stored in the cache.
+// So fall back to the raw store for correct values
+if (constraints != null && 
CollectionUtils.isNotEmpty(constraints.getPrimaryKeys()) && CollectionUtils

Review comment:
   Also, what if the table has just primary keys and no other constraints?
   
   nonEmpty(otherConstraints) will return false and we will never use the cached value?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (HIVE-24259) [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1

2020-10-27 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-24259:
-
Summary: [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 
1  (was: [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1.)

> [CachedStore] Optimise getAlltableConstraint from 6 cache calls to 1
> 
>
> Key: HIVE-24259
> URL: https://issues.apache.org/jira/browse/HIVE-24259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Description -
> Currently, in order to get all constraints from the CachedStore, 6 different 
> calls are made to the store. Instead, combine those 6 calls into 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24310) Allow specified number of deserialize errors to be ignored

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24310?focusedWorklogId=505061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505061
 ]

ASF GitHub Bot logged work on HIVE-24310:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 06:36
Start Date: 27/Oct/20 06:36
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1607:
URL: https://github.com/apache/hive/pull/1607


   
   
   ### What changes were proposed in this pull request?
   Allow specified number of deserialize errors to be ignored
   
   
   
   ### Why are the changes needed?
   Sometimes we see corrupted records in a user's raw data, for example one 
corrupted record in a file that contains thousands of records. The user has to 
either give up all the records or replay the whole dataset in order to run 
successfully on Hive, so we should provide a way to ignore such corrupted 
records (see the sketch after this description). 
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   unit tests
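
   The proposal is easiest to see as a counter guarding the deserialize loop. 
A minimal sketch under stated assumptions: the limit, the exception handling, 
and the names below are illustrative, not the patch's actual code.

{code:java}
import java.util.Arrays;
import java.util.List;

public class TolerantDeserializerSketch {
  private final long maxAllowedErrors;
  private long errorCount;

  TolerantDeserializerSketch(long maxAllowedErrors) {
    this.maxAllowedErrors = maxAllowedErrors;
  }

  /** Stand-in for a SerDe deserialize call; throws on a corrupted record. */
  private String deserialize(String raw) {
    if (raw.startsWith("\0")) {
      throw new RuntimeException("corrupted record");
    }
    return raw;
  }

  void process(List<String> rows) {
    for (String raw : rows) {
      try {
        System.out.println(deserialize(raw));
      } catch (RuntimeException e) {
        // Skip the record unless the configured budget is exhausted, in
        // which case fail the task just as Hive does today.
        if (++errorCount > maxAllowedErrors) {
          throw new RuntimeException(
              "Exceeded " + maxAllowedErrors + " deserialize errors", e);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Tolerates the single corrupted row and still emits "a" and "b".
    new TolerantDeserializerSketch(1).process(Arrays.asList("a", "\0bad", "b"));
  }
}
{code}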
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505061)
Time Spent: 0.5h  (was: 20m)

> Allow specified number of deserialize errors to be ignored
> --
>
> Key: HIVE-24310
> URL: https://issues.apache.org/jira/browse/HIVE-24310
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Sometimes we see corrupted records in a user's raw data, for example one 
> corrupted record in a file that contains thousands of records. The user has to 
> either give up all the records or replay the whole dataset in order to run 
> successfully on Hive, so we should provide a way to ignore such corrupted 
> records. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24310) Allow specified number of deserialize errors to be ignored

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24310?focusedWorklogId=505058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505058
 ]

ASF GitHub Bot logged work on HIVE-24310:
-

Author: ASF GitHub Bot
Created on: 27/Oct/20 06:30
Start Date: 27/Oct/20 06:30
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1607:
URL: https://github.com/apache/hive/pull/1607


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505058)
Time Spent: 20m  (was: 10m)

> Allow specified number of deserialize errors to be ignored
> --
>
> Key: HIVE-24310
> URL: https://issues.apache.org/jira/browse/HIVE-24310
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sometimes we see corrupted records in a user's raw data, for example one 
> corrupted record in a file that contains thousands of records. The user has to 
> either give up all the records or replay the whole dataset in order to run 
> successfully on Hive, so we should provide a way to ignore such corrupted 
> records. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)