[jira] [Work logged] (HIVE-24349) Client connection count is not printed correctly in HiveMetastoreClient

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24349?focusedWorklogId=514498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514498
 ]

ASF GitHub Bot logged work on HIVE-24349:
-

Author: ASF GitHub Bot
Created on: 20/Nov/20 07:48
Start Date: 20/Nov/20 07:48
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1655:
URL: https://github.com/apache/hive/pull/1655#discussion_r527496746



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -790,6 +790,9 @@ public void close() {
 try {
   if (null != client) {
 client.shutdown();
+if ((transport == null) || !transport.isOpen()) {
+  LOG.info("Closed a connection to metastore, current connections: " + 
connCount.decrementAndGet());

Review comment:
   An existing test, TestMetaStoreMetrics.testConnections, is used to test the counter.
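
For context, here is a minimal, self-contained sketch of the open/close counting pattern under review; all names are placeholders and this is not the actual HiveMetaStoreClient code.

{code:java}
// Sketch only: a shared counter incremented on open and decremented on close,
// guarded so that a repeated close() cannot drive the count negative.
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCounterSketch {
  private static final AtomicInteger connCount = new AtomicInteger(0);
  private boolean transportOpen;

  void open() {
    transportOpen = true;
    System.out.println("Opened a connection to metastore, current connections: "
        + connCount.incrementAndGet());
  }

  void close() {
    // Log and decrement only when a live transport is actually being torn down.
    if (transportOpen) {
      transportOpen = false;
      System.out.println("Closed a connection to metastore, current connections: "
          + connCount.decrementAndGet());
    }
  }
}
{code}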





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514498)
Time Spent: 40m  (was: 0.5h)

> Client connection count is not printed correctly in HiveMetastoreClient
> ---
>
> Key: HIVE-24349
> URL: https://issues.apache.org/jira/browse/HIVE-24349
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24349.01.patch, HIVE-24349.02.patch, 
> HIVE-24349.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24349) Client connection count is not printed correctly in HiveMetastoreClient

2020-11-19 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24349:
---
Attachment: HIVE-24349.03.patch

> Client connection count is not printed correctly in HiveMetastoreClient
> ---
>
> Key: HIVE-24349
> URL: https://issues.apache.org/jira/browse/HIVE-24349
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24349.01.patch, HIVE-24349.02.patch, 
> HIVE-24349.03.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24405) Missing datatype for table column in oracle

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24405?focusedWorklogId=514497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514497
 ]

ASF GitHub Bot logged work on HIVE-24405:
-

Author: ASF GitHub Bot
Created on: 20/Nov/20 07:47
Start Date: 20/Nov/20 07:47
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1691:
URL: https://github.com/apache/hive/pull/1691#issuecomment-730972108


   +1 Thanks for this fix, I don't know how I missed this :(



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514497)
Time Spent: 20m  (was: 10m)

> Missing datatype for table column in oracle
> ---
>
> Key: HIVE-24405
> URL: https://issues.apache.org/jira/browse/HIVE-24405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The parent change introduces an issue in the oracle schema script.  No 
> datatype is specified.
> {noformat}
> 1 row created.
>   CQ_COMMIT_TIME(19)
> *
> ERROR at line 19:
> ORA-00902: invalid datatype
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24389) Trailing zeros of constant decimal numbers are removed

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24389?focusedWorklogId=514462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514462
 ]

ASF GitHub Bot logged work on HIVE-24389:
-

Author: ASF GitHub Bot
Created on: 20/Nov/20 05:48
Start Date: 20/Nov/20 05:48
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1676:
URL: https://github.com/apache/hive/pull/1676#discussion_r527426293



##
File path: 
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantHiveDecimalObjectInspector.java
##
@@ -62,13 +62,4 @@ public int precision() {
 }

Review comment:
   Why does precision need to be overridden here (while scale does not)? 
Could we possibly remove this too?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514462)
Time Spent: 20m  (was: 10m)

> Trailing zeros of constant decimal numbers are removed
> --
>
> Key: HIVE-24389
> URL: https://issues.apache.org/jira/browse/HIVE-24389
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some cases Hive removes the trailing zeros of constant decimal numbers
> {code}
> select cast(1.1 as decimal(22, 2)) 
> 1.1
> {code}
> In this case *WritableConstantHiveDecimalObjectInspector* is used, and this 
> object inspector takes its wrapped HiveDecimal's scale instead of the scale 
> specified in the wrapped typeinfo: 
> {code}
> this = {WritableConstantHiveDecimalObjectInspector@14415} 
>  value = {HiveDecimalWritable@14426} "1.1"
>  typeInfo = {DecimalTypeInfo@14421} "decimal(22,2)"{code}
> However, in the case of an expression with an aggregate function, 
> *WritableHiveDecimalObjectInspector* is used
> {code}
> select cast(sum(1.1) as decimal(22, 2))
> 1.10
> {code}
> {code}
> o = {HiveDecimalWritable@16633} "1.1"
> oi = {WritableHiveDecimalObjectInspector@16634} 
>  typeInfo = {DecimalTypeInfo@16640} "decimal(22,2)"
> {code}
> Casting the expressions to string
> {code:java}
> select cast(cast(1.1 as decimal(22, 2)) as string), cast(cast(sum(1.1) as 
> decimal(22, 2)) as string)
> 1.1   1.10
> {code}
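
To make the expected output concrete, here is a small illustration using plain java.math.BigDecimal rather than Hive's decimal classes: formatting with the declared scale of decimal(22, 2) keeps the trailing zero, while formatting with the value's own minimal scale drops it, which is exactly the discrepancy described above.

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TrailingZeroDemo {
  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("1.1");
    // Using the value's own scale (1) prints "1.1".
    System.out.println(value.toPlainString());
    // Using the declared scale (2) prints "1.10"; increasing the scale never rounds.
    System.out.println(value.setScale(2, RoundingMode.UNNECESSARY).toPlainString());
  }
}
{code}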



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?focusedWorklogId=514459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514459
 ]

ASF GitHub Bot logged work on HIVE-24387:
-

Author: ASF GitHub Bot
Created on: 20/Nov/20 05:36
Start Date: 20/Nov/20 05:36
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1673:
URL: https://github.com/apache/hive/pull/1673


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514459)
Time Spent: 20m  (was: 10m)

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are differences in the SQL syntax that the database accessor generates for 
> each RDBMS. For the metastore, we always end up with the default accessor, 
> which leads to errors, e.g., when a limit query is executed against a 
> Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}
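
For reference, PostgreSQL does not accept the generic {LIMIT n} escape shown in the error above; a Postgres-aware accessor would emit a plain LIMIT clause instead. The sketch below is purely illustrative; the class and method names are not the actual JDBC storage handler API.

{code:java}
final class LimitSqlSketch {
  // Generic form seen in the error above - rejected by PostgreSQL.
  static String genericLimit(String baseQuery, int n) {
    return baseQuery + " {LIMIT " + n + "}";
  }

  // PostgreSQL-friendly form: append a plain LIMIT clause.
  static String postgresLimit(String baseQuery, int n) {
    return baseQuery + " LIMIT " + n;
  }

  public static void main(String[] args) {
    String q = "SELECT \"TBL_ID\" FROM \"TBL_COL_PRIVS\"";
    System.out.println(genericLimit(q, 1));   // fails against a Postgres-backed metastore
    System.out.println(postgresLimit(q, 1));  // valid PostgreSQL
  }
}
{code}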



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-19 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24387:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are differences in the SQL syntax that the database accessor generates for 
> each RDBMS. For the metastore, we always end up with the default accessor, 
> which leads to errors, e.g., when a limit query is executed against a 
> Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24022?focusedWorklogId=514390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514390
 ]

ASF GitHub Bot logged work on HIVE-24022:
-

Author: ASF GitHub Bot
Created on: 20/Nov/20 00:41
Start Date: 20/Nov/20 00:41
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1385:
URL: https://github.com/apache/hive/pull/1385


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514390)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
> --
>
> Key: HIVE-24022
> URL: https://issues.apache.org/jira/browse/HIVE-24022
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Sam An
>Priority: Minor
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> For a table with 3000+ partitions, analyze table takes much longer because 
> HiveMetaStoreAuthorizer creates a new HiveConf for every partition request.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319]
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447]
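
The usual remedy for this kind of hotspot is to build the expensive configuration object once and reuse it across calls. The sketch below only illustrates that idea and is not the actual fix; java.util.Properties stands in for HiveConf.

{code:java}
import java.util.Properties;

class AuthorizerConfCacheSketch {
  private static volatile Properties cachedConf;   // placeholder for a cached HiveConf

  static Properties getConf() {
    Properties conf = cachedConf;
    if (conf == null) {
      synchronized (AuthorizerConfCacheSketch.class) {
        if (cachedConf == null) {
          cachedConf = buildExpensiveConf();       // built once, not once per partition
        }
        conf = cachedConf;
      }
    }
    return conf;
  }

  private static Properties buildExpensiveConf() {
    Properties p = new Properties();
    p.setProperty("example.key", "example.value"); // stands in for full configuration loading
    return p;
  }
}
{code}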



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24405) Missing datatype for table column in oracle

2020-11-19 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235748#comment-17235748
 ] 

Naveen Gangam commented on HIVE-24405:
--

[~pvargacl] I ran into this issue when testing some schema changes with an Oracle 
DB. Could you please review this change? Thanks

> Missing datatype for table column in oracle
> ---
>
> Key: HIVE-24405
> URL: https://issues.apache.org/jira/browse/HIVE-24405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The parent change introduces an issue in the oracle schema script.  No 
> datatype is specified.
> {noformat}
> 1 row created.
>   CQ_COMMIT_TIME(19)
> *
> ERROR at line 19:
> ORA-00902: invalid datatype
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24405) Missing datatype for table column in oracle

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24405?focusedWorklogId=514324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514324
 ]

ASF GitHub Bot logged work on HIVE-24405:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 20:56
Start Date: 19/Nov/20 20:56
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #1691:
URL: https://github.com/apache/hive/pull/1691


   …gam)
   
   
   ### What changes were proposed in this pull request?
   Creation of the Hive schema fails on Oracle. No datatype is defined for this 
recently added column.
   1 row created.
   
 CQ_COMMIT_TIME(19)
   *
   ERROR at line 19:
   ORA-00902: invalid datatype
   
   ### Why are the changes needed?
   HMS schema installation fails otherwise.
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   manually with a real oracle database.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514324)
Remaining Estimate: 0h
Time Spent: 10m

> Missing datatype for table column in oracle
> ---
>
> Key: HIVE-24405
> URL: https://issues.apache.org/jira/browse/HIVE-24405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The parent change introduces an issue in the oracle schema script.  No 
> datatype is specified.
> {noformat}
> 1 row created.
>   CQ_COMMIT_TIME(19)
> *
> ERROR at line 19:
> ORA-00902: invalid datatype
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24405) Missing datatype for table column in oracle

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24405:
--
Labels: pull-request-available  (was: )

> Missing datatype for table column in oracle
> ---
>
> Key: HIVE-24405
> URL: https://issues.apache.org/jira/browse/HIVE-24405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The parent change introduces an issue in the oracle schema script.  No 
> datatype is specified.
> {noformat}
> 1 row created.
>   CQ_COMMIT_TIME(19)
> *
> ERROR at line 19:
> ORA-00902: invalid datatype
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24405) Missing datatype for table column in oracle

2020-11-19 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24405:



> Missing datatype for table column in oracle
> ---
>
> Key: HIVE-24405
> URL: https://issues.apache.org/jira/browse/HIVE-24405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> The parent change introduces an issue in the oracle schema script.  No 
> datatype is specified.
> {noformat}
> 1 row created.
>   CQ_COMMIT_TIME(19)
> *
> ERROR at line 19:
> ORA-00902: invalid datatype
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24404) Hive getUserName close db makes client operations lost metaStoreClient connection

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24404?focusedWorklogId=514252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514252
 ]

ASF GitHub Bot logged work on HIVE-24404:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 18:02
Start Date: 19/Nov/20 18:02
Worklog Time Spent: 10m 
  Work Description: artiship commented on pull request #1685:
URL: https://github.com/apache/hive/pull/1685#issuecomment-730543048


   @kgyrtkirk The failed tests seem to be unrelated to my modification.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514252)
Time Spent: 20m  (was: 10m)

> Hive getUserName close db makes client operations lost metaStoreClient 
> connection
> -
>
> Key: HIVE-24404
> URL: https://issues.apache.org/jira/browse/HIVE-24404
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.3.7
> Environment: os: centos 7
> spark: 3.0.1
> hive: 2.3.7
>Reporter: Lichuanliang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I use Spark to execute a drop-partition SQL statement, I always encounter a 
> lost metastore connection warning.
>  Spark SQL:
> {code:java}
> alter table mydb.some_table drop if exists partition(dt = '2020-11-12',hh = 
> '17');
> {code}
> Execution log:
> {code:java}
> 20/11/12 19:37:57 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.20/11/12 19:37:57 WARN SessionState: 
> METASTORE_FILTER_HOOK will be ignored, since 
> hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.20/11/12 19:37:57 WARN RetryingMetaStoreClient: 
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitionsWithAuthInfoorg.apache.thrift.transport.TTransportException: 
> Cannot write to null outputStream at 
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:142)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:185) 
> at 
> org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:116)
>  at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:70) at 
> org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_partitions_ps_with_auth(ThriftHiveMetastore.java:2562)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_ps_with_auth(ThriftHiveMetastore.java:2549)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsWithAuthInfo(HiveMetaStoreClient.java:1209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source) at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
>  at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source) at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2555) at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2581) at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$dropPartitions$2(HiveClientImpl.scala:628)
>  at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> scala.collection.TraversableLike.flatMap(TraversableLike.scala:245) at 
> 

[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=514247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514247
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 17:59
Start Date: 19/Nov/20 17:59
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on a change in pull 
request #1503:
URL: https://github.com/apache/hive/pull/1503#discussion_r527088402



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -448,8 +450,10 @@ public SessionState(HiveConf conf, String userName) {
 parentLoader, Collections.emptyList(), true);
 final ClassLoader currentLoader = AccessController.doPrivileged(addAction);
 this.sessionConf.setClassLoader(currentLoader);
+Map udfCacheMap = getUDFCacheMap();

Review comment:
   Addressed in https://github.com/apache/hive/pull/1690/





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514247)
Time Spent: 1.5h  (was: 1h 20m)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access 
> in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=514246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514246
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 17:59
Start Date: 19/Nov/20 17:59
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on a change in pull 
request #1503:
URL: https://github.com/apache/hive/pull/1503#discussion_r527088254



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -491,6 +495,16 @@ public void resetThreadName() {
   Thread.currentThread().setName(names[names.length - 1].trim());
 }
   }
+  public static Map getUDFCacheMap(){
+return udfLocalResource;
+  }
+
+  public synchronized static File getUdfFileDir(){
+if(udfFileDir == null){

Review comment:
   Addressed in https://github.com/apache/hive/pull/1690/





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514246)
Time Spent: 1h 20m  (was: 1h 10m)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access 
> in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-18728) Secure webHCat with SSL

2020-11-19 Thread Hunter Logan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hunter Logan reassigned HIVE-18728:
---

Assignee: Hunter Logan  (was: Oleksiy Sayankin)

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Hunter Logan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure WebHCat REST-API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties are added to enable SSL. 
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true for using SSL encryption for  WebHCat server
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for WebHCat server
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for WebHCat server
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL Versions to disable for WebHCat server
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
> <configuration>
>   <property>
>     <name>templeton.use.ssl</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>templeton.keystore.path</name>
>     <value>/path/to/ssl_keystore</value>
>   </property>
>   <property>
>     <name>templeton.keystore.password</name>
>     <value>password</value>
>   </property>
> </configuration>
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> replace {{<user>}} and {{<password>}} with a valid user/password. Replace 
> {{<host>}} with your host name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24329) Add HMS notification for compaction commit

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24329?focusedWorklogId=514225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514225
 ]

ASF GitHub Bot logged work on HIVE-24329:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 17:13
Start Date: 19/Nov/20 17:13
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1647:
URL: https://github.com/apache/hive/pull/1647#discussion_r527057925



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -1176,6 +1186,55 @@ protected void 
updateWSCommitIdAndCleanUpMetadata(Statement stmt, long txnid, Tx
   TxnDbUtil.getEpochFn(dbProduct) + " WHERE \"CQ_TXN_ID\" = " + 
txnid);
 }
   }
+
+  private CompactionInfo getCompactionByTxnId(Connection dbConn, long txnid) 
throws SQLException, MetaException {
+CompactionInfo info = null;

Review comment:
   Wouldn't it be better if we wrap this with Optional?
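
For illustration, the Optional-based shape suggested here could look roughly like the following; the names are illustrative and not taken from the actual CompactionTxnHandler code.

{code:java}
import java.util.Optional;

class CompactionLookupSketch {
  static Optional<String> getCompactionByTxnId(long txnId) {
    String info = lookup(txnId);        // may return null when nothing matches
    return Optional.ofNullable(info);   // callers can no longer forget the null check
  }

  private static String lookup(long txnId) {
    return txnId == 42 ? "compaction-42" : null;
  }

  public static void main(String[] args) {
    getCompactionByTxnId(7).ifPresentOrElse(
        info -> System.out.println("found " + info),
        () -> System.out.println("no compaction for this txn"));
  }
}
{code}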





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514225)
Time Spent: 2h 50m  (was: 2h 40m)

> Add HMS notification for compaction commit
> --
>
> Key: HIVE-24329
> URL: https://issues.apache.org/jira/browse/HIVE-24329
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This could be used by file metadata caches, to invalidate the cache content



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24380) NullScanTaskDispatcher should liststatus in parallel

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24380?focusedWorklogId=514222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514222
 ]

ASF GitHub Bot logged work on HIVE-24380:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 17:09
Start Date: 19/Nov/20 17:09
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1670:
URL: https://github.com/apache/hive/pull/1670#issuecomment-730513087


   @rbalamohan can you have a second look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514222)
Time Spent: 40m  (was: 0.5h)

> NullScanTaskDispatcher should liststatus in parallel
> 
>
> Key: HIVE-24380
> URL: https://issues.apache.org/jira/browse/HIVE-24380
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> NullScanTaskDispatcher does listStatus for hundreds of partition directories 
> in the case of external tables. This is a big problem in cloud installations where 
> directory listings go to an object store like S3. We can do these listings in parallel.
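
A minimal sketch of the parallel-listing idea, assuming a plain Hadoop FileSystem and a fixed-size executor; error handling is simplified and the actual NullScanTaskDispatcher change may be structured differently.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ParallelListStatusSketch {
  static List<FileStatus> listAll(Configuration conf, List<Path> partitionDirs, int threads)
      throws IOException, InterruptedException, ExecutionException {
    FileSystem fs = FileSystem.get(conf);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Issue one listStatus per partition directory concurrently instead of sequentially.
      List<Future<FileStatus[]>> futures = new ArrayList<>();
      for (Path dir : partitionDirs) {
        futures.add(pool.submit(() -> fs.listStatus(dir)));
      }
      List<FileStatus> all = new ArrayList<>();
      for (Future<FileStatus[]> future : futures) {
        for (FileStatus status : future.get()) {
          all.add(status);
        }
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }
}
{code}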



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24370) Make the GetPartitionsProjectionSpec generic and add builder methods for tables and partitions in HiveMetaStoreClient

2020-11-19 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24370.
--
Fix Version/s: 4.0.0
 Hadoop Flags: Incompatible change,Reviewed
   Resolution: Fixed

> Make the GetPartitionsProjectionSpec generic and add builder methods for 
> tables and partitions in HiveMetaStoreClient
> -
>
> Key: HIVE-24370
> URL: https://issues.apache.org/jira/browse/HIVE-24370
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HIVE-20306 defines a projection struct called GetPartitionsProjectionSpec. 
> Although it has Partition in its name, this is a fairly generic struct 
> with nothing specific to partitions. It should be renamed to a more generic 
> name (GetProjectionSpec?), and builder methods of this class for tables and 
> partitions must be added to HiveMetaStoreClient.
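
Purely as an illustration of what such builder methods might look like (every name below is hypothetical and not taken from the actual patch):

{code:java}
import java.util.Arrays;
import java.util.List;

class GetProjectionSpecSketch {
  private final List<String> fieldList;   // which fields the server should return

  private GetProjectionSpecSketch(List<String> fieldList) {
    this.fieldList = fieldList;
  }

  // Hypothetical builder for table requests, e.g. only name and owner.
  static GetProjectionSpecSketch forTables(String... fields) {
    return new GetProjectionSpecSketch(Arrays.asList(fields));
  }

  // Hypothetical builder for partition requests, e.g. only values and storage location.
  static GetProjectionSpecSketch forPartitions(String... fields) {
    return new GetProjectionSpecSketch(Arrays.asList(fields));
  }

  List<String> getFieldList() {
    return fieldList;
  }
}
{code}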



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24370) Make the GetPartitionsProjectionSpec generic and add builder methods for tables and partitions in HiveMetaStoreClient

2020-11-19 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235582#comment-17235582
 ] 

Naveen Gangam commented on HIVE-24370:
--

[~vnhive] Fix has been committed to master. Thank you for your contribution.

> Make the GetPartitionsProjectionSpec generic and add builder methods for 
> tables and partitions in HiveMetaStoreClient
> -
>
> Key: HIVE-24370
> URL: https://issues.apache.org/jira/browse/HIVE-24370
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HIVE-20306 defines a projection struct called GetPartitionsProjectionSpec. 
> Although it has Partition in its name, this is a fairly generic struct 
> with nothing specific to partitions. It should be renamed to a more generic 
> name (GetProjectionSpec?), and builder methods of this class for tables and 
> partitions must be added to HiveMetaStoreClient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24370) Make the GetPartitionsProjectionSpec generic and add builder methods for tables and partitions in HiveMetaStoreClient

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24370?focusedWorklogId=514208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514208
 ]

ASF GitHub Bot logged work on HIVE-24370:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 16:49
Start Date: 19/Nov/20 16:49
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1664:
URL: https://github.com/apache/hive/pull/1664#issuecomment-730500913


   @vnhive Fix has now been committed to master. Thank you for the contribution.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514208)
Time Spent: 2h  (was: 1h 50m)

> Make the GetPartitionsProjectionSpec generic and add builder methods for 
> tables and partitions in HiveMetaStoreClient
> -
>
> Key: HIVE-24370
> URL: https://issues.apache.org/jira/browse/HIVE-24370
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HIVE-20306 defines a projection struct called GetPartitionsProjectionSpec. 
> Although it has Partition in its name, this is a fairly generic struct 
> with nothing specific to partitions. It should be renamed to a more generic 
> name (GetProjectionSpec?), and builder methods of this class for tables and 
> partitions must be added to HiveMetaStoreClient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=514205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514205
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 16:44
Start Date: 19/Nov/20 16:44
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on a change in pull 
request #1503:
URL: https://github.com/apache/hive/pull/1503#discussion_r527035374



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##
@@ -583,7 +583,13 @@ public void unregisterFunction(String functionName) throws 
HiveException {
 }
 mFunctions.remove(functionName);
 fi.discarded();
+FunctionResource[] resources = fi.getResources();
 if (fi.isPersistent()) {
+  Map udfCacheMap = SessionState.getUDFCacheMap();
+  for(FunctionResource fr : resources){
+//remove from udf cache if it's saved.
+udfCacheMap.remove(fr.getResourceURI());

Review comment:
   Yeah, we need to clear the downloaded files when unregistering a function.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514205)
Time Spent: 1h  (was: 50m)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access 
> in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=514204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514204
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 16:43
Start Date: 19/Nov/20 16:43
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #1690:
URL: https://github.com/apache/hive/pull/1690


   Signed-off-by: saihemanth 
   
   
   
   ### What changes were proposed in this pull request?
   When UDF jars are downloaded from external storage, they are cached.
   
   
   
   ### Why are the changes needed?
   If they are not cached, the UDF jars are downloaded again for every new session.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Locally.
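
A simplified sketch of the caching idea described above; this is not the actual SessionState/ResourceDownloader code, and the real map layout and method names differ.

{code:java}
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class UdfResourceCacheSketch {
  // resource URI -> locally downloaded copy, reused across sessions
  private static final ConcurrentMap<String, File> CACHE = new ConcurrentHashMap<>();

  static File getOrDownload(String resourceUri) {
    // Download only on a cache miss; later sessions reuse the local copy.
    return CACHE.computeIfAbsent(resourceUri, UdfResourceCacheSketch::download);
  }

  static void evict(String resourceUri) {
    // Mirrors the review discussion: drop the cached copy when the function is unregistered.
    CACHE.remove(resourceUri);
  }

  private static File download(String resourceUri) {
    // Placeholder for the real S3/HDFS download logic.
    return new File("/tmp/udf-cache/" + Integer.toHexString(resourceUri.hashCode()) + ".jar");
  }
}
{code}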
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514204)
Time Spent: 50m  (was: 40m)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access 
> in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=514206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514206
 ]

ASF GitHub Bot logged work on HIVE-24169:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 16:44
Start Date: 19/Nov/20 16:44
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on a change in pull 
request #1503:
URL: https://github.com/apache/hive/pull/1503#discussion_r527035484



##
File path: ql/src/java/org/apache/hadoop/hive/ql/util/ResourceDownloader.java
##
@@ -71,42 +87,61 @@ public static boolean isFileUri(String value) {
 }
   }
 
-  public List resolveAndDownload(String source, boolean convertToUnix)
+  public List resolveAndDownload(String source, boolean convertToUnix, 
boolean useCache)
   throws URISyntaxException, IOException {
-return resolveAndDownloadInternal(createURI(source), null, convertToUnix, 
true);
+return resolveAndDownloadInternal(createURI(source), null, convertToUnix, 
true, useCache);
   }
 
   public List downloadExternal(URI source, String subDir, boolean 
convertToUnix)
   throws URISyntaxException, IOException {
-return resolveAndDownloadInternal(source, subDir, convertToUnix, false);
+return resolveAndDownloadInternal(source, subDir, convertToUnix, false, 
false);
+  }
+  public List downloadExternal(URI source, String subDir, boolean 
convertToUnix, boolean useCache)
+  throws URISyntaxException, IOException {
+return resolveAndDownloadInternal(source, subDir, convertToUnix, false, 
useCache);
   }
 
   private List resolveAndDownloadInternal(URI source, String subDir,
-  boolean convertToUnix, boolean isLocalAllowed) throws 
URISyntaxException, IOException {
+  boolean convertToUnix, boolean isLocalAllowed, boolean useCache) throws 
URISyntaxException, IOException {
 switch (getURLType(source)) {
 case FILE: return isLocalAllowed ? Collections.singletonList(source) : 
null;
 case IVY: return dependencyResolver.downloadDependencies(source);
 case HDFS:
 case OTHER:
-  return Collections.singletonList(createURI(downloadResource(source, 
subDir, convertToUnix)));
+  return Collections.singletonList(createURI(downloadResource(source, 
subDir, convertToUnix, useCache)));

Review comment:
   Caching for HDFS is a more general optimization than the one we are working 
on here. It can be tracked in a separate JIRA.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514206)
Time Spent: 1h 10m  (was: 1h)

> HiveServer2 UDF cache
> -
>
> Key: HIVE-24169
> URL: https://issues.apache.org/jira/browse/HIVE-24169
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access 
> in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18728) Secure webHCat with SSL

2020-11-19 Thread Hunter Logan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235575#comment-17235575
 ] 

Hunter Logan commented on HIVE-18728:
-

This seems to have been abandoned by the original contributor. Taking assignment and 
moving the code changes into a PR over on GitHub to get this into Hive 3.2/4 
(whatever is next).

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure WebHCat REST-API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties are added to enable SSL. 
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true for using SSL encryption for  WebHCat server
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for WebHCat server
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for WebHCat server
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL Versions to disable for WebHCat server
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
> <configuration>
>   <property>
>     <name>templeton.use.ssl</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>templeton.keystore.path</name>
>     <value>/path/to/ssl_keystore</value>
>   </property>
>   <property>
>     <name>templeton.keystore.password</name>
>     <value>password</value>
>   </property>
> </configuration>
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> replace {{<user>}} and {{<password>}} with a valid user/password. Replace 
> {{<host>}} with your host name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-18728) Secure webHCat with SSL

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-18728:
--
Labels: pull-request-available  (was: )

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure WebHCat REST-API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties are added to enable SSL. 
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true for using SSL encryption for  WebHCat server
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for WebHCat server
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for WebHCat server
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL Versions to disable for WebHCat server
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
> <configuration>
>   <property>
>     <name>templeton.use.ssl</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>templeton.keystore.path</name>
>     <value>/path/to/ssl_keystore</value>
>   </property>
>   <property>
>     <name>templeton.keystore.password</name>
>     <value>password</value>
>   </property>
> </configuration>
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> replace {{<user>}} and {{<password>}} with a valid user/password. Replace 
> {{<host>}} with your host name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-18728) Secure webHCat with SSL

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18728?focusedWorklogId=514188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514188
 ]

ASF GitHub Bot logged work on HIVE-18728:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 16:30
Start Date: 19/Nov/20 16:30
Worklog Time Spent: 10m 
  Work Description: HunterL opened a new pull request #1689:
URL: https://github.com/apache/hive/pull/1689


   ### What changes were proposed in this pull request?
   Adds templeton configuration options for enabling TLS
   
   ### Why are the changes needed?
   Allows more secure connections to WebHCat
   
   The code for this change has been sitting around since 2018. Targeting has 
moved from 3.0 to 3.1 and now 3.2; I figure moving it over to GitHub will help 
get it in.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, configuration options were added, so the docs should be updated. The doc 
update has already been written up in the original ticket; it needs someone with 
editing permissions to apply it.
   
   ### How was this patch tested?
   No tests were added, but you can verify this with a simple curl command 
(replace <user>, <password>, and <host> with real values):
   curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
   
   Any guidance on adding a test for this would be appreciated.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514188)
Remaining Estimate: 0h
Time Spent: 10m

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Hunter Logan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure WebHCat REST-API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties are added to enable SSL. 
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true for using SSL encryption for  WebHCat server
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for WebHCat server
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for WebHCat server
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL Versions to disable for WebHCat server
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
> <configuration>
>   <property>
>     <name>templeton.use.ssl</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>templeton.keystore.path</name>
>     <value>/path/to/ssl_keystore</value>
>   </property>
>   <property>
>     <name>templeton.keystore.password</name>
>     <value>password</value>
>   </property>
> </configuration>
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> replace {{<user>}} and {{<password>}} with a valid user/password. Replace 
> {{<host>}} with your host name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated

2020-11-19 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-24401.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks for the patch [~pvargacl]!

> COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
> --
>
> Key: HIVE-24401
> URL: https://issues.apache.org/jira/browse/HIVE-24401
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Minor query-based compaction is now implemented, so the config description is outdated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24401?focusedWorklogId=514108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514108
 ]

ASF GitHub Bot logged work on HIVE-24401:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 13:23
Start Date: 19/Nov/20 13:23
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #1683:
URL: https://github.com/apache/hive/pull/1683


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514108)
Time Spent: 1h 10m  (was: 1h)

> COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
> --
>
> Key: HIVE-24401
> URL: https://issues.apache.org/jira/browse/HIVE-24401
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Minor query-based compaction is now implemented, so the config description is outdated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24401?focusedWorklogId=514107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514107
 ]

ASF GitHub Bot logged work on HIVE-24401:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 13:23
Start Date: 19/Nov/20 13:23
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1683:
URL: https://github.com/apache/hive/pull/1683#issuecomment-730371544


   Skipping tests as this is just a config change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514107)
Time Spent: 1h  (was: 50m)

> COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
> --
>
> Key: HIVE-24401
> URL: https://issues.apache.org/jira/browse/HIVE-24401
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Minor query-based compaction is now implemented, so the config description is outdated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=514074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514074
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 12:33
Start Date: 19/Nov/20 12:33
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730345183


   > this patch also fully removes TestTezPerfConstraints driver - is that 
because from now on we will run with constraints all the time?
   
   That's right. If we want to run both with and without constraints, then I guess we 
can truncate the respective table ("KEY_CONSTRAINTS"); I didn't try it though. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514074)
Time Spent: 5h 10m  (was: 5h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom Hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=514059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514059
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 11:54
Start Date: 19/Nov/20 11:54
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730323909


   this patch also fully removes TestTezPerfConstraints driver - is that 
because from now on we will run with constraints all the time?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514059)
Time Spent: 5h  (was: 4h 50m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24401?focusedWorklogId=514032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514032
 ]

ASF GitHub Bot logged work on HIVE-24401:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 09:51
Start Date: 19/Nov/20 09:51
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1683:
URL: https://github.com/apache/hive/pull/1683#discussion_r526728324



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3113,8 +3113,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_COMPACT_MM("hive.compactor.compact.insert.only", true,
         "Whether the compactor should compact insert-only tables. A safety switch."),
     COMPACTOR_CRUD_QUERY_BASED("hive.compactor.crud.query.based", false,
-        "Means Major compaction on full CRUD tables is done as a query, "
-            + "and minor compaction will be disabled."),
+        "Means compaction on full CRUD tables is done as a query. "
+            + "Compactions on insert-only tables will always run as a query."),

Review comment:
   Thanks for taking care of this!
   
   I'd add to the end: "regardless of the value of this configuration", just so 
it's clear.
   
   And "as **a** query" isn't exactly accurate; I'd change it to something 
like "via queries".





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514032)
Time Spent: 50m  (was: 40m)

> COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
> --
>
> Key: HIVE-24401
> URL: https://issues.apache.org/jira/browse/HIVE-24401
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Minor query-based compaction is implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=514021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514021
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 09:09
Start Date: 19/Nov/20 09:09
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #1347:
URL: https://github.com/apache/hive/pull/1347


   ### What changes were proposed in this pull request and why?
   
   1. Use Dockerized postgres metastore with TPC-DS 30TB dump
   2. Use Hive config properties obtained and curated from real-life usages
   3. Allow AbstractCliConfig to override metastore DB type
   4. Rework CorePerfCliDriver to allow pre-initialized metastores
   
   Extract TPCDS-specific code to a subclass and document it appropriately,
   clarifying inaccuracies regarding the size of the tables in the javadoc.
   
   Remove system property settings in the initialization of the driver and
   leave in the configuration to set it up if needed. This is necessary to
   be able to use the driver with a preinitialised metastore.
   
   Remove redundant logs in System.err. Logging and throwing an exception
   is an anti-pattern.
   
   Replace assertions with exceptions and improve the messages.
   
   5. Give more meaningful names to TPCDS related CLI configurations
   6. Disable queries 30, 74, 84 with appropriate JIRA reference
   7. Add TPC-DS query plans for the new driver (TestTezTPCDS30TBCliDriver)
   8. Consider the hive.current.database property when returning the session's
   current db to avoid prefixing every query with the name of the database
   (a hedged sketch of this lookup follows after this description).
   9. Upgrade postgres JDBC driver to version 42.2.14 to be compatible
   with the docker image used.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   `mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver  -Dtest.output.overwrite`
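
   A hedged sketch of the lookup described in item 8 above, not the actual 
change in this PR: prefer an explicitly set hive.current.database property and 
fall back to the database the session already tracks. The class, method, and 
database names are hypothetical.

    import org.apache.hadoop.hive.conf.HiveConf;

    public final class CurrentDatabaseExample {
      // The property name comes from the PR description; the "default"
      // fallback is an assumption for illustration.
      static String resolveCurrentDatabase(HiveConf conf, String sessionTrackedDb) {
        String configured = conf.get("hive.current.database");
        if (configured != null && !configured.isEmpty()) {
          return configured;
        }
        return sessionTrackedDb != null ? sessionTrackedDb : "default";
      }

      public static void main(String[] args) {
        HiveConf conf = new HiveConf();
        // Assumed database name, for illustration only.
        conf.set("hive.current.database", "tpcds_bin_partitioned_orc_30000");
        System.out.println(resolveCurrentDatabase(conf, null));
      }
    }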



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514021)
Time Spent: 4h 50m  (was: 4h 40m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=514019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514019
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 09:08
Start Date: 19/Nov/20 09:08
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #1347:
URL: https://github.com/apache/hive/pull/1347


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514019)
Time Spent: 4.5h  (was: 4h 20m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=514020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514020
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 09:08
Start Date: 19/Nov/20 09:08
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730233527


   Closed and reopened to trigger tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514020)
Time Spent: 4h 40m  (was: 4.5h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24401) COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24401?focusedWorklogId=514006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514006
 ]

ASF GitHub Bot logged work on HIVE-24401:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 08:39
Start Date: 19/Nov/20 08:39
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1683:
URL: https://github.com/apache/hive/pull/1683#issuecomment-730217834


   @klcopp Could you review and merge this? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514006)
Time Spent: 40m  (was: 0.5h)

> COMPACTOR_CRUD_QUERY_BASED description in HiveConf is outdated
> --
>
> Key: HIVE-24401
> URL: https://issues.apache.org/jira/browse/HIVE-24401
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Minor query-based compaction is implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24399) Optimize Deserializer creation

2020-11-19 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-24399.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the review [~szita]!

> Optimize Deserializer creation
> --
>
> Key: HIVE-24399
> URL: https://issues.apache.org/jira/browse/HIVE-24399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When running a query on a table using a non-default SerDe we often recreate 
> the Deserializer object. This could be costly and often not necessary.
> We should optimize this as much as possible.
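
One common way to realize the optimization sketched in the description above 
is to memoize the Deserializer per table. The class below is a hypothetical, 
self-contained sketch and not Hive's actual implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.Supplier;

    public class DeserializerCacheSketch {
      // One entry per table; computeIfAbsent runs the expensive factory at
      // most once per key, even with concurrent callers.
      private final Map<String, Object> cache = new ConcurrentHashMap<>();

      @SuppressWarnings("unchecked")
      <D> D getOrCreate(String tableKey, Supplier<D> factory) {
        return (D) cache.computeIfAbsent(tableKey, k -> factory.get());
      }

      public static void main(String[] args) {
        DeserializerCacheSketch cache = new DeserializerCacheSketch();
        AtomicInteger creations = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
          cache.getOrCreate("db.tbl", () -> "deserializer#" + creations.incrementAndGet());
        }
        System.out.println("factory invocations: " + creations.get()); // prints 1
      }
    }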



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24399) Optimize Deserializer creation

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24399?focusedWorklogId=514000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514000
 ]

ASF GitHub Bot logged work on HIVE-24399:
-

Author: ASF GitHub Bot
Created on: 19/Nov/20 08:24
Start Date: 19/Nov/20 08:24
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1682:
URL: https://github.com/apache/hive/pull/1682


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514000)
Time Spent: 20m  (was: 10m)

> Optimize Deserializer creation
> --
>
> Key: HIVE-24399
> URL: https://issues.apache.org/jira/browse/HIVE-24399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When running a query on a table using a non-default SerDe we often recreate 
> the Deserializer object. This could be costly and often not necessary.
> We should optimize this as much as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)