[jira] [Work logged] (HIVE-26169) Set non-vectorized mode as default when accessing iceberg tables in avro fileformat

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26169?focusedWorklogId=760820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760820
 ]

ASF GitHub Bot logged work on HIVE-26169:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 13:15
Start Date: 22/Apr/22 13:15
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on code in PR #3236:
URL: https://github.com/apache/hive/pull/3236#discussion_r856217749


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -744,8 +745,17 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
     return column;
   }
 
-  private void fallbackToNonVectorizedModeForV2(Properties tableProps) {
-    if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION))) {
+  /**
+   * If any of the following checks is true we fall back to non vectorized mode:
+   * 
+   *   iceberg format-version is "2"
+   *   fileformat is set to avro
+   * 
+   * @param tableProps table properties, must be not null
+   */
+  private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
+    if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION)) ||
+        FileFormat.AVRO.name().equalsIgnoreCase((String) tableProps.get(TableProperties.DEFAULT_FILE_FORMAT))) {

Review Comment:
   yepp, fixed it.





Issue Time Tracking
---

Worklog Id: (was: 760820)
Time Spent: 0.5h  (was: 20m)

> Set non-vectorized mode as default when accessing iceberg tables in avro 
> fileformat
> ---
>
> Key: HIVE-26169
> URL: https://issues.apache.org/jira/browse/HIVE-26169
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vectorization for iceberg tables in avro format is not yet supported. We 
> should disable vectorization when we want to read/write avro tables. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26169) Set non-vectorized mode as default when accessing iceberg tables in avro fileformat

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26169?focusedWorklogId=760818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760818
 ]

ASF GitHub Bot logged work on HIVE-26169:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 13:10
Start Date: 22/Apr/22 13:10
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3236:
URL: https://github.com/apache/hive/pull/3236#discussion_r856213274


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -744,8 +745,17 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
     return column;
   }
 
-  private void fallbackToNonVectorizedModeForV2(Properties tableProps) {
-    if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION))) {
+  /**
+   * If any of the following checks is true we fall back to non vectorized mode:
+   * 
+   *   iceberg format-version is "2"
+   *   fileformat is set to avro
+   * 
+   * @param tableProps table properties, must be not null
+   */
+  private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
+    if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION)) ||
+        FileFormat.AVRO.name().equalsIgnoreCase((String) tableProps.get(TableProperties.DEFAULT_FILE_FORMAT))) {

Review Comment:
   nit: you can use `tableProps.getProperty()` instead of casting
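
A minimal sketch of the suggestion above, outside of Hive. The class name `VectorizationFallbackSketch` and the two key strings are assumptions made for illustration (they stand in for `TableProperties.FORMAT_VERSION` and `TableProperties.DEFAULT_FILE_FORMAT`); only `java.util.Properties` is used.

```java
import java.util.Properties;

class VectorizationFallbackSketch {
  // Assumed string values of the two table property keys (illustrative only).
  static final String FORMAT_VERSION = "format-version";
  static final String DEFAULT_FILE_FORMAT = "write.format.default";

  /** Returns true when the table should be read in non-vectorized mode. */
  static boolean shouldFallBack(Properties tableProps) {
    // getProperty() already returns a String (or null), so no cast is needed.
    return "2".equals(tableProps.getProperty(FORMAT_VERSION)) ||
        "avro".equalsIgnoreCase(tableProps.getProperty(DEFAULT_FILE_FORMAT));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(DEFAULT_FILE_FORMAT, "AVRO");
    System.out.println(shouldFallBack(props));
  }
}
```

One behavioural difference worth keeping in mind: `get()` followed by a `(String)` cast throws a `ClassCastException` if a non-String value was stored, whereas `getProperty()` returns null in that case.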





Issue Time Tracking
---

Worklog Id: (was: 760818)
Time Spent: 20m  (was: 10m)

> Set non-vectorized mode as default when accessing iceberg tables in avro 
> fileformat
> ---
>
> Key: HIVE-26169
> URL: https://issues.apache.org/jira/browse/HIVE-26169
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Vectorization for iceberg tables in avro format is not yet supported. We 
> should disable vectorization when we want to read/write avro tables. 





[jira] [Work logged] (HIVE-26169) Set non-vectorized mode as default when accessing iceberg tables in avro fileformat

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26169?focusedWorklogId=760807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760807
 ]

ASF GitHub Bot logged work on HIVE-26169:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 12:55
Start Date: 22/Apr/22 12:55
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3236:
URL: https://github.com/apache/hive/pull/3236

   
   
   
   ### What changes were proposed in this pull request?
   Set vectorization to false when reading/writing iceberg tables in avro 
format.
   
   
   
   ### Why are the changes needed?
   Vectorization for iceberg tables in avro is not yet supported.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manual test, unit test
   
   




Issue Time Tracking
---

Worklog Id: (was: 760807)
Remaining Estimate: 0h
Time Spent: 10m

> Set non-vectorized mode as default when accessing iceberg tables in avro 
> fileformat
> ---
>
> Key: HIVE-26169
> URL: https://issues.apache.org/jira/browse/HIVE-26169
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Vectorization for iceberg tables in avro format is not yet supported. We 
> should disable vectorization when we want to read/write avro tables. 





[jira] [Updated] (HIVE-26169) Set non-vectorized mode as default when accessing iceberg tables in avro fileformat

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26169:
--
Labels: pull-request-available  (was: )

> Set non-vectorized mode as default when accessing iceberg tables in avro 
> fileformat
> ---
>
> Key: HIVE-26169
> URL: https://issues.apache.org/jira/browse/HIVE-26169
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Vectorization for iceberg tables in avro format is not yet supported. We 
> should disable vectorization when we want to read/write avro tables. 





[jira] [Assigned] (HIVE-26169) Set non-vectorized mode as default when accessing iceberg tables in avro fileformat

2022-04-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26169:



> Set non-vectorized mode as default when accessing iceberg tables in avro 
> fileformat
> ---
>
> Key: HIVE-26169
> URL: https://issues.apache.org/jira/browse/HIVE-26169
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Vectorization for iceberg tables in avro format is not yet supported. We 
> should disable vectorization when we want to read/write avro tables. 





[jira] [Work logged] (HIVE-26155) Create a new connection pool for compaction

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26155?focusedWorklogId=760798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760798
 ]

ASF GitHub Bot logged work on HIVE-26155:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 12:46
Start Date: 22/Apr/22 12:46
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on code in PR #3223:
URL: https://github.com/apache/hive/pull/3223#discussion_r856193879


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5836,11 +5835,11 @@ private void removeTxnsFromMinHistoryLevel(Connection dbConn, List txnids)
     }
   }
 
-  private synchronized static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) {
+  protected synchronized static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize) {

Review Comment:
   I'm not overriding this method, just changed the visibility and removed 
getConnectionTimeoutMs param as it was not used.





Issue Time Tracking
---

Worklog Id: (was: 760798)
Time Spent: 40m  (was: 0.5h)

> Create a new connection pool for compaction
> ---
>
> Key: HIVE-26155
> URL: https://issues.apache.org/jira/browse/HIVE-26155
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: compaction, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the TxnHandler uses 2 connection pools to communicate with the HMS: 
> the default one and one for mutexing. If compaction is configured incorrectly 
> (e.g. too many Initiators are running on the same db) then compaction can use 
> up all the connections in the default connection pool and all user queries 
> can get stuck.
> We should have a separate connection pool (configurable size) just for 
> compaction-related activities.
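
The isolation argument in the description can be illustrated with a small sketch. It is not TxnHandler code: `PoolIsolationSketch` and `BoundedPool` are hypothetical names, and a `Semaphore` stands in for a real JDBC connection pool, with one permit per connection.

```java
import java.util.concurrent.Semaphore;

class PoolIsolationSketch {
  /** Stand-in for a bounded connection pool: each permit models one connection. */
  static final class BoundedPool {
    private final Semaphore permits;

    BoundedPool(int maxPoolSize) {
      this.permits = new Semaphore(maxPoolSize);
    }

    boolean tryAcquire() {
      return permits.tryAcquire();
    }

    void release() {
      permits.release();
    }

    int available() {
      return permits.availablePermits();
    }
  }

  final BoundedPool defaultPool;     // used by user queries
  final BoundedPool compactionPool;  // used only by compaction-related activities

  PoolIsolationSketch(int defaultSize, int compactionSize) {
    this.defaultPool = new BoundedPool(defaultSize);
    this.compactionPool = new BoundedPool(compactionSize);
  }

  public static void main(String[] args) {
    PoolIsolationSketch pools = new PoolIsolationSketch(10, 2);
    // A misconfigured compaction setup grabs every connection it can get ...
    while (pools.compactionPool.tryAcquire()) {
      // keep acquiring until the compaction pool is exhausted
    }
    // ... but the default pool is untouched, so user queries can still connect.
    System.out.println(pools.defaultPool.available());
  }
}
```

With a single shared pool the loop above would have drained the connections that user queries depend on; with a dedicated pool, the damage is capped at the configured compaction pool size.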





[jira] [Work logged] (HIVE-26167) QueryStateMap in SessionState is maintained correctly

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?focusedWorklogId=760774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760774
 ]

ASF GitHub Bot logged work on HIVE-26167:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 11:58
Start Date: 22/Apr/22 11:58
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on PR #3234:
URL: https://github.com/apache/hive/pull/3234#issuecomment-1106442980

   > The patch looks good, but one question: when we send in a statement via 
SQL in the normal way, where does the query state get initialized?
   
   It is initialized in the driver constructor. 




Issue Time Tracking
---

Worklog Id: (was: 760774)
Time Spent: 0.5h  (was: 20m)

> QueryStateMap in SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Work logged] (HIVE-26167) QueryStateMap in SessionState is maintained correctly

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?focusedWorklogId=760755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760755
 ]

ASF GitHub Bot logged work on HIVE-26167:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 11:46
Start Date: 22/Apr/22 11:46
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on PR #3234:
URL: https://github.com/apache/hive/pull/3234#issuecomment-1106434642

   The patch looks good, but one question: when we send in a statement via SQL 
in the normal way, where does the query state get initialized?




Issue Time Tracking
---

Worklog Id: (was: 760755)
Time Spent: 20m  (was: 10m)

> QueryStateMap in SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Commented] (HIVE-19711) Refactor Hive Schema Tool

2022-04-22 Thread Miklos Gergely (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526335#comment-17526335
 ] 

Miklos Gergely commented on HIVE-19711:
---

[~szita] I can't tell for sure, it was so long ago, and I don't remember. My 
hunch is that it was not intentional. Feel free to restore the original 
functionality if you believe that it was the correct one, i.e. the failure of 
these validations should not fail the whole process.

> Refactor Hive Schema Tool
> -
>
> Key: HIVE-19711
> URL: https://issues.apache.org/jira/browse/HIVE-19711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-19711.01.patch, HIVE-19711.02.patch, 
> HIVE-19711.03.patch, HIVE-19711.04.patch, HIVE-19711.05.patch, 
> HIVE-19711.06.patch, HIVE-19711.07.patch, HIVE-19711.08.patch
>
>
> HiveSchemaTool is a 1500-line class that tries to do everything. It should 
> be split into multiple classes, each handling a smaller component.
>  





[jira] [Updated] (HIVE-26165) Remove READ locks for ACID tables with SoftDelete enabled

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26165:
--
Labels: pull-request-available  (was: )

> Remove READ locks for ACID tables with SoftDelete enabled
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking for READs, we do not need READ locks anymore. That should improve ACID 
> concurrency.





[jira] [Work logged] (HIVE-26165) Remove READ locks for ACID tables with SoftDelete enabled

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?focusedWorklogId=760741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760741
 ]

ASF GitHub Bot logged work on HIVE-26165:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 11:19
Start Date: 22/Apr/22 11:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request, #3235:
URL: https://github.com/apache/hive/pull/3235

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 760741)
Remaining Estimate: 0h
Time Spent: 10m

> Remove READ locks for ACID tables with SoftDelete enabled
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since operations that required an EXCLUSIVE lock were rewritten to be 
> non-blocking for READs, we do not need READ locks anymore. That should improve ACID 
> concurrency.





[jira] [Updated] (HIVE-26168) EXPLAIN DDL command output is not deterministic

2022-04-22 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26168:
---
Description: 
The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a 
given query in order to debug planner issues. This is achieved by fetching 
information from the metastore and outputting a series of DDL commands. 

The output commands, though, may appear in a different order across runs since 
there is no mechanism to enforce an explicit order.

Consider for instance the following scenario.

{code:sql}
CREATE TABLE customer
(
    `c_custkey` bigint,
    `c_name`    string,
    `c_address` string
);

INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 avenue Mansart');

EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob'; 
{code}

+Result 1+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg==
{noformat}

+Result 2+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg==
{noformat}

The two results are equivalent but the statements appear in a different order. 
This is not a big issue because the results remain correct, but it may lead to 
test flakiness so it might be worth addressing.
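
One possible way to get a stable order, sketched here under assumptions and not taken from the Hive code base: sort the column names before emitting the per-column statements. `DeterministicDdlSketch` and `columnStatsStatements` are hypothetical names, and the statistics are passed in as preformatted strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeterministicDdlSketch {
  /**
   * Emits one UPDATE STATISTICS statement per column, always in ascending
   * column-name order, regardless of the iteration order of the input map.
   */
  static List<String> columnStatsStatements(String table, Map<String, String> statsByColumn) {
    List<String> columns = new ArrayList<>(statsByColumn.keySet());
    Collections.sort(columns); // explicit order instead of map iteration order
    List<String> ddl = new ArrayList<>();
    for (String column : columns) {
      ddl.add("ALTER TABLE " + table + " UPDATE STATISTICS FOR COLUMN " + column
          + " SET(" + statsByColumn.get(column) + " );");
    }
    return ddl;
  }

  public static void main(String[] args) {
    Map<String, String> stats = new HashMap<>();
    stats.put("c_name", "'numNulls'='0','numDVs'='2'");
    stats.put("c_custkey", "'numNulls'='0','numDVs'='2'");
    stats.put("c_address", "'numNulls'='0','numDVs'='2'");
    for (String statement : columnStatsStatements("default.customer", stats)) {
      System.out.println(statement);
    }
  }
}
```

With sorted names, the output would always follow the order of Result 1 above (c_address, c_custkey, c_name), which is what a test comparing against a golden file needs.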

  was:
The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a 
given query in order to debug planner issues. This is achieved by fetching 
information from the metastore and outputting series of DDL commands. 

The output commands though may appear in different order among runs since there 
is no mechanism to enforce an explicit order.

Consider for instance the following scenario.

{code:sql}
CREATE TABLE customer
(
`c_custkey` bigint,
`c_name`string,
`c_address` string
);

INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 
avenue Mansart');

EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob'; 
{code}

+Result 1+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS 
SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address 
SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE 
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF 
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey 
SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE 
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED 
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name 
SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT 
SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== 
{noformat}

+Result 2+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey 
SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT 

[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=760720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760720
 ]

ASF GitHub Bot logged work on HIVE-26157:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 10:04
Start Date: 22/Apr/22 10:04
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on PR #3226:
URL: https://github.com/apache/hive/pull/3226#issuecomment-1106322792

   PR #3234 will resolve the test failures. 




Issue Time Tracking
---

Worklog Id: (was: 760720)
Time Spent: 1h 10m  (was: 1h)

> Change Iceberg storage handler authz URI to metadata location
> -
>
> Key: HIVE-26157
> URL: https://issues.apache.org/jira/browse/HIVE-26157
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In HIVE-25964, the authz URI has been changed to "iceberg://db.table".
> It is possible to set the metadata pointers of table A to point to table B, 
> and therefore you could read table B's data via querying table A.
> {code:sql}
> alter table A set tblproperties 
> ('metadata_location'='/path/to/B/snapshot.json', 
> 'previous_metadata_location'='/path/to/B/prev_snapshot.json');  {code}





[jira] [Assigned] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-04-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26158:
---

Assignee: Zoltan Haindrich

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>
> After the patch is applied, the table location and the HDFS data directory 
> are displayed correctly, but the partition locations stored in the SDS table 
> of the Hive metastore database still point to the location of the old table, 
> so querying a partition returns no data.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to update the partition locations of the table in SDS so that 
> queries return the expected results.
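
The location update described above amounts to replacing the old table path prefix in each partition location. The following is only a sketch of that string rewrite, not the metastore fix itself; `PartitionLocationSketch` and `relocate` are hypothetical names.

```java
class PartitionLocationSketch {
  /**
   * Rewrites a partition location that lives under the old table directory so
   * that it points under the new table directory. Locations outside the old
   * table directory (for example custom partition locations) are returned as-is.
   */
  static String relocate(String partitionLocation, String oldTableLocation, String newTableLocation) {
    if (partitionLocation.equals(oldTableLocation)) {
      return newTableLocation;
    }
    // The trailing slash keeps "part_test" from matching "part_test11/...".
    String prefix = oldTableLocation + "/";
    if (partitionLocation.startsWith(prefix)) {
      return newTableLocation + "/" + partitionLocation.substring(prefix.length());
    }
    return partitionLocation;
  }

  public static void main(String[] args) {
    String base = "hdfs://nameservice1/warehouse/tablespace/external/hive/";
    System.out.println(relocate(base + "part_test/dat=20220101", base + "part_test", base + "part_test11"));
  }
}
```

Comparing against the prefix with a trailing slash matters in this particular case, because the new table name (`part_test11`) starts with the old one (`part_test`).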





[jira] [Updated] (HIVE-26167) QueryStateMap in SessionState is maintained correctly

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26167:
--
Labels: pull-request-available  (was: )

> QueryStateMap in SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Work logged] (HIVE-26167) QueryStateMap in SessionState is maintained correctly

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?focusedWorklogId=760714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760714
 ]

ASF GitHub Bot logged work on HIVE-26167:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 09:57
Start Date: 22/Apr/22 09:57
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3234:
URL: https://github.com/apache/hive/pull/3234

   
   
   ### What changes were proposed in this pull request?
   Add the current querystate object to SessionState's QueryStateMap if it's 
missing.
   
   
   
   ### Why are the changes needed?
   Many unit tests are initializing the driver object once, but reusing it for 
all the query executions in the whole test method. Since the queryStateMap in 
the SessionState is cleared after the first query execution, all subsequent 
queries will not be able to access the querystate through the SessionState. 
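
   The failure mode and the proposed fix can be sketched outside of Hive as follows. All class names (`QueryStateMapSketch`, `Session`, `ReusableDriver`) are hypothetical stand-ins for illustration and do not mirror the real Driver or SessionState API.

```java
import java.util.HashMap;
import java.util.Map;

class QueryStateMapSketch {
  /** Hypothetical stand-in for the per-query state object. */
  static final class QueryState {
    final String queryId;

    QueryState(String queryId) {
      this.queryId = queryId;
    }
  }

  /** Hypothetical stand-in for the session-level map of query ID to query state. */
  static final class Session {
    private final Map<String, QueryState> queryStateMap = new HashMap<>();

    void addIfMissing(QueryState state) {
      queryStateMap.putIfAbsent(state.queryId, state);
    }

    QueryState get(String queryId) {
      return queryStateMap.get(queryId);
    }

    void remove(String queryId) {
      queryStateMap.remove(queryId);
    }
  }

  /** A driver that is created once and reused for several queries, as in the unit tests. */
  static final class ReusableDriver {
    private final Session session;
    private final QueryState queryState;

    ReusableDriver(Session session, String queryId) {
      this.session = session;
      this.queryState = new QueryState(queryId);
      session.addIfMissing(queryState); // registered once, at construction time
    }

    String run(String sql) {
      // Re-register the state in case an earlier run already removed it.
      session.addIfMissing(queryState);
      try {
        // Without the call above, the second run would dereference null here.
        return session.get(queryState.queryId).queryId + ": " + sql;
      } finally {
        session.remove(queryState.queryId); // cleanup after the execution
      }
    }
  }

  public static void main(String[] args) {
    ReusableDriver driver = new ReusableDriver(new Session(), "q1");
    System.out.println(driver.run("SELECT 1"));
    System.out.println(driver.run("SELECT 2"));
  }
}
```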
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manual test, unit test
   
   




Issue Time Tracking
---

Worklog Id: (was: 760714)
Remaining Estimate: 0h
Time Spent: 10m

> QueryStateMap in SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Updated] (HIVE-26167) QueryStateMap in SessionState is maintained correctly

2022-04-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér updated HIVE-26167:
-
Summary: QueryStateMap in SessionState is maintained correctly  (was: 
QueryStateMap is SessionState is maintained correctly)

> QueryStateMap in SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Assigned] (HIVE-26167) QueryStateMap is SessionState is maintained correctly

2022-04-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26167:



> QueryStateMap is SessionState is maintained correctly
> -
>
> Key: HIVE-26167
> URL: https://issues.apache.org/jira/browse/HIVE-26167
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> When the Driver is initialized, the QueryStateMap is also initialized with the query ID 
> and the current queryState object. This record is kept in the map until the 
> execution of the query is completed. 
> There are many unit tests that initialise the driver object once during the 
> setup phase, and use the same object to execute all the different queries. As 
> a consequence, after the first execution, the QueryStateMap will be cleaned 
> and all subsequent queries will run into null pointer exception while trying 
> to fetch the current querystate from the SessionState. 





[jira] [Commented] (HIVE-26166) Make website GDPR compliant

2022-04-22 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526288#comment-17526288
 ] 

Stamatis Zampetakis commented on HIVE-26166:


I am not familiar with how Google Analytics actually works but I found the 
following files which seem to be relevant:
 * [https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/docs/_includes/top.html]
 * https://github.com/apache/hive-site/blob/2b54cb5d57089f50b8a38607d089849b2fe0e350/_includes/top.html

> Make website GDPR compliant
> ---
>
> Key: HIVE-26166
> URL: https://issues.apache.org/jira/browse/HIVE-26166
> Project: Hive
>  Issue Type: Task
>  Components: Website
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> Per the email that was sent out from privacy we need to make the Hive website 
> GDPR compliant. 
>  # The link to privacy policy needs to be updated from 
> [https://hive.apache.org/privacy_policy.html] to 
> [https://privacy.apache.org/policies/privacy-policy-public.html]
>  # The Google Analytics service must be removed





[jira] [Work logged] (HIVE-26107) Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26107?focusedWorklogId=760707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760707
 ]

ASF GitHub Bot logged work on HIVE-26107:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 09:25
Start Date: 22/Apr/22 09:25
Worklog Time Spent: 10m 
  Work Description: klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715


##
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
 
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
 if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+  //Inside a compaction transaction, only stats gathering is running which is not requiring a new write id,
+  //and for duplicate compaction detection it is necessary to not increment it.
+  boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I think you can use driverContext.getTxnType() instead (TxnType.COMPACTION)
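
The difference between the two checks can be sketched outside Hive. Everything below is a hypothetical stand-in: the local `TxnType` enum, the method names, and the `inside.compaction` key are assumptions for illustration, and only `TxnType.COMPACTION` is named in the review comment itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone model; none of these types are Hive classes.
class CompactionCheckSketch {
  // Local stand-in enum; only COMPACTION is named in the review comment.
  enum TxnType { DEFAULT, COMPACTION }

  // Flag-based check, as in the diff: an absent variable parses to false.
  static boolean isCompactionByFlag(Map<String, String> hiveVariables, String flagName) {
    return Boolean.parseBoolean(hiveVariables.get(flagName));
  }

  // Type-based check, in the spirit of the reviewer's suggestion.
  static boolean isCompactionByType(TxnType txnType) {
    return txnType == TxnType.COMPACTION;
  }

  public static void main(String[] args) {
    Map<String, String> vars = new HashMap<>();
    System.out.println(isCompactionByFlag(vars, "inside.compaction")); // flag not set
    vars.put("inside.compaction", "true");
    System.out.println(isCompactionByFlag(vars, "inside.compaction")); // flag set
    System.out.println(isCompactionByType(TxnType.COMPACTION));
  }
}
```

The type-based variant avoids passing state through a string-keyed variable map, which is presumably the motivation behind the suggestion.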



##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
 
 int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
 runStatementOnDriver("insert into T partition(p=1,q) " + 
makeValuesClause(targetVals2));
-

Review Comment:
   Nit: Unnecessary change to this file



##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
 "ready for cleaning", compacts.get(0).getState());
   }
 
+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+// Set delta numbuer threshold to 2 to avoid skipping compaction because of too few deltas
+conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+// Set delta percentage to a high value to suppress selecting major compression based on that
+conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);

Review Comment:
   These 2 settings aren't necessary since the Initiator uses these thresholds, 
but in the test we always queue compaction manually



##
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##
@@ -38,6 +38,7 @@
 import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
 import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
 import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;

Review Comment:
   Nit: Unused import



##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
 runStatementOnDriver("export table Tstage to '" + getWarehouseDir() +"/2'");
 runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
 String[][] expected3 = new String[][] {
-{"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_005/00_0"},

Review Comment:
   Just trying to understand – Why was the writeid 5 originally? And if there 
was no compaction in the meantime, why is it 4 now?



##
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
 
-alter table sequential_update compact 'major';

Review Comment:
   Why change this?





Issue Time Tracking
---

Worklog Id: (was: 760707)
Time Spent: 20m  (was: 10m)

> Worker shouldn't inject duplicate entries in `ready for cleaning` state into 
> the compaction queue
> -
>
> Key: HIVE-26107
> URL: https://issues.apache.org/jira/browse/HIVE-26107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> How to reproduce:
> 1) create an acid table and load some data ;
> 2) manually trigger the compaction for 

[jira] [Commented] (HIVE-19711) Refactor Hive Schema Tool

2022-04-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526282#comment-17526282
 ] 

Ádám Szita commented on HIVE-19711:
---

[~mgergely] it looks like this refactor has also incorporated a change in 
behavior. Looking at your commit 
[https://github.com/apache/hive/commit/d83a0be9852467b3b8b3bef84721bb49e63f57b8]
 what I see is that:

After your change the validation task fails if there are any failures with 
locations or columnNullValues: 
[https://github.com/apache/hive/blob/d83a0be9852467b3b8b3bef84721bb49e63f57b8/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaToolTaskValidate.java#L75-L76]

Before your change these used to be reported as WARNings only, and didn't cause 
the validation task to actually fail:
[https://github.com/apache/hive/blob/e7d1781ec4662e088dcd6ffbe3f866738792ad9b/beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java#L622-L631]

 

Question: was this change in functionality on purpose? If so, what was the 
reason? In case it was unintentional, do you think the original behavior should 
be restored?

> Refactor Hive Schema Tool
> -
>
> Key: HIVE-19711
> URL: https://issues.apache.org/jira/browse/HIVE-19711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-19711.01.patch, HIVE-19711.02.patch, 
> HIVE-19711.03.patch, HIVE-19711.04.patch, HIVE-19711.05.patch, 
> HIVE-19711.06.patch, HIVE-19711.07.patch, HIVE-19711.08.patch
>
>
> HiveSchemaTool is a 1500-line class trying to do everything. It should 
> be cut into multiple classes, each handling a smaller component.
>  





[jira] [Work logged] (HIVE-26155) Create a new connection pool for compaction

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26155?focusedWorklogId=760700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760700
 ]

ASF GitHub Bot logged work on HIVE-26155:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 08:56
Start Date: 22/Apr/22 08:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3223:
URL: https://github.com/apache/hive/pull/3223#discussion_r855948450


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5836,11 +5835,11 @@ private void removeTxnsFromMinHistoryLevel(Connection dbConn, List txnids)
 }
   }
 
-  private synchronized static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) {
+  protected synchronized static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize) {

Review Comment:
   Could we add @Override if we override parent method?
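
One general Java point is relevant here, independent of this PR: the method in the diff is declared `static`, and static methods are hidden rather than overridden, so annotating one with `@Override` is a compile-time error. The sketch below uses hypothetical class names (`Base`, `Derived`) to show the difference between hiding and overriding.

```java
// Hypothetical classes illustrating static hiding versus instance overriding.
class StaticHidingSketch {
  static class Base {
    static String pool() { return "base"; }
    String name() { return "base"; }
  }

  static class Derived extends Base {
    // Hides Base.pool(); adding @Override here would not compile.
    static String pool() { return "derived"; }

    // Instance methods are overridden and dispatched at run time.
    @Override
    String name() { return "derived"; }
  }

  public static void main(String[] args) {
    Base ref = new Derived();
    System.out.println(ref.name());     // resolved by the runtime type
    System.out.println(Base.pool());    // resolved by the class named in the call
    System.out.println(Derived.pool());
  }
}
```

If the subclass is meant to customize pool setup polymorphically, the method would have to become an instance method before `@Override` could apply.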





Issue Time Tracking
---

Worklog Id: (was: 760700)
Time Spent: 0.5h  (was: 20m)

> Create a new connection pool for compaction
> ---
>
> Key: HIVE-26155
> URL: https://issues.apache.org/jira/browse/HIVE-26155
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: compaction, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the TxnHandler uses 2 connection pools to communicate with the HMS: 
> the default one and one for mutexing. If compaction is configured incorrectly 
> (e.g. too many Initiators are running on the same db) then compaction can use 
> up all the connections in the default connection pool and all user queries 
> can get stuck.
> We should have a separate connection pool (configurable size) just for 
> compaction-related activities.
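
The isolation argument in the description can be sketched with two counting semaphores, each modelling a fixed-size connection pool. The class name, method names, and pool sizes are hypothetical; this is not the TxnHandler implementation, only an illustration of why a dedicated pool keeps compaction from starving user queries.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model: each semaphore stands in for a fixed-size connection pool.
class SeparatePoolsSketch {
  private final Semaphore defaultPool;
  private final Semaphore compactionPool;

  SeparatePoolsSketch(int defaultSize, int compactionSize) {
    this.defaultPool = new Semaphore(defaultSize);
    this.compactionPool = new Semaphore(compactionSize);
  }

  // User queries only ever draw from the default pool.
  boolean acquireForUserQuery() {
    return defaultPool.tryAcquire();
  }

  // Compaction work only ever draws from its own pool.
  boolean acquireForCompaction() {
    return compactionPool.tryAcquire();
  }

  public static void main(String[] args) {
    SeparatePoolsSketch pools = new SeparatePoolsSketch(2, 1);
    pools.acquireForCompaction();                      // compaction takes its only slot
    System.out.println(pools.acquireForCompaction());  // compaction pool is exhausted
    System.out.println(pools.acquireForUserQuery());   // user queries are unaffected
  }
}
```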





[jira] [Work logged] (HIVE-26155) Create a new connection pool for compaction

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26155?focusedWorklogId=760699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760699
 ]

ASF GitHub Bot logged work on HIVE-26155:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 08:54
Start Date: 22/Apr/22 08:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3223:
URL: https://github.com/apache/hive/pull/3223#discussion_r855946205


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##
@@ -45,15 +45,19 @@ public class HikariCPDataSourceProvider implements DataSourceProvider {
 
   @Override
   public DataSource create(Configuration hdpConfig) throws SQLException {
+int maxPoolSize = MetastoreConf.getIntVar(hdpConfig,

Review Comment:
   Could we extract this into the default method in the interface since we 
always use the same conf - 
MetastoreConf.ConfVars.CONNECTION_POOLING_MAX_CONNECTIONS?
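
A minimal sketch of what such an extraction could look like, assuming a simplified provider interface. The names `PoolProvider`, `maxPoolSize`, the `pool.max.connections` key, and the default of 10 are all hypothetical; the real `DataSourceProvider` interface and `MetastoreConf` lookup are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a provider interface with a shared default method.
class DefaultMethodSketch {
  interface PoolProvider {
    String MAX_CONNECTIONS_KEY = "pool.max.connections"; // assumed key name
    int DEFAULT_MAX_CONNECTIONS = 10;                     // assumed default value

    String create(Map<String, String> conf);

    // Shared lookup: every implementation reads the same setting the same way.
    default int maxPoolSize(Map<String, String> conf) {
      String raw = conf.get(MAX_CONNECTIONS_KEY);
      return raw == null ? DEFAULT_MAX_CONNECTIONS : Integer.parseInt(raw);
    }
  }

  static class HikariLikeProvider implements PoolProvider {
    @Override
    public String create(Map<String, String> conf) {
      // No per-implementation copy of the lookup is needed.
      return "hikari-like pool, max=" + maxPoolSize(conf);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(PoolProvider.MAX_CONNECTIONS_KEY, "25");
    System.out.println(new HikariLikeProvider().create(conf));
  }
}
```

Each provider then inherits the lookup instead of repeating it, which seems to be the duplication the reviewer wants to avoid.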





Issue Time Tracking
---

Worklog Id: (was: 760699)
Time Spent: 20m  (was: 10m)

> Create a new connection pool for compaction
> ---
>
> Key: HIVE-26155
> URL: https://issues.apache.org/jira/browse/HIVE-26155
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: compaction, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the TxnHandler uses 2 connection pools to communicate with the HMS: 
> the default one and one for mutexing. If compaction is configured incorrectly 
> (e.g. too many Initiators are running on the same db) then compaction can use 
> up all the connections in the default connection pool and all user queries 
> can get stuck.
> We should have a separate connection pool (configurable size) just for 
> compaction-related activities.





[jira] [Updated] (HIVE-26165) Remove READ locks for ACID tables with SoftDelete enabled

2022-04-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26165:
--
Description: Since operations that required an EXCLUSIVE lock were 
rewritten to READs non-blocking, we do not need READ locks anymore. That should 
improve ACID TXN concurrency.

> Remove READ locks for ACID tables with SoftDelete enabled
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Since operations that required an EXCLUSIVE lock were rewritten to READs 
> non-blocking, we do not need READ locks anymore. That should improve ACID TXN 
> concurrency.





[jira] [Updated] (HIVE-26165) Remove READ locks for ACID tables with SoftDelete enabled

2022-04-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26165:
--
Description: Since operations that required an EXCLUSIVE lock were 
rewritten to READs non-blocking, we do not need READ locks anymore. That should 
improve ACID concurrency.  (was: Since operations that required an EXCLUSIVE 
lock were rewritten to READs non-blocking, we do not need READ locks anymore. 
That should improve ACID TXN concurrency.)

> Remove READ locks for ACID tables with SoftDelete enabled
> -
>
> Key: HIVE-26165
> URL: https://issues.apache.org/jira/browse/HIVE-26165
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Since operations that required an EXCLUSIVE lock were rewritten to READs 
> non-blocking, we do not need READ locks anymore. That should improve ACID 
> concurrency.





[jira] [Commented] (HIVE-26163) Incorrect format in columnstats_columnname_parse.q's insert statement can cause exceptions

2022-04-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526261#comment-17526261
 ] 

Zoltan Haindrich commented on HIVE-26163:
-

Is something going wrong while processing this?
{code}
insert into table2 values("1","1","1");
{code}

Is this problem flaky?

In any case, I think this is a serious issue, and we should fix it without 
altering the qfile.
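
The top frames of the quoted stack trace (`Collections$EmptyList.get`) point to an index-0 read on an empty list. The sketch below reproduces that exception class in isolation and shows one possible guard; the helper names are hypothetical and this is not the metastore code from the trace.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical helpers; they only illustrate the exception seen in the trace.
class EmptyListAccessSketch {
  // Unchecked access: throws IndexOutOfBoundsException on an empty list.
  static <T> T firstUnchecked(List<T> items) {
    return items.get(0);
  }

  // Guarded access: callers must handle the "nothing there" case explicitly.
  static <T> Optional<T> firstGuarded(List<T> items) {
    return items.isEmpty() ? Optional.empty() : Optional.ofNullable(items.get(0));
  }

  public static void main(String[] args) {
    List<String> stats = Collections.emptyList();
    try {
      firstUnchecked(stats);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unchecked access failed on an empty list");
    }
    System.out.println(firstGuarded(stats).isPresent());
  }
}
```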

> Incorrect format in columnstats_columnname_parse.q's insert statement can 
> cause exceptions
> --
>
> Key: HIVE-26163
> URL: https://issues.apache.org/jira/browse/HIVE-26163
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>
> Exception:
> {code:java}
> 2022-04-20T10:13:06,467 ERROR [016f5292-40a7-4fe6-be58-1c988fa4a6e5 main] 
> metastore.RetryingHMSHandler: java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:4456)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:9099)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:9054)
>   at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
>   at com.sun.proxy.$Proxy59.set_aggr_stats_for(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:2974)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:571)
>   at sun.reflect.GeneratedMethodAccessor192.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
>   at com.sun.proxy.$Proxy60.setPartitionColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5583)
>   at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:223)
>   at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94)
>   at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:775)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:524)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:518)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:853)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:823)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:192)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>   at 
> org.apache.hadoop.hive.cli.split4.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.GeneratedMethodAccessor180.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   

[jira] [Work logged] (HIVE-26159) hive cli is unavailable from hive command

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26159?focusedWorklogId=760680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760680
 ]

ASF GitHub Bot logged work on HIVE-26159:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 07:55
Start Date: 22/Apr/22 07:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on PR #3227:
URL: https://github.com/apache/hive/pull/3227#issuecomment-1106125965

   @nrg4878 can you please review this? You might have better context on why 
this flag is set to true. Thanks!




Issue Time Tracking
---

Worklog Id: (was: 760680)
Time Spent: 0.5h  (was: 20m)

> hive cli is unavailable from hive command
> -
>
> Key: HIVE-26159
> URL: https://issues.apache.org/jira/browse/HIVE-26159
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive CLI is a convenient tool for connecting to the Hive metastore service, but 
> it currently does not start even when the *--service cli* option is used. This 
> appears to be a bug introduced by HIVE-24348.
> *Steps to reproduce:*
> {code:bash}
> hive@hive:/root$ /usr/share/hive/bin/hive --service cli --hiveconf 
> hive.metastore.uris=thrift://hive:9084
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Beeline version 4.0.0-alpha-2-SNAPSHOT by Apache Hive
> beeline> 
> {code}





[jira] [Work logged] (HIVE-26159) hive cli is unavailable from hive command

2022-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26159?focusedWorklogId=760670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760670
 ]

ASF GitHub Bot logged work on HIVE-26159:
-

Author: ASF GitHub Bot
Created on: 22/Apr/22 07:38
Start Date: 22/Apr/22 07:38
Worklog Time Spent: 10m 
  Work Description: wecharyu commented on PR #3227:
URL: https://github.com/apache/hive/pull/3227#issuecomment-1106112770

   @marton-bod @kgyrtkirk Could you please review this PR? The failed test 
seems unrelated to this change.




Issue Time Tracking
---

Worklog Id: (was: 760670)
Time Spent: 20m  (was: 10m)

> hive cli is unavailable from hive command
> -
>
> Key: HIVE-26159
> URL: https://issues.apache.org/jira/browse/HIVE-26159
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive CLI is a convenient tool for connecting to the Hive metastore service, but 
> it currently does not start even when the *--service cli* option is used. This 
> appears to be a bug introduced by HIVE-24348.
> *Steps to reproduce:*
> {code:bash}
> hive@hive:/root$ /usr/share/hive/bin/hive --service cli --hiveconf 
> hive.metastore.uris=thrift://hive:9084
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Beeline version 4.0.0-alpha-2-SNAPSHOT by Apache Hive
> beeline> 
> {code}


