[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=782257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782257
 ]

ASF GitHub Bot logged work on HIVE-26274:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:50
Start Date: 17/Jun/22 06:50
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request, #3382:
URL: https://github.com/apache/hive/pull/3382

   Addendum to #3332




Issue Time Tracking
---

Worklog Id: (was: 782257)
Time Spent: 40m  (was: 0.5h)

> No vectorization if query has upper case window function
> 
>
> Key: HIVE-26274
> URL: https://issues.apache.org/jira/browse/HIVE-26274
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b int);
> EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
> {code}
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   Vertices:
> Map 1 
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vector.serde.deserialize IS true
> inputFormatFeatureSupport: [DECIMAL_64]
> featureSupportInUse: [DECIMAL_64]
> inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
> allNative: true
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez] IS true
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> vectorized: false
>   Stage: Stage-0
> Fetch Operator
> {code}
> {code}
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> {code}
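> For illustration, a minimal, self-contained sketch of the kind of fix this
> needs (names here are illustrative, not Hive's actual Vectorizer code):
> normalize the function name before the supported-functions lookup, so the
> upper-case ROW_NUMBER matches the lower-case list.
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.Locale;
> import java.util.Set;
>
> public class PtfVectorizationCheckSketch {
>   // The same list that appears in the plan output above.
>   private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
>       "avg", "count", "dense_rank", "first_value", "lag", "last_value",
>       "lead", "max", "min", "rank", "row_number", "sum"));
>
>   static boolean isSupported(String functionName) {
>     // Lower-casing keeps ROW_NUMBER and row_number equivalent.
>     return SUPPORTED.contains(functionName.toLowerCase(Locale.ROOT));
>   }
>
>   public static void main(String[] args) {
>     System.out.println(isSupported("ROW_NUMBER")); // true once normalized
>   }
> }
> {code}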



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782251
 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899807718


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created 
table, etc).
 return response;
   }
 }
+
+if (checkForConcurrentCtas && isValidTxn(txnId)) {
+  LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar)
+  .orElseThrow(() -> new MetaException("Unknown lock type: " + 
lockChar));
+
+  if (lockType == LockType.EXCL_WRITE && blockedBy.state == 
LockState.ACQUIRED) {
+
+String deleteBlockedByTxnComp = "DELETE  FROM \"TXN_COMPONENTS\" 
WHERE" + " \"TC_TXNID\"=" + txnId;

Review Comment:
   Realized that the cleaner will take care of this. I have removed the delete 
query in the recent commit.





Issue Time Tracking
---

Worklog Id: (was: 782251)
Time Spent: 7h 10m  (was: 7h)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782252
 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899808001


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created 
table, etc).
 return response;
   }
 }
+
+if (checkForConcurrentCtas && isValidTxn(txnId)) {

Review Comment:
   done





Issue Time Tracking
---

Worklog Id: (was: 782252)
Time Spent: 7h 20m  (was: 7h 10m)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26335) Partition params not updated after calling Hive.loadPartition

2022-06-16 Thread zhangdonglin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangdonglin updated HIVE-26335:

Description: 
Hi,

   I found that when partition A already exists, calling Hive.loadPartition to 
load data into partition A does not update the partition params in the 
PARTITION_PARAMS table, even when hasFollowingStatsTask=false is set.

   The reason is shown below: newTPart is set to oldPart when the old 
partition exists,
{code:java}
Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, 
newPartPath); {code}
   Because of this, when alter_partition is called, the oldPart info is sent 
to the metastore, which does not update the partition params.

   

  was:
Hi,

   I found that when partition A already exists, calling Hive.loadPartition to 
load data into partition A does not update the partition params in the 
PARTITION_PARAMS table, even when hasFollowingStatsTask=false is set.

   The reason is shown below: newTPart was set to oldPart,
{code:java}
Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, 
newPartPath); {code}
   Because of this, when alter_partition is called, the oldPart info is sent 
to the metastore, which does not update the partition params.


> Partition params not updated after calling Hive.loadPartition
> -
>
> Key: HIVE-26335
> URL: https://issues.apache.org/jira/browse/HIVE-26335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions
>Reporter: zhangdonglin
>Priority: Major
>
> Hi,
>    I found that when partition A already exists, calling Hive.loadPartition 
> to load data into partition A does not update the partition params in the 
> PARTITION_PARAMS table, even when hasFollowingStatsTask=false is set.
>    The reason is shown below: newTPart is set to oldPart when the old 
> partition exists,
> {code:java}
> Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, 
> newPartPath); {code}
>    Because of this, when alter_partition is called, the oldPart info is sent 
> to the metastore, which does not update the partition params.
>    
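>    For illustration, a simplified, self-contained model of the behavior 
> described above (class and field names are made up; Hive's real Partition 
> API differs):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class LoadPartitionSketch {
>   // Stand-in for Hive's Partition, holding only the params map.
>   static class PartitionModel {
>     final Map<String, String> params = new HashMap<>();
>   }
>
>   public static void main(String[] args) {
>     PartitionModel oldPart = new PartitionModel();
>     oldPart.params.put("numRows", "100");  // stale value in PARTITION_PARAMS
>
>     Map<String, String> freshParams = new HashMap<>();
>     freshParams.put("numRows", "250");     // values computed by the new load
>
>     // Mirrors: newTPart = oldPart != null ? oldPart : new Partition(...)
>     PartitionModel newTPart = (oldPart != null) ? oldPart : new PartitionModel();
>
>     // Without an explicit refresh, alter_partition receives the stale
>     // params; a fix would copy the fresh values onto the reused object:
>     newTPart.params.putAll(freshParams);
>     System.out.println(newTPart.params);   // {numRows=250}
>   }
> }
> {code}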



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26321) Upgrade commons-io to 2.11.0

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26321?focusedWorklogId=782219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782219
 ]

ASF GitHub Bot logged work on HIVE-26321:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:57
Start Date: 17/Jun/22 01:57
Worklog Time Spent: 10m 
  Work Description: ashutoshcipher commented on PR #3370:
URL: https://github.com/apache/hive/pull/3370#issuecomment-1158391644

   Thanks for the PR @nrg4878. It would be helpful if you could share which 
CVEs are fixed in 2.11.0 but not in 2.8.0.
   I tried looking but didn't find any.




Issue Time Tracking
---

Worklog Id: (was: 782219)
Time Spent: 40m  (was: 0.5h)

> Upgrade commons-io to 2.11.0
> 
>
> Key: HIVE-26321
> URL: https://issues.apache.org/jira/browse/HIVE-26321
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Upgrade commons-io to 2.11.0



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26265?focusedWorklogId=782199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782199
 ]

ASF GitHub Bot logged work on HIVE-26265:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:10
Start Date: 17/Jun/22 01:10
Worklog Time Spent: 10m 
  Work Description: cmunkey commented on code in PR #3365:
URL: https://github.com/apache/hive/pull/3365#discussion_r899674228


##
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbortTxnHandler.java:
##
@@ -39,6 +48,19 @@ public void handle(Context withinContext) throws Exception {
 if (!ReplUtils.includeAcidTableInDump(withinContext.hiveConf)) {
   return;
 }
+
+ if (ReplUtils.filterTransactionOperations(withinContext.hiveConf)) {
+   String contextDbName = 
StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
+   GetTxnWriteIdsRequest request = new 
GetTxnWriteIdsRequest(eventMessage.getTxnId());
+   request.setDbName(contextDbName);
+   GetTxnWriteIdsResponse response = 
withinContext.db.getMSC().getTxnWriteIds(request);

Review Comment:
   To move this to compilation time, there are two difficulties:
   1. We would need to add a new field/status to AbortTxnEvent; this field 
would indicate whether or not the aborted txn involved write IDs. REPL DUMP 
could filter based on this setting.
   2. The AbortTxnEvent is logged via TxnHandler.abort_txn(), so 
AbortTxnRequest would need to be modified to pass the writeid. AbortTxnRequest 
is a Thrift object. OR, abort_txn() could do the same HMS lookup that is 
currently done in AbortTxnHandler().
   
   Since this metastore call is done during REPL DUMP, would it be ok to live 
with this inefficiency and fix later with a more optimal implementation?





Issue Time Tracking
---

Worklog Id: (was: 782199)
Time Spent: 40m  (was: 0.5h)

> REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.
> 
>
> Key: HIVE-26265
> URL: https://issues.apache.org/jira/browse/HIVE-26265
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: francis pang
>Assignee: francis pang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> REPL DUMP is replicating all OpenXacts, even when they come from other, 
> non-replicated databases. This wastes space in the dump and ends up opening 
> unneeded transactions during REPL LOAD.
>  
> Add a config property for replication that filters out OpenXact events during 
> REPL DUMP. During REPL LOAD, the txns can be implicitly opened when the 
> ALLOC_WRITE_ID event is processed. For CommitTxn and AbortTxn, dump only if a 
> WRITE ID was allocated.
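> For illustration, a self-contained sketch of the filtering rule proposed 
> above (the types and method are made up, not Hive's actual REPL code): skip 
> OpenTxn events entirely, and dump CommitTxn/AbortTxn only when the 
> transaction allocated a write ID.
> {code:java}
> import java.util.Set;
>
> public class ReplEventFilterSketch {
>   enum EventType { OPEN_TXN, COMMIT_TXN, ABORT_TXN, ALLOC_WRITE_ID }
>
>   static boolean shouldDump(EventType type, long txnId, Set<Long> txnsWithWriteIds) {
>     switch (type) {
>       case OPEN_TXN:
>         return false; // re-opened implicitly on REPL LOAD via ALLOC_WRITE_ID
>       case COMMIT_TXN:
>       case ABORT_TXN:
>         return txnsWithWriteIds.contains(txnId);
>       default:
>         return true;
>     }
>   }
>
>   public static void main(String[] args) {
>     Set<Long> withWriteIds = Set.of(42L);
>     System.out.println(shouldDump(EventType.OPEN_TXN, 42L, withWriteIds));  // false
>     System.out.println(shouldDump(EventType.ABORT_TXN, 42L, withWriteIds)); // true
>     System.out.println(shouldDump(EventType.ABORT_TXN, 7L, withWriteIds));  // false
>   }
> }
> {code}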



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=782181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782181
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 22:50
Start Date: 16/Jun/22 22:50
Worklog Time Spent: 10m 
  Work Description: sourabh912 opened a new pull request, #3381:
URL: https://github.com/apache/hive/pull/3381

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 782181)
Time Spent: 7h  (was: 6h 50m)

> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Hive Metastore currently doesn't have support for HTTP transport because of 
> which it is not possible to access it via Knox. Adding support for Thrift 
> over HTTP transport will allow the clients to access via Knox



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26291) Ranger client file descriptor leak

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26291?focusedWorklogId=782148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782148
 ]

ASF GitHub Bot logged work on HIVE-26291:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 19:16
Start Date: 16/Jun/22 19:16
Worklog Time Spent: 10m 
  Work Description: slachiewicz commented on code in PR #3345:
URL: https://github.com/apache/hive/pull/3345#discussion_r899441832


##
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ranger/RangerRestClientImpl.java:
##
@@ -428,10 +428,7 @@ public RangerExportPolicyList 
readRangerPoliciesFromJsonFile(Path filePath,
HiveConf conf) 
throws SemanticException {
 RangerExportPolicyList rangerExportPolicyList = null;
 Gson gsonBuilder = new 
GsonBuilder().setDateFormat("MMdd-HH:mm:ss.SSS-Z").setPrettyPrinting().create();
-try {
-  FileSystem fs = filePath.getFileSystem(conf);
-  InputStream inputStream = fs.open(filePath);
-  Reader reader = new InputStreamReader(inputStream, 
Charset.forName("UTF-8"));
+try (Reader reader = new 
InputStreamReader(filePath.getFileSystem(conf).open(filePath), 
Charset.forName("UTF-8"))) {

Review Comment:
   This still leaks the InputStream: the try-with-resources should declare 
separate resources for the InputStream and the Reader.
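   For reference, a self-contained sketch of that pattern (using java.nio 
file APIs rather than Hadoop's FileSystem, and an assumed file name): each 
resource gets its own declaration in the try, so the stream is closed even if 
constructing the reader fails.
   ```java
   import java.io.InputStream;
   import java.io.InputStreamReader;
   import java.io.Reader;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Paths;

   public class TwoResourceTrySketch {
     public static void main(String[] args) throws Exception {
       // Both resources are declared in the try, so each is closed in
       // reverse order; the stream cannot leak if the reader's constructor
       // throws.
       try (InputStream in = Files.newInputStream(Paths.get("policies.json"));
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
         System.out.println(reader.read());
       }
     }
   }
   ```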





Issue Time Tracking
---

Worklog Id: (was: 782148)
Time Spent: 40m  (was: 0.5h)

> Ranger client file descriptor leak
> --
>
> Key: HIVE-26291
> URL: https://issues.apache.org/jira/browse/HIVE-26291
> Project: Hive
>  Issue Type: Improvement
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ranger Client has an fd leak



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26338) Repl Dump should fail if source database does not exist.

2022-06-16 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla reassigned HIVE-26338:
-


> Repl Dump should fail if source database does not exist.
> 
>
> Key: HIVE-26338
> URL: https://issues.apache.org/jira/browse/HIVE-26338
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26320) Incorrect case evaluation for Parquet based table

2022-06-16 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555159#comment-17555159
 ] 

Stamatis Zampetakis commented on HIVE-26320:


Thanks for the feedback [~chiran54321]! It may also be worth checking {{EXPLAIN 
VECTORIZATION DETAIL}}, since I remember some differences in the past between 
vectorization for ORC, PARQUET, and other formats.

> Incorrect case evaluation for Parquet based table
> -
>
> Key: HIVE-26320
> URL: https://issues.apache.org/jira/browse/HIVE-26320
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 4.0.0-alpha-1
>Reporter: Chiran Ravani
>Priority: Major
>
> A query involving a case statement with two or more conditions leads to 
> incorrect results for tables stored as Parquet. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code 
> int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>(kob='BB' and enhanced_type_code='18')
>or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.
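> For a quick check, the workaround can be applied per session against the 
> repro above:
> {code:sql}
> -- Reported workaround: disable the point-lookup optimization.
> set hive.optimize.point.lookup=false;
> select case when (
>    (kob='BB' and enhanced_type_code='18')
>    or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> -- returns 1, 1, 0 as expected
> {code}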



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26331) Use maven-surefire-plugin version consistently in standalone-metastore modules

2022-06-16 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26331.

Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/594a14551227530e60123a1f5d6860883876a4a3]. 
Thanks for the reviews [~kgyrtkirk] [~ayushsaxena]!

> Use maven-surefire-plugin version consistently in standalone-metastore modules
> --
>
> Key: HIVE-26331
> URL: https://issues.apache.org/jira/browse/HIVE-26331
> Project: Hive
>  Issue Type: Task
>  Components: Standalone Metastore, Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Due to some problems in the pom.xml files inside the standalone-metastore 
> modules we end up using different maven-surefire-plugin versions.
> Most of the modules use 3.0.0-M4, which is the expected one, while 
> the {{hive-standalone-metastore-common}} uses the older 2.22.0 version.
> +Actual+ 
> {noformat}
> [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}
> The goal of this JIRA is to ensure we use the same version consistently in 
> all modules.
> +Expected+
> {noformat}
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26331) Use maven-surefire-plugin version consistently in standalone-metastore modules

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26331?focusedWorklogId=782089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782089
 ]

ASF GitHub Bot logged work on HIVE-26331:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:52
Start Date: 16/Jun/22 15:52
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3374: HIVE-26331: Use 
maven-surefire-plugin version consistently in standalone-metastore modules
URL: https://github.com/apache/hive/pull/3374




Issue Time Tracking
---

Worklog Id: (was: 782089)
Time Spent: 20m  (was: 10m)

> Use maven-surefire-plugin version consistently in standalone-metastore modules
> --
>
> Key: HIVE-26331
> URL: https://issues.apache.org/jira/browse/HIVE-26331
> Project: Hive
>  Issue Type: Task
>  Components: Standalone Metastore, Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Due to some problems in the pom.xml files inside the standalone-metastore 
> modules we end up using different maven-surefire-plugin versions.
> Most of the modules use 3.0.0-M4, which is the expected one, while 
> the {{hive-standalone-metastore-common}} uses the older 2.22.0 version.
> +Actual+ 
> {noformat}
> [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}
> The goal of this JIRA is to ensure we use the same version consistently in 
> all modules.
> +Expected+
> {noformat}
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26314) Support alter function in Hive DDL

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26314?focusedWorklogId=782091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782091
 ]

ASF GitHub Bot logged work on HIVE-26314:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:52
Start Date: 16/Jun/22 15:52
Worklog Time Spent: 10m 
  Work Description: wecharyu commented on code in PR #3360:
URL: https://github.com/apache/hive/pull/3360#discussion_r899246871


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/function/create/CreateFunctionOperation.java:
##
@@ -116,11 +119,27 @@ private int createPermanentFunction() throws 
HiveException, IOException {
   return 1;
 }
 
-boolean addToMetastoreSuccess = addToMetastore(dbName, functionName, 
registeredName);
-if (!addToMetastoreSuccess) {
+// TODO: should this use getUserFromAuthenticator instead of 
SessionState.get().getUserName()?
+Function function = new Function(functionName, dbName, 
desc.getClassName(), SessionState.get().getUserName(),

Review Comment:
   Can we add `OWNER_PRIV` for altering a function? Then only the owner of the 
function could perform the alter operation.





Issue Time Tracking
---

Worklog Id: (was: 782091)
Time Spent: 1h 20m  (was: 1h 10m)

> Support alter function in Hive DDL
> --
>
> Key: HIVE-26314
> URL: https://issues.apache.org/jira/browse/HIVE-26314
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hive SQL does not support {{*ALTER FUNCTION*}} yet, we can refer to the 
> {{*CREATE [OR REPLACE] FUNCTION*}} of 
> [Spark|https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html]
>  to implement the alter function .
> {code:sql}
> CREATE [ TEMPORARY ] FUNCTION [ OR REPLACE ] [IF NOT EXISTS ]
>   [db_name.]function_name AS class_name
>   [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
> {code}
> * *OR REPLACE*
> If specified, the resources for the function are reloaded. This is mainly 
> useful to pick up any changes made to the implementation of the function. 
> This parameter is mutually exclusive to {{*IF NOT EXISTS*}} and can not be 
> specified together.
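> For illustration, a usage example following the grammar above (the function 
> and jar names are made up):
> {code:sql}
> -- Reload the implementation of an existing permanent function.
> CREATE FUNCTION OR REPLACE mydb.my_upper AS 'com.example.udf.MyUpperUDF'
>   USING JAR 'hdfs:///udfs/my-upper-2.0.jar';
> {code}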



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26331) Use maven-surefire-plugin version consistently in standalone-metastore modules

2022-06-16 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555152#comment-17555152
 ] 

Stamatis Zampetakis commented on HIVE-26331:


Hey [~slachiewicz], I agree with you that we should find a better way of 
managing common parts across different modules. Nevertheless, such a change can 
have a big impact, so it should be discussed under a dedicated Jira and 
evaluated carefully.

> Use maven-surefire-plugin version consistently in standalone-metastore modules
> --
>
> Key: HIVE-26331
> URL: https://issues.apache.org/jira/browse/HIVE-26331
> Project: Hive
>  Issue Type: Task
>  Components: Standalone Metastore, Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Due to some problems in the pom.xml files inside the standalone-metastore 
> modules we end up using different maven-surefire-plugin versions.
> Most of the modules use 3.0.0-M4, which is the expected one, while 
> the {{hive-standalone-metastore-common}} uses the older 2.22.0 version.
> +Actual+ 
> {noformat}
> [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}
> The goal of this JIRA is to ensure we use the same version consistently in 
> all modules.
> +Expected+
> {noformat}
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-standalone-metastore-server ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> metastore-tools-common ---
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-metastore-benchmarks ---
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26314) Support alter function in Hive DDL

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26314?focusedWorklogId=782081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782081
 ]

ASF GitHub Bot logged work on HIVE-26314:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:30
Start Date: 16/Jun/22 15:30
Worklog Time Spent: 10m 
  Work Description: wecharyu commented on code in PR #3360:
URL: https://github.com/apache/hive/pull/3360#discussion_r899213697


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/function/create/CreateFunctionAnalyzer.java:
##
@@ -54,6 +54,14 @@ public CreateFunctionAnalyzer(QueryState queryState) throws 
SemanticException {
   public void analyzeInternal(ASTNode root) throws SemanticException {
 String functionName = root.getChild(0).getText().toLowerCase();
 boolean isTemporary = 
(root.getFirstChildWithType(HiveParser.TOK_TEMPORARY) != null);
+boolean replace = (root.getFirstChildWithType(HiveParser.TOK_ORREPLACE) != 
null);
+boolean ifNotExists = 
(root.getFirstChildWithType(HiveParser.TOK_IFNOTEXISTS) != null);
+if (ifNotExists && replace) {
+  throw new SemanticException("CREATE FUNCTION with both IF NOT EXISTS and 
REPLACE is not allowed.");

Review Comment:
   Both syntaxes look good to me; the current one simply follows Spark's 
syntax.





Issue Time Tracking
---

Worklog Id: (was: 782081)
Time Spent: 1h 10m  (was: 1h)

> Support alter function in Hive DDL
> --
>
> Key: HIVE-26314
> URL: https://issues.apache.org/jira/browse/HIVE-26314
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hive SQL does not support {{*ALTER FUNCTION*}} yet, we can refer to the 
> {{*CREATE [OR REPLACE] FUNCTION*}} of 
> [Spark|https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html]
>  to implement the alter function .
> {code:sql}
> CREATE [ TEMPORARY ] FUNCTION [ OR REPLACE ] [IF NOT EXISTS ]
>   [db_name.]function_name AS class_name
>   [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
> {code}
> * *OR REPLACE*
> If specified, the resources for the function are reloaded. This is mainly 
> useful to pick up any changes made to the implementation of the function. 
> This parameter is mutually exclusive to {{*IF NOT EXISTS*}} and can not be 
> specified together.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26314) Support alter function in Hive DDL

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26314?focusedWorklogId=782079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782079
 ]

ASF GitHub Bot logged work on HIVE-26314:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:22
Start Date: 16/Jun/22 15:22
Worklog Time Spent: 10m 
  Work Description: wecharyu commented on code in PR #3360:
URL: https://github.com/apache/hive/pull/3360#discussion_r899205805


##
parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:
##
@@ -1613,10 +1613,10 @@ resourceType
 createFunctionStatement
 @init { pushMsg("create function statement", state); }
 @after { popMsg(state); }
-: KW_CREATE (temp=KW_TEMPORARY)? KW_FUNCTION functionIdentifier KW_AS 
StringLiteral
+: KW_CREATE (temp=KW_TEMPORARY)? KW_FUNCTION orReplace? ifNotExists? 
functionIdentifier KW_AS StringLiteral

Review Comment:
   @nrg4878 The reason I use "create function or replace" is that "create or 
replace" can only be used for one statement; I get an error like the one below 
when I use "create or replace function":
   ```bash
   org.apache.hadoop.hive.ql.parse.ParseException: line 3:18 mismatched input 
'function' expecting KW_VIEW near 'replace' in create view statement
   ```
   I am not sure whether `antlr3` can support this syntax in two statements; if 
you have any idea, please let me know.





Issue Time Tracking
---

Worklog Id: (was: 782079)
Time Spent: 1h  (was: 50m)

> Support alter function in Hive DDL
> --
>
> Key: HIVE-26314
> URL: https://issues.apache.org/jira/browse/HIVE-26314
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive SQL does not support {{*ALTER FUNCTION*}} yet, we can refer to the 
> {{*CREATE [OR REPLACE] FUNCTION*}} of 
> [Spark|https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html]
>  to implement the alter function .
> {code:sql}
> CREATE [ TEMPORARY ] FUNCTION [ OR REPLACE ] [IF NOT EXISTS ]
>   [db_name.]function_name AS class_name
>   [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
> {code}
> * *OR REPLACE*
> If specified, the resources for the function are reloaded. This is mainly 
> useful to pick up any changes made to the implementation of the function. 
> This parameter is mutually exclusive to {{*IF NOT EXISTS*}} and can not be 
> specified together.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26336) Hive JDBC Driver should respect JDBC DriverManager#loginTimeout

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26336:
--
Labels: pull-request-available  (was: )

> Hive JDBC Driver should respect JDBC DriverManager#loginTimeout
> ---
>
> Key: HIVE-26336
> URL: https://issues.apache.org/jira/browse/HIVE-26336
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 4.0.0-alpha-1
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Before HIVE-12371, the Hive JDBC Driver used DriverManager#loginTimeout as 
> both connectTimeout and socketTimeout, which usually caused socket timeout 
> exceptions for users of the Hive JDBC Driver in Spring Boot projects, because 
> Spring Boot calls setLoginTimeout with 30s (its default value).
> HIVE-12371 introduced a new parameter, socketTimeout, and no longer honors 
> DriverManager#loginTimeout at all; I don't think that is a correct solution.
> I think that for loginTimeout we should prefer loginTimeout (in milliseconds) 
> from the JDBC connection URL and fall back to DriverManager#getLoginTimeout 
> (in seconds).
> For socketTimeout, use socketTimeout (in milliseconds) from the JDBC 
> connection URL if present.
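> For illustration, a minimal sketch of the proposed resolution order (not 
> the actual driver code; the parameter map is a stand-in for the parsed JDBC 
> URL):
> {code:java}
> import java.sql.DriverManager;
> import java.util.Map;
>
> public class LoginTimeoutSketch {
>   static int resolveLoginTimeoutMillis(Map<String, String> urlParams) {
>     String fromUrl = urlParams.get("loginTimeout");
>     if (fromUrl != null) {
>       return Integer.parseInt(fromUrl);            // already in milliseconds
>     }
>     return DriverManager.getLoginTimeout() * 1000; // seconds -> milliseconds
>   }
>
>   public static void main(String[] args) {
>     DriverManager.setLoginTimeout(30); // e.g. what Spring Boot does by default
>     System.out.println(resolveLoginTimeoutMillis(Map.of()));            // 30000
>     System.out.println(resolveLoginTimeoutMillis(
>         Map.of("loginTimeout", "5000")));                               // 5000
>   }
> }
> {code}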



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26336) Hive JDBC Driver should respect JDBC DriverManager#loginTimeout

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26336?focusedWorklogId=782067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782067
 ]

ASF GitHub Bot logged work on HIVE-26336:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:00
Start Date: 16/Jun/22 15:00
Worklog Time Spent: 10m 
  Work Description: pan3793 opened a new pull request, #3379:
URL: https://github.com/apache/hive/pull/3379

   
   
   ### What changes were proposed in this pull request?
   
   Introduce a new JDBC parameter `loginTimeout` and respect 
DriverManager#getLoginTimeout if `loginTimeout` is absent.
   
   ### Why are the changes needed?
   
   Before [HIVE-12371](https://issues.apache.org/jira/browse/HIVE-12371), the 
Hive JDBC Driver used DriverManager#loginTimeout as both connectTimeout and 
socketTimeout, which usually caused socket timeout exceptions for users of the 
Hive JDBC Driver in Spring Boot projects, because Spring Boot calls 
setLoginTimeout with 30s (its default value).

   [HIVE-12371](https://issues.apache.org/jira/browse/HIVE-12371) introduced a 
new parameter, socketTimeout, and no longer honors DriverManager#loginTimeout 
at all; I don't think that is a correct solution.

   I think that for loginTimeout we should prefer `loginTimeout` (in 
milliseconds) from the JDBC connection URL and fall back to 
`DriverManager#getLoginTimeout` (in seconds).
   For socketTimeout, use `socketTimeout` (in milliseconds) from the JDBC 
connection URL if present.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. New JDBC parameter `loginTimeout` is introduced.
   
   ### How was this patch tested?
   
   Existing UT (maybe).




Issue Time Tracking
---

Worklog Id: (was: 782067)
Remaining Estimate: 0h
Time Spent: 10m

> Hive JDBC Driver should respect JDBC DriverManager#loginTimeout
> ---
>
> Key: HIVE-26336
> URL: https://issues.apache.org/jira/browse/HIVE-26336
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 4.0.0-alpha-1
>Reporter: Cheng Pan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Before HIVE-12371, the Hive JDBC Driver used DriverManager#loginTimeout as 
> both connectTimeout and socketTimeout, which usually caused socket timeout 
> exceptions for users of the Hive JDBC Driver in Spring Boot projects, because 
> Spring Boot calls setLoginTimeout with 30s (its default value).
> HIVE-12371 introduced a new parameter, socketTimeout, and no longer honors 
> DriverManager#loginTimeout at all; I don't think that is a correct solution.
> I think that for loginTimeout we should prefer loginTimeout (in milliseconds) 
> from the JDBC connection URL and fall back to DriverManager#getLoginTimeout 
> (in seconds).
> For socketTimeout, use socketTimeout (in milliseconds) from the JDBC 
> connection URL if present.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread Sourabh Badhya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554033#comment-17554033
 ] 

Sourabh Badhya edited comment on HIVE-26324 at 6/16/22 12:22 PM:
-

We had an internal discussion and decided that MySQL must use GENERATED 
columns and not triggers, mainly because triggers are stored procedures and 
Hive should not start relying on triggers/stored procedures in the backend 
DB.


was (Author: JIRAUSER287127):
Had an internal discussion and it was decided that MySQL must also use the 
GENERATED columns and not triggers. Mainly because triggers are stored 
procedures and Hive must not initiate making use of triggers/stored procedures 
in the backend DB.

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
> To prevent this situation, it's best to enforce "one-row-table"-like 
> constraints on the NOTIFICATION_SEQUENCE table.
> Queries tried on supported databases - 
> NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help 
> us in adding "one-row-table" like constraints.
> *MySQL* - 
> {code:java}
> ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) 
> GENERATED ALWAYS AS (1) STORED NOT NULL;{code}
> CHECK constraints are not effective in MySQL 5.7; hence the shift to using 
> GENERATED columns, which are supported in MySQL 5.7.
> Similarly, for MariaDB, which uses the same schema script as MySQL, generated 
> columns with MySQL-compatible syntax are supported from 10.2.
> Link - 
> [https://dev.mysql.com/doc/refman/5.7/en/alter-table-generated-columns.html]
> Link - [https://mariadb.com/kb/en/generated-columns/]
> *Postgres* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE "NOTIFICATION_SEQUENCE"
> (
> "NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
> "NEXT_EVENT_ID" BIGINT NOT NULL,
> PRIMARY KEY ("NNI_ID")
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE "NOTIFICATION_SEQUENCE"
> ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
> *Derby* - 
> {code:java}
> ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
> CHECK (NNI_ID = 1); {code}
> *Oracle* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE NOTIFICATION_SEQUENCE
> (
> NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
> NEXT_EVENT_ID NUMBER NOT NULL
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
> *Microsoft SQL Server* - 
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
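> As a quick sanity check (illustrative, shown against the Postgres variant 
> above), a second row should now be rejected:
> {code:sql}
> -- With CHECK ("NNI_ID" = 1) in place, any other NNI_ID fails:
> INSERT INTO "NOTIFICATION_SEQUENCE" ("NNI_ID", "NEXT_EVENT_ID") VALUES (2, 1);
> -- ERROR:  new row for relation "NOTIFICATION_SEQUENCE" violates check
> -- constraint "ONE_ROW_CONSTRAINT"
> {code}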



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26324:
--
Description: 
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
ALWAYS AS (1) STORED NOT NULL;{code}
CHECK constraints are not effective in MySQL 5.7; hence the shift to using 
GENERATED columns, which are supported in MySQL 5.7.
Similarly, for MariaDB, which uses the same schema script as MySQL, generated 
columns with MySQL-compatible syntax are supported from 10.2.
Link - 
[https://dev.mysql.com/doc/refman/5.7/en/alter-table-generated-columns.html]
Link - [https://mariadb.com/kb/en/generated-columns/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}

  was:
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
ALWAYS AS (1) STORED NOT NULL;{code}
CHECK constraints are not effective in MySQL 5.7; hence the shift to using 
GENERATED columns, which are supported in MySQL 5.7.
Similarly, for MariaDB, which uses the same schema script as MySQL, generated 
columns with MySQL-compatible syntax are supported from 10.2.
Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
Link - [https://mariadb.com/kb/en/constraint/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}


> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
> To prevent this situation, it's best to enforce "one-row-table"-like 
> constraints on the NOTIFICATION_SEQUENCE table.

[jira] [Comment Edited] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread Sourabh Badhya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554033#comment-17554033
 ] 

Sourabh Badhya edited comment on HIVE-26324 at 6/16/22 12:00 PM:
-

We had an internal discussion and decided that MySQL must also use GENERATED 
columns and not triggers, mainly because triggers are stored procedures and 
Hive should not start relying on triggers/stored procedures in the backend 
DB.


was (Author: JIRAUSER287127):
Had an internal discussion and it was decided that MySQL must also use the 
CHECK constraint and not triggers even though it is not effective for MySQL 
5.7. Mainly because triggers are stored procedures and Hive must not initiate 
making use of triggers/stored procedures in the backend DB.

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
> To prevent this situation, it's best to enforce "one-row-table"-like 
> constraints on the NOTIFICATION_SEQUENCE table.
> Queries tried on supported databases - 
> NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help 
> us in adding "one-row-table" like constraints.
> *MySQL* - 
> {code:java}
> ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) 
> GENERATED ALWAYS AS (1) STORED NOT NULL;{code}
> CHECK constraints are not effective in MySQL 5.7; hence the shift to using 
> GENERATED columns, which are supported in MySQL 5.7.
> Similarly, for MariaDB, which uses the same schema script as MySQL, generated 
> columns with MySQL-compatible syntax are supported from 10.2.
> Link - 
> [https://dev.mysql.com/doc/refman/5.7/en/alter-table-generated-columns.html]
> Link - [https://mariadb.com/kb/en/generated-columns/]
> *Postgres* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE "NOTIFICATION_SEQUENCE"
> (
> "NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
> "NEXT_EVENT_ID" BIGINT NOT NULL,
> PRIMARY KEY ("NNI_ID")
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE "NOTIFICATION_SEQUENCE"
> ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
> *Derby* - 
> {code:java}
> ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
> CHECK (NNI_ID = 1); {code}
> *Oracle* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE NOTIFICATION_SEQUENCE
> (
> NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
> NEXT_EVENT_ID NUMBER NOT NULL
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
> *Microsoft SQL Server* - 
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26324:
--
Description: 
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
ALWAYS AS (1) STORED NOT NULL;{code}
CHECK constraints are not effective in MySQL 5.7; hence the shift to using 
GENERATED columns, which are supported in MySQL 5.7.
Similarly, for MariaDB, which uses the same schema script as MySQL, generated 
columns with MySQL-compatible syntax are supported from 10.2.
Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
Link - [https://mariadb.com/kb/en/constraint/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}

  was:
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
ALWAYS AS (1) STORED NOT NULL;{code}
CHECK constraints are not effective in MySQL 5.7. Hence 
Similarly for MariaDB which uses the same schema script as that of MySQL, CHECK 
constraint is effective from 10.2.1.
Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
Link - [https://mariadb.com/kb/en/constraint/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}


> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
> To prevent this situation, it's best to enforce "one-row-table"-like 
> constraints on the NOTIFICATION_SEQUENCE table.
> Queries tried on supported databases - 
> NOTIFICATION_SEQUENCE already has NNI_ID as the primary key.

[jira] [Updated] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26324:
--
Description: 
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) GENERATED 
ALWAYS AS (1) STORED NOT NULL;{code}
CHECK constraints are not effective in MySQL 5.7. Hence 
Similarly for MariaDB which uses the same schema script as that of MySQL, CHECK 
constraint is effective from 10.2.1.
Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
Link - [https://mariadb.com/kb/en/constraint/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}

  was:
The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
been several reports of the NOTIFICATION_SEQUENCE table having multiple rows. 
To prevent this situation, it's best to enforce "one-row-table"-like 
constraints on the NOTIFICATION_SEQUENCE table.

Queries tried on supported databases - 
NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help us 
in adding "one-row-table" like constraints.
*MySQL* - 
{code:java}
ALTER TABLE `NOTIFICATION_SEQUENCE` ADD CONSTRAINT `ONE_ROW_CONSTRAINT` CHECK 
(`NNI_ID` = 1);{code}
CHECK constraints are not effective in MySQL 5.7; they were introduced in 8.0.16.
Similarly for MariaDB, which uses the same schema script as MySQL, the CHECK 
constraint is effective from 10.2.1.
Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
Link - [https://mariadb.com/kb/en/constraint/]

*Postgres* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE "NOTIFICATION_SEQUENCE"
(
"NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
"NEXT_EVENT_ID" BIGINT NOT NULL,
PRIMARY KEY ("NNI_ID")
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE "NOTIFICATION_SEQUENCE"
ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
*Derby* - 
{code:java}
ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
CHECK (NNI_ID = 1); {code}
*Oracle* - 
Either change the definition of table like this - 
{code:java}
CREATE TABLE NOTIFICATION_SEQUENCE
(
NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
NEXT_EVENT_ID NUMBER NOT NULL
); {code}
OR add explicit constraints like this -
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}
*Microsoft SQL Server* - 
{code:java}
ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
(NNI_ID = 1); {code}


> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of it containing multiple rows. To prevent this, it's 
> best to enforce "one-row-table" like constraints on NOTIFICATION_SEQUENCE table.
> Queries tried on supported databases - 
> NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help 
> us in adding "one-row-table" like constraints.
> *MySQL* - 
> {code:jav

[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782025
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:52
Start Date: 16/Jun/22 11:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899000415


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
 if (table != null) {
-  HiveIcebergWriter writer = writers.get(output);
+  Collection<DataFile> dataFiles = Lists.newArrayList();
+  Collection<DeleteFile> deleteFiles = Lists.newArrayList();
   String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-  attemptID.getJobID(), attemptID.getTaskID().getId());
-  if (writer != null) {
-createFileForCommit(writer.files(), fileForCommitLocation, 
table.io());
-  } else {
+  attemptID.getJobID(), attemptID.getTaskID().getId());
+  if (writers.get(output) != null) {
+for (HiveIcebergWriter writer : writers.get(output)) {
+  if (writer != null) {
+dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   I leave this decision to you 😄 
   I do not have a clear preference





Issue Time Tracking
---

Worklog Id: (was: 782025)
Time Spent: 1h 20m  (was: 1h 10m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26334) Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg tables

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26334:
--
Labels: pull-request-available  (was: )

> Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg 
> tables
> --
>
> Key: HIVE-26334
> URL: https://issues.apache.org/jira/browse/HIVE-26334
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The DESCRIBE FORMATTED output shows this even for bucketed Iceberg tables:
> {code}
> Num Buckets:  0   NULL
> Bucket Columns:   []  NULL
> {code}
> We should remove them, and the user should rely on the information in the {{# 
> Partition Transform Information}} block instead



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26334) Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg tables

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26334?focusedWorklogId=782022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782022
 ]

ASF GitHub Bot logged work on HIVE-26334:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:42
Start Date: 16/Jun/22 11:42
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request, #3378:
URL: https://github.com/apache/hive/pull/3378

   ### What changes were proposed in this pull request?
   Remove the misleading data
   
   ### Why are the changes needed?
   So that users see only the correct info
   
   ### Does this PR introduce _any_ user-facing change?
   Removes the data as shown in the test file changes
   
   ### How was this patch tested?
   Unit tests




Issue Time Tracking
---

Worklog Id: (was: 782022)
Remaining Estimate: 0h
Time Spent: 10m

> Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg 
> tables
> --
>
> Key: HIVE-26334
> URL: https://issues.apache.org/jira/browse/HIVE-26334
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The DESCRIBE FORMATTED output shows this even for bucketed Iceberg tables:
> {code}
> Num Buckets:  0   NULL
> Bucket Columns:   []  NULL
> {code}
> We should remove them, and the user should rely on the information in the {{# 
> Partition Transform Information}} block instead



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555030#comment-17555030
 ] 

Zoltan Haindrich commented on HIVE-20607:
-

If it had been on 3.1, it would have been released recently, but as of now I 
don't know about any planned 3.x releases; I guess 4.0 will be next.

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn-related databases/tables 
> in the Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API and directly 
> append input string arguments to the SQL string.
> They need to use a parameterised PreparedStatement object to set these 
> arguments instead; a sketch of the pattern follows.
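A minimal sketch of the parameterised pattern (hypothetical table and column 
names; dbConn is assumed to be an open java.sql.Connection - this is not the 
actual TxnHandler code):
{code:java}
// Hypothetical sketch, not the actual TxnHandler code.
// Concatenation lets a crafted dbName rewrite the query (SQL injection):
String bad = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_DB\" = '" + dbName + "'";

// A PreparedStatement binds the argument as data, so it cannot alter the SQL:
try (PreparedStatement ps = dbConn.prepareStatement(
    "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_DB\" = ?")) {
  ps.setString(1, dbName);
  try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
      long txnId = rs.getLong(1);  // consume results as usual
    }
  }
}
{code}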



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782016
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:11
Start Date: 16/Jun/22 11:11
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898969956


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
 if (table != null) {
-  HiveIcebergWriter writer = writers.get(output);
+  Collection<DataFile> dataFiles = Lists.newArrayList();
+  Collection<DeleteFile> deleteFiles = Lists.newArrayList();
   String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-  attemptID.getJobID(), attemptID.getTaskID().getId());
-  if (writer != null) {
-createFileForCommit(writer.files(), fileForCommitLocation, 
table.io());
-  } else {
+  attemptID.getJobID(), attemptID.getTaskID().getId());
+  if (writers.get(output) != null) {
+for (HiveIcebergWriter writer : writers.get(output)) {
+  if (writer != null) {
+dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   I found this usage:
   
https://github.com/apache/hive/blob/67c2d4910ff17c694653eb8bd9c9ed2405cec38b/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergWriterBase.java#L59
   
   `HiveIcebergWriter` does not have `dataFiles()`/`deleteFiles()` methods, and 
it can be a `HiveIcebergRecordWriter`, `HiveIcebergDeleteWriter`, etc., which 
treat data and delete files in different ways.
   
   If we want to avoid the `FilesForCommit` object creation when replacing 
`HiveIcebergWriter.files()` (both options are sketched below):
   * create a method like `HiveIcebergWriter.collectFiles(List<DataFile> dataFiles, 
List<DeleteFile> deleteFiles)`
   or 
   * create dataFiles(), deleteFiles() methods. I prefer wrapping the returned 
lists into an unmodifiableList, which is also a new object creation.
   
   Which do you prefer? 
   
   On the other hand, I don't think creating the `FilesForCommit` objects is 
critical: they are created only when the result of a statement is 
committed/aborted, not on a per-record basis.
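For reference, a rough sketch of the two shapes being weighed (hypothetical 
signatures; the real `HiveIcebergWriter` interface may differ):
{code:java}
// Hypothetical sketch of the two API shapes discussed above; names assumed.
import java.util.List;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;

interface WriterFilesApi {
  // Option 1: the writer appends its files into caller-owned lists,
  // so no intermediate container is allocated.
  void collectFiles(List<DataFile> dataFiles, List<DeleteFile> deleteFiles);

  // Option 2: the writer exposes its collected files as unmodifiable views.
  // Collections.unmodifiableList allocates a small view object but copies nothing.
  List<DataFile> dataFiles();
  List<DeleteFile> deleteFiles();
}
{code}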





Issue Time Tracking
---

Worklog Id: (was: 782016)
Time Spent: 1h 10m  (was: 1h)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Colm O hEigeartaigh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555000#comment-17555000
 ] 

Colm O hEigeartaigh commented on HIVE-20607:


Thanks [~kgyrtkirk], but as it's a security issue, why wouldn't we just 
backport the fix to a supported release branch? Users are not going to switch 
to an alpha release in production.

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn-related databases/tables 
> in the Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API and directly 
> append input string arguments to the SQL string.
> They need to use a parameterised PreparedStatement object to set these 
> arguments instead.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554996#comment-17554996
 ] 

Zoltan Haindrich commented on HIVE-20607:
-

This patch is on branch-3 via [this 
commit|https://github.com/apache/hive/commit/09b92d3c864b00df99923f03a843a8179bd874a0];
I don't think we have a 3.2.1 release, or even 3.2.0; I don't see any traces 
of that; we also don't have a branch-3.2 right now.

3.2.0 is an [unreleased 
version|https://issues.apache.org/jira/projects/HIVE/versions/12343559] - I 
would recommend to use 4.0.0-alpha-1 which contains this fix.

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn-related databases/tables 
> in the Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API and directly 
> append input string arguments to the SQL string.
> They need to use a parameterised PreparedStatement object to set these 
> arguments instead.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Colm O hEigeartaigh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554433#comment-17554433
 ] 

Colm O hEigeartaigh edited comment on HIVE-20607 at 6/16/22 9:42 AM:
-

Would it be possible to backport this fix to branch-3.1 as well? 

I don't see any evidence of it in the commit log for 
[https://github.com/apache/hive/commits/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java]

Maybe the fix could be backported in 
https://issues.apache.org/jira/browse/HIVE-22073, as security scanners are 
showing a vulnerability in Hive 3.1.2 due to this issue.


was (Author: coheigea):
Was this fix ever backported to branch-3? I don't see any evidence of it in the 
commit log for 
[https://github.com/apache/hive/commits/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java]

Therefore I think the fix-version of 3.2.0 on this ticket is incorrect. See: 
https://issues.apache.org/jira/browse/HIVE-22073

Maybe the fix could be backported in 
https://issues.apache.org/jira/browse/HIVE-22073, as security scanners are 
showing a vulnerability in Hive 3.2.1 due to this issue.

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn-related databases/tables 
> in the Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API and directly 
> append input string arguments to the SQL string.
> They need to use a parameterised PreparedStatement object to set these 
> arguments instead.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26316) Handle dangling open txns on both src & tgt in unplanned failover.

2022-06-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26316.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the PR [~haymant]!

> Handle dangling open txns on both src & tgt in unplanned failover.
> --
>
> Key: HIVE-26316
> URL: https://issues.apache.org/jira/browse/HIVE-26316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26316) Handle dangling open txns on both src & tgt in unplanned failover.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26316?focusedWorklogId=781985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781985
 ]

ASF GitHub Bot logged work on HIVE-26316:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:41
Start Date: 16/Jun/22 09:41
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3367:
URL: https://github.com/apache/hive/pull/3367




Issue Time Tracking
---

Worklog Id: (was: 781985)
Time Spent: 2h  (was: 1h 50m)

> Handle dangling open txns on both src & tgt in unplanned failover.
> --
>
> Key: HIVE-26316
> URL: https://issues.apache.org/jira/browse/HIVE-26316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=781982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781982
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:31
Start Date: 16/Jun/22 09:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898890734


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -858,19 +862,24 @@ private static boolean 
hasParquetListColumnSupport(Properties tableProps, Schema
* @param overwrite If we have to overwrite the existing table or just add 
the new data
* @return The generated JobContext
*/
-  private Optional<JobContext> generateJobContext(Configuration configuration, 
String tableName, boolean overwrite) {
+  private Optional<List<JobContext>> generateJobContext(

Review Comment:
   nit: In Iceberg code we usually break lines like this:
   ```
  private Optional<List<JobContext>> generateJobContext(Configuration configuration, String tableName,
  boolean overwrite) {
   ```





Issue Time Tracking
---

Worklog Id: (was: 781982)
Time Spent: 1h  (was: 50m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=781980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781980
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:30
Start Date: 16/Jun/22 09:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898889313


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional<List<JobContext>> jobContextList = generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);
 try {
-  committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+  committer.abortJob(jobContext, JobStatus.State.FAILED);

Review Comment:
   Shall we abort all of the other jobs as well?
   What is our strategy here?
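One possible strategy, sketched with the variables from the diff (hypothetical; 
the PR may settle on something else): once one commit fails, abort the failed 
job and every job not yet committed. Jobs committed before the failure would 
still need separate rollback handling, which is exactly the open question here.
{code:java}
// Hypothetical "abort the rest" strategy; not necessarily what the PR implements.
List<JobContext> contexts = jobContextList.get();
for (int i = 0; i < contexts.size(); i++) {
  try {
    committer.commitJob(contexts.get(i));
  } catch (Throwable e) {
    // Abort the failed job plus every job that was never committed.
    for (int j = i; j < contexts.size(); j++) {
      try {
        committer.abortJob(contexts.get(j), JobStatus.State.FAILED);
      } catch (IOException ioe) {
        LOG.error("Error aborting job {}. There might be uncleaned data files.",
            contexts.get(j).getJobID(), ioe);
      }
    }
    throw new HiveException("Error committing job: " + contexts.get(i).getJobID(), e);
  }
}
{code}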





Issue Time Tracking
---

Worklog Id: (was: 781980)
Time Spent: 50m  (was: 40m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=781978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781978
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:28
Start Date: 16/Jun/22 09:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898887564


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional<List<JobContext>> jobContextList = generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);

Review Comment:
   please remove the extra spaces



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional<List<JobContext>> jobContextList = generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);
 try {
-  committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+  committer.abortJob(jobContext, JobStatus.State.FAILED);
 } catch (IOException ioe) {
   LOG.error("Error while trying to abort failed job. There might be 
uncleaned data files.", ioe);
   // no throwing here because the original exception should be 
propagated
 }
 throw new HiveException(
-"Error committing job: " + jobContext.get().getJobID() + " for 
table: " + tableName, e);
+"Error committing job: " + jobContext.getJobID() + " for 
table: " + tableName, e);

Review Comment:
   please remove the extra spaces





Issue Time Tracking
---

Worklog Id: (was: 781978)
Time Spent: 40m  (was: 0.5h)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26295) Hive LB based on zookeeper occured some probability of connect failed

2022-06-16 Thread hansonhe (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554987#comment-17554987
 ] 

hansonhe commented on HIVE-26295:
-

[~dengzh] 
Here are the details after adding _--verbose_ at the end of the beeline 
command:
22/06/16 17:25:23 INFO ZooKeeper: Session: 0x100c261eafc0081 closed
22/06/16 17:25:23 INFO ClientCnxn: EventThread shut down for session: 
0x100c261eafc0081
Error: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read 
HiveServer2 configs from ZooKeeper (state=,code=0)
java.sql.SQLException: org.apache.hive.jdbc.ZooKeeperHiveClientException: 
Unable to read HiveServer2 configs from ZooKeeper
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:170)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at 
org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
        at 
org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1641)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1536)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:56)
        at 
org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1384)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1423)
        at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:900)
        at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:795)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1048)
        at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read 
HiveServer2 configs from ZooKeeper
        at 
org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:147)
        at 
org.apache.hive.jdbc.Utils.configureConnParamsFromZooKeeper(Utils.java:511)
        at org.apache.hive.jdbc.Utils.parseURL(Utils.java:334)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
        ... 25 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read 
HiveServer2 uri from ZooKeeper: 
        at 
org.apache.hive.jdbc.ZooKeeperHiveClientHelper.updateParamsWithZKServerNode(ZooKeeperHiveClientHelper.java:125)
        at 
org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:145)
        ... 28 more
Beeline version 3.1.2 by Apache Hive

> Hive LB based on zookeeper occured some probability of connect failed
> -
>
> Key: HIVE-26295
> URL: https://issues.apache.org/jira/browse/HIVE-26295
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: hansonhe
>Priority: Major
>
> (1) I set the LB properties in hive-site.xml
> hive.server2.support.dynamic.service.discovery=true
> hive.server2.active.passive.ha.enable=false
>  (2) My hive production cluster info 
> hive  version: Apache 3.1.2
> hadoop version: Apache 3.1.4
> zookeeper version: Apache 3.5.9
> URL:  
> jdbc:hive2://host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
> my Hive cluster has 3 HiveServer2 instances: host1:1,host2:1,host3:1
> (3) After many connection tests with beeline -u '${URL}' -n 'hive' -p '', 
> the connection sometimes fails and sometimes succeeds. When it succeeds, it 
> can connect to any one of the 3 HiveServer2 instances at random. When it 
> fails, the logs are as follows:
> 22/06/07 11:14:59 INFO X509Util: Setting -D 
> jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated 
> TLS renegotiation
> 22/06/07 11:14:59 INFO ClientCnxnSocket: jute.maxb

[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=781977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781977
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:27
Start Date: 16/Jun/22 09:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898886482


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -325,7 +339,15 @@ private void commitTable(FileIO io, ExecutorService 
executor, JobContext jobCont
 LOG.info("Committing job has started for table: {}, using location: {}",
 table, generateJobLocation(location, conf, jobContext.getJobID()));
 
-int numTasks = SessionStateUtil.getCommitInfo(conf, name).map(info -> 
info.getTaskNum()).orElseGet(() -> {
+Optional<SessionStateUtil.CommitInfo> commitInfo;
+if (SessionStateUtil.getCommitInfo(conf, name).isPresent()) {
+  commitInfo = SessionStateUtil.getCommitInfo(conf, name).get()
+  .stream().filter(ci -> 
ci.getJobIdStr().equals(jobContext.getJobID().toString())).findFirst();

Review Comment:
   Are we sure that the first `commitInfo` has the same number of tasks as all 
of them?





Issue Time Tracking
---

Worklog Id: (was: 781977)
Time Spent: 0.5h  (was: 20m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=781973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781973
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:22
Start Date: 16/Jun/22 09:22
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898882678


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
 if (table != null) {
-  HiveIcebergWriter writer = writers.get(output);
+  Collection<DataFile> dataFiles = Lists.newArrayList();
+  Collection<DeleteFile> deleteFiles = Lists.newArrayList();
   String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-  attemptID.getJobID(), attemptID.getTaskID().getId());
-  if (writer != null) {
-createFileForCommit(writer.files(), fileForCommitLocation, 
table.io());
-  } else {
+  attemptID.getJobID(), attemptID.getTaskID().getId());
+  if (writers.get(output) != null) {
+for (HiveIcebergWriter writer : writers.get(output)) {
+  if (writer != null) {
+dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   Is there any place where we use the `writer.files()` directly instead of 
calling `dataFiles()`, `deleteFiles()`? We might want to remove the unnecessary 
object creation then.





Issue Time Tracking
---

Worklog Id: (was: 781973)
Time Spent: 20m  (was: 10m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26326) Support enabling background threads when failover is in progress.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26326?focusedWorklogId=781963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781963
 ]

ASF GitHub Bot logged work on HIVE-26326:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 08:46
Start Date: 16/Jun/22 08:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3376:
URL: https://github.com/apache/hive/pull/3376#issuecomment-1157399245

   Is the test failure related?
   If not, could you please check it with 
http://ci.hive.apache.org/job/hive-flaky-check/ 
   so we can disable it if it is really flaky?




Issue Time Tracking
---

Worklog Id: (was: 781963)
Time Spent: 20m  (was: 10m)

> Support enabling background threads when failover is in progress.
> -
>
> Key: HIVE-26326
> URL: https://issues.apache.org/jira/browse/HIVE-26326
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> repl.target.for doesn't allow background threads; expose a 
> {*}repl.backgroundthread{*}=enable flag to force-enable the background 
> threads, irrespective of repl.target.for, once B takes over as primary.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26324?focusedWorklogId=781962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781962
 ]

ASF GitHub Bot logged work on HIVE-26324:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 08:45
Start Date: 16/Jun/22 08:45
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on PR #3369:
URL: https://github.com/apache/hive/pull/3369#issuecomment-1157398601

   @deniskuzZ Yes, I agree. `NNI_ID`, if used anywhere (which I don't think it 
is), is always `1`. No point in having that column.




Issue Time Tracking
---

Worklog Id: (was: 781962)
Time Spent: 0.5h  (was: 20m)

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have 
> been several reports of it containing multiple rows. To prevent this, it's 
> best to enforce "one-row-table" like constraints on NOTIFICATION_SEQUENCE table.
> Queries tried on supported databases - 
> NOTIFICATION_SEQUENCE already has NNI_ID as the primary key. This will help 
> us in adding "one-row-table" like constraints.
> *MySQL* - 
> {code:java}
> ALTER TABLE `NOTIFICATION_SEQUENCE` ADD CONSTRAINT `ONE_ROW_CONSTRAINT` CHECK 
> (`NNI_ID` = 1);{code}
> CHECK constraints are not effective in MySQL 5.7; they were introduced in 8.0.16.
> Similarly for MariaDB, which uses the same schema script as MySQL, the CHECK 
> constraint is effective from 10.2.1.
> Link - [https://dev.mysql.com/doc/refman/5.7/en/create-table.html]
> Link - [https://mariadb.com/kb/en/constraint/]
> *Postgres* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE "NOTIFICATION_SEQUENCE"
> (
> "NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
> "NEXT_EVENT_ID" BIGINT NOT NULL,
> PRIMARY KEY ("NNI_ID")
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE "NOTIFICATION_SEQUENCE"
> ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
> *Derby* - 
> {code:java}
> ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
> CHECK (NNI_ID = 1); {code}
> *Oracle* - 
> Either change the definition of table like this - 
> {code:java}
> CREATE TABLE NOTIFICATION_SEQUENCE
> (
> NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
> NEXT_EVENT_ID NUMBER NOT NULL
> ); {code}
> OR add explicit constraints like this -
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
> *Microsoft SQL Server* - 
> {code:java}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26326) Support enabling background threads when failover is in progress.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26326:
--
Labels: pull-request-available  (was: )

> Support enabling background threads when failover is in progress.
> -
>
> Key: HIVE-26326
> URL: https://issues.apache.org/jira/browse/HIVE-26326
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> repl.target.for doesn't allow background threads; expose a 
> {*}repl.backgroundthread{*}=enable flag to force-enable the background 
> threads, irrespective of repl.target.for, once B takes over as primary.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26326) Support enabling background threads when failover is in progress.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26326?focusedWorklogId=781961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781961
 ]

ASF GitHub Bot logged work on HIVE-26326:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 08:44
Start Date: 16/Jun/22 08:44
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3376:
URL: https://github.com/apache/hive/pull/3376#discussion_r898847702


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java:
##
@@ -291,9 +291,17 @@ public static boolean isTargetOfReplication(Database db) {
 return dbParameters != null && 
!StringUtils.isEmpty(dbParameters.get(ReplConst.TARGET_OF_REPLICATION));
   }
 
+  public static boolean forceEnableBackgroundThreads(Database db) {

Review Comment:
   is this `isBackgroundThreadsEnabled`?
   The current name suggests to me that we set something forcefully, but the 
code shows that we are just checking a property.
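For illustration, an accessor under a clearer name might look like this (a 
sketch only; the method name is assumed, and the property key 
repl.backgroundthread is taken from the Jira description):
{code:java}
// Sketch only; assumes org.apache.hadoop.hive.metastore.api.Database and java.util.Map.
public static boolean isBackgroundThreadsEnabled(Database db) {
  Map<String, String> params = db.getParameters();
  return params != null && "enable".equalsIgnoreCase(params.get("repl.backgroundthread"));
}
{code}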





Issue Time Tracking
---

Worklog Id: (was: 781961)
Remaining Estimate: 0h
Time Spent: 10m

> Support enabling background threads when failover is in progress.
> -
>
> Key: HIVE-26326
> URL: https://issues.apache.org/jira/browse/HIVE-26326
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> repl.target.for doesn't allow background threads; expose a 
> {*}repl.backgroundthread{*}=enable flag to force-enable the background 
> threads, irrespective of repl.target.for, once B takes over as primary.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26265?focusedWorklogId=781954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781954
 ]

ASF GitHub Bot logged work on HIVE-26265:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 07:53
Start Date: 16/Jun/22 07:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3365:
URL: https://github.com/apache/hive/pull/3365#discussion_r898801022


##
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbortTxnHandler.java:
##
@@ -39,6 +48,19 @@ public void handle(Context withinContext) throws Exception {
 if (!ReplUtils.includeAcidTableInDump(withinContext.hiveConf)) {
   return;
 }
+
+ if (ReplUtils.filterTransactionOperations(withinContext.hiveConf)) {
+   String contextDbName = 
StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
+   GetTxnWriteIdsRequest request = new 
GetTxnWriteIdsRequest(eventMessage.getTxnId());
+   request.setDbName(contextDbName);
+   GetTxnWriteIdsResponse response = 
withinContext.db.getMSC().getTxnWriteIds(request);

Review Comment:
   Could we do this without calling the metastore?
   All of the requested writeIds should be available during the compilation phase.





Issue Time Tracking
---

Worklog Id: (was: 781954)
Time Spent: 0.5h  (was: 20m)

> REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.
> 
>
> Key: HIVE-26265
> URL: https://issues.apache.org/jira/browse/HIVE-26265
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: francis pang
>Assignee: francis pang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> REPL DUMP is replicating all OpenXacts, even when they come from other, 
> non-replicated databases. This wastes space in the dump and ends up opening 
> unneeded transactions during REPL LOAD.
>  
> Add a config property for replication that filters out OpenXact events during 
> REPL DUMP. During REPL LOAD, the txns can be opened implicitly when the 
> ALLOC_WRITE_ID event is processed. For CommitTxn and AbortTxn, dump only if a 
> write id was allocated; a condensed sketch of this rule follows.
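A condensed sketch of the dump-side rule described above (hypothetical helper; 
the event-type names are assumptions, and the real logic lives in the event 
handlers touched by the PR):
{code:java}
// Hypothetical condensation of the dump-side filtering rule; names assumed.
enum TxnEventType { OPEN_TXN, COMMIT_TXN, ABORT_TXN, OTHER }

static boolean shouldDump(TxnEventType type, boolean writeIdAllocatedForDb) {
  switch (type) {
    case OPEN_TXN:
      return false;                  // reopened implicitly at ALLOC_WRITE_ID on load
    case COMMIT_TXN:
    case ABORT_TXN:
      return writeIdAllocatedForDb;  // dump only if a write id was allocated
    default:
      return true;                   // other events are unaffected
  }
}
{code}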



--
This message was sent by Atlassian Jira
(v8.20.7#820007)