[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-05 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823671#comment-17823671
 ] 

Rui Li edited comment on HIVE-26882 at 3/5/24 3:50 PM:
---

I tested again with MariaDB, and there are fewer commit conflicts in the HMS log than 
in the test log. This is expected because Iceberg checks for conflicts itself 
before it calls {{alter_table}}. The number of conflicts triggered by HMS 
matches the number in the HMS log.

I also tested with Postgres and the result is correct. I read the 
[doc|https://www.postgresql.org/docs/14/transaction-iso.html#XACT-REPEATABLE-READ]
 and I think it works because:
{quote}a repeatable read transaction cannot modify or lock rows changed by 
other transactions after the repeatable read transaction began
{quote}
But I suspect this is stricter than the ANSI SQL standard. I checked SQL:2011, 
and it says the following about the {{SERIALIZABLE}} level:
{quote}The execution of concurrent SQL-transactions at transaction isolation 
level SERIALIZABLE is guaranteed to be serializable. A serializable execution 
is defined to be an execution of the operations of concurrently executing 
SQL-transactions that produces the same effect as some serial execution of 
those same SQL-transactions. A serial execution is one in which each 
SQL-transaction executes to completion before the next SQL-transaction begins.
{quote}
Suppose we have these two concurrent transactions trying to update the 
property. IIUC both transactions can commit and the result can be either {{v1}} 
or {{v2}}, even at the {{SERIALIZABLE}} level.
{code:sql}
txn1> update tbl set val = 'v1' where key = 'k';

txn2> update tbl set val = 'v2' where key = 'k';
{code}
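To make this concrete, here is a small illustrative sketch (Python with SQLite as a stand-in database; this is not Hive/HMS code, and the table and values are made up). It shows that the two possible serial orders are both valid serializable executions by the SQL:2011 definition, yet they leave different final values:

```python
# Illustrative sketch: each serial execution of txn1 and txn2 is
# "serializable", but the final value depends only on which transaction
# runs last -- so SERIALIZABLE alone cannot reject either commit order.
import sqlite3

def run_serial(updates):
    """Apply each update as its own committed transaction, in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (key TEXT PRIMARY KEY, val TEXT)")
    conn.execute("INSERT INTO tbl VALUES ('k', 'v0')")
    conn.commit()
    for new_val in updates:
        conn.execute("UPDATE tbl SET val = ? WHERE key = 'k'", (new_val,))
        conn.commit()  # one committed transaction per update
    (final,) = conn.execute("SELECT val FROM tbl WHERE key = 'k'").fetchone()
    conn.close()
    return final

print(run_serial(["v1", "v2"]))  # v2: serial order txn1 then txn2
print(run_serial(["v2", "v1"]))  # v1: serial order txn2 then txn1
```

Both outcomes are consistent with *some* serial execution, which is all the standard requires, so a standard-conforming DBMS is not obliged to report a conflict here.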
Maybe another solution is to use direct SQL and check the number of 
affected rows to detect conflicts. We did a PoC of this and it produces correct 
results with MariaDB. The pseudo code looks like this:
{code:java}
String key = ...;
String expectedVal = ...;
Table oldTable = ...;
Table newTable = ...;
Connection connection = getConnection(Connection.TRANSACTION_REPEATABLE_READ);
try {
  Statement statement = connection.createStatement();
  if (!expectedVal.equals(oldTable.getParameters().get(key))) {
    throw new MetaException("Table has been modified");
  }
  // The conditional UPDATE only matches the row if the stored value is still
  // the expected one; 0 affected rows means a concurrent committer won.
  int affectedRows = statement.executeUpdate(
      "UPDATE TABLE_PARAMS SET PARAM_VALUE = 'new_val' "
          + "WHERE TBL_ID = ... AND PARAM_KEY = 'key' AND PARAM_VALUE = 'expected_val'");
  if (affectedRows != 1) {
    throw new MetaException("Table has been modified");
  }
  connection.commit();
} catch (Throwable t) {
  connection.rollback();
  throw t;
} finally {
  connection.close();
}
{code}
One problem is that each Iceberg commit can modify multiple properties, or even 
other table fields, so it can be difficult to generate all the SQL statements 
manually. I am not sure how (or whether it is possible) to do this with JDO.
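The core of the PoC, detecting a lost update from the affected-row count, can be sketched independently of Hive (Python with SQLite standing in for the metastore RDBMS; the table and column names mirror the pseudo code above, the values are made up):

```python
# Illustrative sketch: optimistic compare-and-swap on a TABLE_PARAMS-like
# table. A conditional UPDATE that touched != 1 rows signals that a
# concurrent committer changed the parameter first.
import sqlite3

def cas_update(conn, tbl_id, key, expected_val, new_val):
    cur = conn.execute(
        "UPDATE TABLE_PARAMS SET PARAM_VALUE = ? "
        "WHERE TBL_ID = ? AND PARAM_KEY = ? AND PARAM_VALUE = ?",
        (new_val, tbl_id, key, expected_val))
    if cur.rowcount != 1:       # someone else committed first
        conn.rollback()
        raise RuntimeError("Table has been modified")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE_PARAMS (TBL_ID INT, PARAM_KEY TEXT, PARAM_VALUE TEXT)")
conn.execute("INSERT INTO TABLE_PARAMS VALUES (1, 'metadata_location', 's1')")
conn.commit()

cas_update(conn, 1, "metadata_location", "s1", "s2")      # succeeds
try:
    cas_update(conn, 1, "metadata_location", "s1", "s3")  # stale expectation
except RuntimeError as e:
    print(e)  # Table has been modified
```

One real-world caveat for this approach: MySQL/MariaDB report the number of *changed* rows by default (a no-op update counts as 0) unless the connection sets CLIENT_FOUND_ROWS, so an implementation would need to account for that.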


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-05 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823678#comment-17823678
 ] 

Rui Li commented on HIVE-26882:
---

And if this feature has only been verified with Postgres, perhaps we should 
document that, or even throw an exception for other DBMS types?

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.10, 4.0.0-beta-1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check whether a Table parameter 
> has changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as Iceberg currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check that the current snapshot has not changed
> - Update the table metadata
> - Release the lock
> After the change, these 4 HMS calls could be replaced with a single alter 
> table call.
> Also, we could avoid cases where locks are left hanging by failed processes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (HIVE-27928) Update development version in a master branch

2024-03-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27928:
--
Fix Version/s: 4.1.0
   (was: 4.0.0)

> Update development version in a master branch
> -
>
> Key: HIVE-27928
> URL: https://issues.apache.org/jira/browse/HIVE-27928
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Resolved] (HIVE-27928) Update development version in a master branch

2024-03-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27928.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Update development version in a master branch
> -
>
> Key: HIVE-27928
> URL: https://issues.apache.org/jira/browse/HIVE-27928
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>






[jira] [Updated] (HIVE-28076) Selecting data from a bucketed table with decimal column type throwing NPE.

2024-03-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28076:
--
Fix Version/s: 4.0.0
   (was: 4.1.0)

> Selecting data from a bucketed table with decimal column type throwing NPE.
> ---
>
> Key: HIVE-28076
> URL: https://issues.apache.org/jira/browse/HIVE-28076
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Selecting data from a bucketed table with a decimal bucket column type throws 
> an NPE.
> Steps to reproduce:
> {noformat}
> create table bucket_table(id decimal(38,0), name string) clustered by(id) 
> into 3 buckets;
> insert into bucket_table values(5999640711, 'Cloud');
> select * from bucket_table bt where id = 5999640711;{noformat}
> HS2 log contains NPE:
> {noformat}
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCodeMurmur(ObjectInspectorUtils.java:889)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketHashCode(ObjectInspectorUtils.java:805)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketNumber(ObjectInspectorUtils.java:638)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer$BucketBitsetGenerator.generatePredicate(FixedBucketPruningOptimizer.java:225)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerOperatorFactory$FilterPruner.process(PrunerOperatorFactory.java:87)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerUtils.walkOperatorTree(PrunerUtils.java:84)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer.transform(FixedBucketPruningOptimizer.java:331)
>at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:249)
>at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12995)
>at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:303)
>at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)
>at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:621)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:567)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:204)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:130)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:429)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:360)
>at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:857)
>at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:827)
>at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:191)
>at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  

[jira] [Updated] (HIVE-28073) Upgrade jackson version to 2.16.1

2024-03-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28073:
--
Fix Version/s: 4.0.0
   (was: 4.1.0)

> Upgrade jackson version to 2.16.1
> -
>
> Key: HIVE-28073
> URL: https://issues.apache.org/jira/browse/HIVE-28073
> Project: Hive
>  Issue Type: Bug
>Reporter: Araika Singh
>Assignee: Araika Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Jackson-databind through 2.15.2 allows attackers to cause a denial of service 
> or other unspecified impact via a crafted object that uses cyclic 
> dependencies.
> [https://nvd.nist.gov/vuln/detail/CVE-2023-35116]
> https://github.com/FasterXML/jackson-databind/issues/3972





[jira] [Updated] (HIVE-28102) Iceberg: Invoke validateDataFilesExist for RowDelta operations

2024-03-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28102:
--
Fix Version/s: 4.0.0
   (was: 4.1.0)

> Iceberg: Invoke validateDataFilesExist for RowDelta operations
> --
>
> Key: HIVE-28102
> URL: https://issues.apache.org/jira/browse/HIVE-28102
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg, pull-request-available
> Fix For: 4.0.0
>
>
> Hive must invoke validateDataFilesExist for RowDelta operations 
> (DELETE/UPDATE/MERGE).
> Without this, a concurrent RewriteFiles (compaction) and a RowDelta can corrupt 
> the table.





[jira] [Updated] (HIVE-28037) Qtest running with postgresql DB fails with "Database does not exist: default"

2024-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28037:
--
Labels: pull-request-available  (was: )

> Qtest running with postgresql DB fails with "Database does not exist: default"
> --
>
> Key: HIVE-28037
> URL: https://issues.apache.org/jira/browse/HIVE-28037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
>
> Running qtest with postgresql :
> mvn test -Pitests -pl itests/qtest -Dtest.metastore.db=postgres 
> -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile="acid_insert_overwrite_update.q,acid_stats3.q"
> fails:
> org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: 
> default
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:1867)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:1856)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:14167)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12907)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13076)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)





[jira] [Updated] (HIVE-28105) Beeline: Show current database in prompt

2024-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28105:
--
Labels: pull-request-available  (was: )

> Beeline: Show current database in prompt
> 
>
> Key: HIVE-28105
> URL: https://issues.apache.org/jira/browse/HIVE-28105
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: tanishqchugh
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
>
> currently, we have:
> {code:java}
> 0: jdbc:hive2://link> use dbs;
> 0: jdbc:hive2://link>
> {code}
> instead, we can have something like:
> {code:java}
> 0: jdbc:hive2://link> use dbs;
> 0: jdbc:hive2://link (dbs)>
> {code}





[jira] [Commented] (HIVE-28106) Parallel select queries are failing on external tables with FNF due to staging directory

2024-03-05 Thread Taraka Rama Rao Lethavadla (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823578#comment-17823578
 ] 

Taraka Rama Rao Lethavadla commented on HIVE-28106:
---

Some code refactoring done as part of 
https://issues.apache.org/jira/browse/HIVE-24581 seems to have caused this 
behaviour, but I am not able to reproduce the problem, either in a cluster or 
with JUnit test cases.

> Parallel select queries are failing on external tables with FNF due to 
> staging directory
> 
>
> Key: HIVE-28106
> URL: https://issues.apache.org/jira/browse/HIVE-28106
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>
> The issue reported here is similar to that of HIVE-26481
> But here it is happening between simultaneous queries on external tables.
> Query1:
>  
> {noformat}
> 2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory 
> if it doesn't exist: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
> ..
> ..
> 2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: 
> [HiveServer2-Background-Pool: Thread-416]: Executing 
> command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): 
> SELECT COUNT(*) FROM database.tbl WHERE  IS NULL OR =''
> ..
> ..
> 2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: 
> [HiveServer2-Background-Pool: Thread-416]: Completed executing 
> command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); 
> Time taken: 11.548 seconds
> {noformat}
> This query got completed and deleted the respective staging directory.
> {noformat}
> 2024-02-27 09:42:54,565 DEBUG hive.ql.Context: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
>  
> ..   
> ..
> 2024-02-27 09:42:54,566 DEBUG hive.ql.Context: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch 
> dir: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>   {noformat}
>  Query 2 started to execute at the same time on the same table
> {noformat}
> 2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: 
> [HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, 
> sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, 
> applicationId=application_1708662665640_1222, dagName=SELECT ABS((( - 
> ... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
> callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:
>  }  {noformat}
> Tez AM logs (syslog_dag_1708662665640_1222_1)
>  
> {noformat}
> 2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] 
> |app.DAGAppMaster|: Running DAG: SELECT ABS((( - ...  (Stage-1), 
> callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
> callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:
>  } 
> .. 
> ..
> 2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: 
> Adding 1 inputs; the first input is 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: 
> Generating splits for dirs: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] 
> |tez.HiveSplitGenerator|: The preferred split size is 33554432
> ..
> ..
> 2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: 
> Adding 1 inputs; the first input is 
> hdfs://namespace/data/eisds/apps/qlys/final/history/tbl/partition_year=2023/partition_month=12/partition_date=2023-12-30
> ..
> ..
> 2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 
> 111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: 
> "/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20"
>  startAfter: "" needLocation: true}  {noformat}
> And the query failed since that directory got removed at the same time
> {noformat}
> 2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Vertex Input: tbl initializer failed, 
> vertex=vertex_1708662665640_1222_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.RuntimeException: ORC split generation failed with exception: 
> 

[jira] [Updated] (HIVE-28105) Beeline: Show current database in prompt

2024-03-05 Thread tanishqchugh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tanishqchugh updated HIVE-28105:

Description: 
currently, we have:
{code:java}
0: jdbc:hive2://link> use dbs;
0: jdbc:hive2://link>
{code}
instead, we can have something like:
{code:java}
0: jdbc:hive2://link> use dbs;
0: jdbc:hive2://link (dbs)>
{code}

  was:
currently, we have:
{code:java}
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team> use tpcds_partitioned_orc_100;
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team>
{code}
instead, we can have something like:
{code:java}
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team> use tpcds_partitioned_orc_100;
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team (tpcds_partitioned_orc_100)>
{code}


> Beeline: Show current database in prompt
> 
>
> Key: HIVE-28105
> URL: https://issues.apache.org/jira/browse/HIVE-28105
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: tanishqchugh
>Assignee: tanishqchugh
>Priority: Major
>
> Currently, we have:
> {code:java}
> 0: jdbc:hive2://link> use dbs;
> 0: jdbc:hive2://link>
> {code}
> Instead, we could have something like:
> {code:java}
> 0: jdbc:hive2://link> use dbs;
> 0: jdbc:hive2://link (dbs)>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28106) Parallel select queries are failing on external tables with FNF due to staging directory

2024-03-05 Thread Taraka Rama Rao Lethavadla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taraka Rama Rao Lethavadla updated HIVE-28106:
--
Description: 
The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on external tables.

Query 1:
{noformat}
2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if 
it doesn't exist: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
..
..
2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): 
SELECT COUNT(*) FROM database.tbl WHERE  IS NULL OR =''
..
..
2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Completed executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); 
Time taken: 11.548 seconds
{noformat}
This query completed and deleted its staging directory.
{noformat}
2024-02-27 09:42:54,565 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
 
..   
..
2024-02-27 09:42:54,566 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
  {noformat}
Query 2 started executing at the same time on the same table:
{noformat}
2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: 
[HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, 
sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, 
applicationId=application_1708662665640_1222, dagName=SELECT ABS((( - 
... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User: 
}  {noformat}
Tez AM logs (syslog_dag_1708662665640_1222_1)
 
{noformat}
2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] 
|app.DAGAppMaster|: Running DAG: SELECT ABS((( - ...  (Stage-1), 
callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User: 
} 
.. 
..
2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: 
Generating splits for dirs: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: 
The preferred split size is 33554432
..
..
2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/data/eisds/apps/qlys/final/history/tbl/partition_year=2023/partition_month=12/partition_date=2023-12-30
..
..
2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 
111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: 
"/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20"
 startAfter: "" needLocation: true}  {noformat}
The query then failed because that directory had been removed in the meantime:
{noformat}
2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: tbl initializer failed, 
vertex=vertex_1708662665640_1222_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.RuntimeException: ORC split generation failed with exception: 
java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
    at java.util.concurrent.Executors.call(Executors.java:511)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at 

[jira] [Created] (HIVE-28106) Parallel select queries are failing on external tables with FNF due to staging directory

2024-03-05 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-28106:
-

 Summary: Parallel select queries are failing on external tables 
with FNF due to staging directory
 Key: HIVE-28106
 URL: https://issues.apache.org/jira/browse/HIVE-28106
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla


The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on external tables.

Query 1:
{noformat}
2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if 
it doesn't exist: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
..
..
2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): 
SELECT COUNT(*) FROM database.tbl WHERE  IS NULL OR =''
..
..
2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Completed executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); 
Time taken: 11.548 seconds
{noformat}
This query completed and deleted its staging directory.
{noformat}
2024-02-27 09:42:54,565 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
 
..   
..
2024-02-27 09:42:54,566 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
  {noformat}
Query 2 started executing at the same time on the same table:
{noformat}
2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: 
[HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, 
sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, 
applicationId=application_1708662665640_1222, dagName=SELECT ABS((( - 
... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User: 
}  {noformat}
Tez AM logs (syslog_dag_1708662665640_1222_1)
 
{noformat}
2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] 
|app.DAGAppMaster|: Running DAG: SELECT ABS((( - ...  (Stage-1), 
callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User: 
} 
.. 
..
2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: 
Generating splits for dirs: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: 
The preferred split size is 33554432
..
..
2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/data/eisds/apps/qlys/final/history/qualys_authentication/partition_year=2023/partition_month=12/partition_date=2023-12-30
..
..
2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 
111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: 
"/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20"
 startAfter: "" needLocation: true}  {noformat}
The query then failed because that directory had been removed in the meantime:
{noformat}
2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: qualys_authentication initializer failed, 
vertex=vertex_1708662665640_1222_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.RuntimeException: ORC split generation failed with exception: 
java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
    at java.util.concurrent.Executors.call(Executors.java:511)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at 
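
The race above is: query 1 deletes its transient {{.hive-staging}} directory while query 2's split generation is still listing the table directory, so the listing hits a path that no longer exists. One common mitigation, in the spirit of fixes like HIVE-26481, is to exclude hidden/staging entries from input listings before split generation. The sketch below is a standalone illustration of that idea, not Hive's actual patch; the predicate name {{isVisibleInput}} is hypothetical (Hive's real code works with Hadoop {{PathFilter}} objects).
{code:java}
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StagingFilterSketch {
    // Transient/hidden entries conventionally start with '.' or '_'
    // (e.g. ".hive-staging_hive_..."); skip them during input listing.
    static boolean isVisibleInput(String name) {
        return !name.startsWith(".") && !name.startsWith("_");
    }

    static List<String> visible(Stream<String> names) {
        return names.filter(StagingFilterSketch::isVisibleInput)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> kept = visible(Stream.of(
                "partition_year=2023",
                ".hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20",
                "_tmp.extract"));
        System.out.println(kept); // only the partition directory survives
    }
}
{code}
With such a filter in place, the staging directory created and deleted by query 1 would never enter query 2's split-generation listing, so its concurrent removal could not cause a FileNotFoundException.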

[jira] [Created] (HIVE-28105) Beeline: Show current database in prompt

2024-03-05 Thread tanishqchugh (Jira)
tanishqchugh created HIVE-28105:
---

 Summary: Beeline: Show current database in prompt
 Key: HIVE-28105
 URL: https://issues.apache.org/jira/browse/HIVE-28105
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: tanishqchugh
Assignee: tanishqchugh


Currently, we have:
{code:java}
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team> use tpcds_partitioned_orc_100;
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team>
{code}
Instead, we could have something like:
{code:java}
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team> use tpcds_partitioned_orc_100;
0: jdbc:hive2://hs2-lbodor-aws-bug.dw-dw-team (tpcds_partitioned_orc_100)>
{code}
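
The desired behavior can be sketched as a small prompt-building function: append the current database name in parentheses once one has been selected. This is an illustrative sketch only; the method name {{buildPrompt}} is hypothetical and not Beeline's actual API.
{code:java}
public class PromptSketch {
    // Build a Beeline-style prompt, decorating it with the current
    // database once one has been selected.
    static String buildPrompt(String basePrompt, String currentDb) {
        if (currentDb == null || currentDb.isEmpty()) {
            return basePrompt + "> ";
        }
        return basePrompt + " (" + currentDb + ")> ";
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt("0: jdbc:hive2://link", null));
        System.out.println(buildPrompt("0: jdbc:hive2://link", "dbs"));
    }
}
{code}
The real change would presumably hook into wherever Beeline rebuilds its prompt after a {{use <db>}} statement succeeds.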





[jira] [Resolved] (HIVE-28076) Selecting data from a bucketed table with decimal column type throwing NPE.

2024-03-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28076.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> Selecting data from a bucketed table with decimal column type throwing NPE.
> ---
>
> Key: HIVE-28076
> URL: https://issues.apache.org/jira/browse/HIVE-28076
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Selecting data from a bucketed table with a decimal bucket column type throws
> an NPE.
> Steps to reproduce:
> {noformat}
> create table bucket_table(id decimal(38,0), name string) clustered by(id) 
> into 3 buckets;
> insert into bucket_table values(5999640711, 'Cloud');
> select * from bucket_table bt where id = 5999640711;{noformat}
> HS2 log contains NPE:
> {noformat}
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCodeMurmur(ObjectInspectorUtils.java:889)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketHashCode(ObjectInspectorUtils.java:805)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketNumber(ObjectInspectorUtils.java:638)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer$BucketBitsetGenerator.generatePredicate(FixedBucketPruningOptimizer.java:225)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerOperatorFactory$FilterPruner.process(PrunerOperatorFactory.java:87)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerUtils.walkOperatorTree(PrunerUtils.java:84)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer.transform(FixedBucketPruningOptimizer.java:331)
>at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:249)
>at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12995)
>at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:303)
>at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)
>at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:621)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:567)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:204)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:130)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:429)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:360)
>at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:857)
>at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:827)
>at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:191)
>at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>at 
>
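
Per the stack trace, the NPE surfaces in {{ObjectInspectorUtils.hashCodeMurmur}} while bucket pruning hashes the filter constant to pick a bucket. A plausible failure mode is that converting the constant to the bucket column's decimal type yields {{null}}, which the hash then dereferences. The sketch below only illustrates a defensive guard (fall back to scanning all buckets when the converted constant is null); it is not the actual HIVE-28076 patch, and {{bucketFor}} is a hypothetical stand-in for {{ObjectInspectorUtils.getBucketNumber}}.
{code:java}
import java.util.OptionalInt;

public class BucketPruneSketch {
    // Stand-in for bucket selection: hash the converted constant mod the
    // bucket count. Empty result means "cannot prune, read every bucket".
    static OptionalInt bucketFor(Object convertedConstant, int numBuckets) {
        if (convertedConstant == null) {
            // Type conversion of the literal failed; skip pruning instead
            // of dereferencing null.
            return OptionalInt.empty();
        }
        int hash = convertedConstant.hashCode() & Integer.MAX_VALUE;
        return OptionalInt.of(hash % numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(new java.math.BigDecimal("5999640711"), 3));
        System.out.println(bucketFor(null, 3)); // empty: no pruning, no NPE
    }
}
{code}
Whatever the real fix does, the invariant is the same: bucket pruning is an optimization, so a constant that cannot be hashed should disable pruning rather than fail the query.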