[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-24 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830352#comment-17830352
 ] 

Peter Vary edited comment on HIVE-26882 at 3/25/24 6:06 AM:


If someone from the MySQL/MariaDB community could describe their take on 
REPEATABLE_READ, that would be nice. In the meantime we need to work around the 
RDBMS differences. [~lirui] has a PR (https://github.com/apache/hive/pull/5129) 
which still needs some changes and testing, but which would fix the issue.



> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.10, 4.0.0-beta-1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check whether a Table parameter 
> has changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as committing an Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check that the current snapshot has not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also, we could avoid cases where locks are left hanging by failed processes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-24 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830352#comment-17830352
 ] 

Peter Vary commented on HIVE-26882:
---

If someone from the MySQL/MariaDB community could describe their take on 
REPEATABLE_READ, that would be nice. In the meantime we need to work around the 
RDBMS differences. [~lirui] has a PR (https://github.com/apache/hive/pull/5129) 
which still needs some changes and testing, but which would fix the issue.



[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826830#comment-17826830
 ] 

Peter Vary edited comment on HIVE-26882 at 3/13/24 6:52 PM:


[~lirui]: You could try this:
{code}
query.executeUpdate()
{code}

https://github.com/apache/hive/blob/4b01a607091581ac9bdb372f8b47c1efca4d4bb4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdatePart.java#L587






[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826830#comment-17826830
 ] 

Peter Vary commented on HIVE-26882:
---

[~lirui]: You could try this:
{code}
query.executeUpdate()
{code}
https://github.com/apache/hive/blob/4b01a607091581ac9bdb372f8b47c1efca4d4bb4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdatePart.java#L587




[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-12 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825627#comment-17825627
 ] 

Peter Vary commented on HIVE-26882:
---

{quote}
Yes, but that requires the direct SQL and JDO run in the same transaction, 
right? Otherwise the update will not be atomic. I'm not very familiar with JDO. 
Does PersistenceManager::newQuery guarantees the query shares the same 
transaction?{quote}

I think they should use the same connection/transaction. If I remember 
correctly, we liberally mix direct SQL and JDO queries.



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-11 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825536#comment-17825536
 ] 

Peter Vary commented on HIVE-26882:
---

{quote}The issue is each alter table operation updates more than just the 
metadata location. For example, when we change iceberg table schema, JDO will 
update both the iceberg metadata location, and the HMS storage descriptor. If 
we use direct SQL, then either we follow JDO to generate all the SQL 
statements, or we allow storage descriptor to be out of sync with iceberg 
metadata.
{quote}
If the first transaction updates the metadata location, then the second 
transaction will fail to update it and will be rolled back, so I think the state 
stays consistent in this regard.
We might still conflict with other transactions which do not update the metadata 
location, but that could happen anyway.
Am I missing something?

{quote}Not sure I understand the question. You can execute multiple update 
statements in the transaction and check the affected rows for each of them. In 
our PoC, we update current and previous metadata location, and leave all other 
fields out of sync.{quote}

I am suggesting that we use direct SQL to update only the metadata location and 
keep the other parts of the code intact. I think this would be enough to prevent 
concurrent updates of the table.

[~maswin]: Could you please help us try out the proposed solution with Oracle?
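The optimistic pattern discussed above -- update the metadata location only if it still holds the expected value, and treat an affected-row count of zero as a concurrent-commit conflict -- can be sketched as an in-memory analogy. The class, key, and path names below are hypothetical; the real change would issue an {{UPDATE ... WHERE}} through the metastore's direct SQL path, not use a map:

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for TABLE_PARAMS: a compare-and-swap on one parameter,
// mirroring "UPDATE ... SET val = ? WHERE key = ? AND val = ?", where an
// affected-row count of 0 means another committer won the race.
public class MetadataLocationCas {
    private final ConcurrentHashMap<String, String> params = new ConcurrentHashMap<>();

    public MetadataLocationCas(String key, String initialValue) {
        params.put(key, initialValue);
    }

    // Returns true iff the parameter still had the expected value
    // (the "1 row updated" case).
    public boolean swap(String key, String expected, String newValue) {
        return params.replace(key, expected, newValue);
    }

    public String get(String key) {
        return params.get(key);
    }

    public static void main(String[] args) {
        MetadataLocationCas table =
            new MetadataLocationCas("metadata_location", "s3://warehouse/m0.json");
        // First committer wins; the second sees "0 rows affected" and must retry.
        System.out.println(table.swap("metadata_location",
            "s3://warehouse/m0.json", "s3://warehouse/m1.json")); // true
        System.out.println(table.swap("metadata_location",
            "s3://warehouse/m0.json", "s3://warehouse/m2.json")); // false
    }
}
```

Under this scheme a failed swap corresponds to the single alter-table call reporting a conflict, replacing the lock/read/update/unlock sequence.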



[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824979#comment-17824979
 ] 

Peter Vary edited comment on HIVE-26882 at 3/10/24 6:47 AM:


[~lirui]: Thanks for collecting the possibilities.
Also, I would like to highlight [~maswin]'s comment, that this is not working 
for Oracle either. So it was clearly an oversight from my side :(

First, I would like to revisit the direct SQL solution:
 - What do you see as an issue with that?
 - The API only allows a single checked property; would it be enough to check 
that one property for changes?
 - Would the READ COMMITTED isolation level be enough for this solution? We check 
and lock the field anyway, so at first glance it should be.
 - Is this a general solution which would work on all of the supported 
databases?

If the solution above is not working, we might move forward with something like 
this:
{code:java}
DatabaseProduct dbProduct = jdbcResource.getDatabaseProduct();
if (dbProduct.dbType == DbType.MYSQL || dbProduct.dbType == DbType.ORACLE) {
  connection = getConnection(Connection.TRANSACTION_SERIALIZABLE);
} else {
  connection = getConnection(Connection.TRANSACTION_REPEATABLE_READ);
}
{code}
My mistake highlights the need to test this out on all of the supported 
databases.

Thanks for all the work you put into this [~lirui]!


was (Author: pvary):
[~lirui]: Thanks for collecting the possibilities.
I am not comfortable with the 2nd solution. Could we move forward with 
something like this:
{code}
DatabaseProduct dbProduct = jdbcResource.getDatabaseProduct();
if (dbProduct.dbType == DbType.MYSQL) {
  connection = getConnection(Connection.TRANSACTION_SERIALIZABLE);
} else {
  connection = getConnection(Connection.TRANSACTION_REPEATABLE_READ);
}
{code}

Even better, if we could test it out on the different supported databases and 
set the transaction isolation level correctly for all of them (I still vaguely 
remember trying to find an Oracle 11 installation, but...)



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824979#comment-17824979
 ] 

Peter Vary commented on HIVE-26882:
---

[~lirui]: Thanks for collecting the possibilities.
I am not comfortable with the 2nd solution. Could we move forward with 
something like this:
{code}
DatabaseProduct dbProduct = jdbcResource.getDatabaseProduct();
if (dbProduct.dbType == DbType.MYSQL) {
  connection = getConnection(Connection.TRANSACTION_SERIALIZABLE);
} else {
  connection = getConnection(Connection.TRANSACTION_REPEATABLE_READ);
}
{code}

Even better, if we could test it out on the different supported databases and 
set the transaction isolation level correctly for all of them (I still vaguely 
remember trying to find an Oracle 11 installation, but...)



[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824745#comment-17824745
 ] 

Peter Vary edited comment on HIVE-26882 at 3/8/24 11:23 AM:


Thanks for checking. Still not sure how this got through my testing before :(
Do we have a working solution for MariaDB?
What are our options?





[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824745#comment-17824745
 ] 

Peter Vary edited comment on HIVE-26882 at 3/8/24 11:23 AM:


Thanks for checking. Still not sure how this got through my testing before :(
Do we have a working solution for MariaDB?





[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824745#comment-17824745
 ] 

Peter Vary commented on HIVE-26882:
---

Do we have a working solution for MariaDB?



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824630#comment-17824630
 ] 

Peter Vary commented on HIVE-26882:
---

[~lirui]: I am racking my brain to remember how I tested this feature. 
Could you please try the supported MySQL version? Maybe MySQL has different 
semantics here than MariaDB?

[https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27362076#content/view/27362076]
{quote}MySQL 5.6.17 mysql 
Postgres 9.1.13 postgres
Oracle 11g oracle 
MS SQL Server 2008 R2 mssql
{quote}
 

I know it doesn't help you, but I am almost sure that I tried all of the 
supported databases.

 

Also, I would try to avoid using SERIALIZABLE transactions as it could 
seriously restrict the throughput of the HMS.

Maybe we could issue a SELECT ... FOR UPDATE on a well-defined row?
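A {{SELECT ... FOR UPDATE}} on a well-known row serializes committers of the same table without raising the isolation level for the whole connection. As a rough in-memory analogy (the row-key format below is made up; the real lock would be taken by the RDBMS on the relevant metastore row and held until commit or rollback):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Per-row locking analogy for SELECT ... FOR UPDATE: all writers of the same
// "row" queue on one lock, while writers of other rows proceed in parallel.
public class RowLockSketch {
    private static final ConcurrentHashMap<String, ReentrantLock> ROW_LOCKS =
        new ConcurrentHashMap<>();

    // Acquire the lock guarding one row; stands in for the row lock the
    // database would hold for the duration of the transaction.
    public static ReentrantLock lockRow(String rowKey) {
        ReentrantLock lock = ROW_LOCKS.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    public static void main(String[] args) {
        ReentrantLock lock = lockRow("TBLS:db1.tbl1"); // hypothetical row key
        try {
            // read the parameter, verify it, update it -- all while "the row" is held
            System.out.println("holding row lock: " + lock.isHeldByCurrentThread());
        } finally {
            lock.unlock(); // analogous to COMMIT releasing the row lock
        }
    }
}
```

The appeal of this design is that contention stays scoped to a single table's committers rather than the whole HMS workload.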



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824483#comment-17824483
 ] 

Peter Vary commented on HIVE-26882:
---

Can we force MariaDB's {{REPEATABLE_READ}} to prevent the concurrent writes?
For example, by running a manual query like:
{code:java}
update tbl set val = 'v2' where `key` = 'k'; {code}
instead of the
{code:java}
update tbl set val = 'v2' where `key` = 'k' and val = 'v0'; {code}
In this case MariaDB might not consider it to be a phantom read anymore.

Or select the row specifically, like:
{code:java}
select * from tbl where `key` = 'k' and val = 'v0';{code}

My interpretation is that a phantom read should not allow the modification of a 
row once it has been read; it is only that rows might "appear"/"disappear" which 
were not returned previously but were added later.
Do we do an {{update}}, and not a {{delete}}, for the old rows? (Maybe MariaDB 
does this behind the scenes and considers it a phantom read?)

 

Whatever my expectation is (and independently of whether it is correct or not 
:)), we have to think about the users and try to make this work for MariaDB.

 

Anyway, here are my preferences:
 * Force MariaDB to fail (we can even issue "random" direct SQL queries to cause 
failures)
 * Use DB specific code to specify the serialization level used in this case
 * Allow the user to set the required serialization level

What do you think?

 



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-06 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824117#comment-17824117
 ] 

Peter Vary commented on HIVE-26882:
---

[~lirui]: Is the same true if we read the previous value before modifying? Hive 
does this. We read the table params and check the original values before 
modifying them in the same transaction.
In your example, with something like this:
{code:java}
txn1> select * from tbl where key = 'k';
txn1> update tbl set val = 'v1' where key = 'k';

txn2> select * from tbl where key = 'k';
txn2> update tbl set val = 'v2' where key = 'k';
{code}
In this case the transactions are not {{SERIALIZABLE}}, even by the SQL 
definition, and I think the requirements of the {{REPEATABLE READ}} definition 
would also be broken 
(https://jepsen.io/consistency/models/repeatable-read):
{quote}Repeatable read is closely related to serializability, but unlike 
serializable, it allows phantoms: if a transaction T1 reads a predicate, like 
"the set of all people with the name “Dikembe”, then another transaction T2 may 
create or modify a person with the name “Dikembe” before T1 commits. Individual 
objects are stable once read, but the predicate itself may not be.
{quote}
But even the linked article says that the ANSI definition is not clear:
{quote}However, as Berenson, Bernstein, et al observed, the ANSI specification 
allows multiple interpretations
{quote}

I expected the isolation level definitions to be clear and to prevent the 
situation you mentioned above. If not, then I fully agree with your comment that 
we need to fix/document this.



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-01 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822714#comment-17822714
 ] 

Peter Vary commented on HIVE-26882:
---

I see this in the MariaDB doc:
https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users/#isolation-levels-and-locks


{quote}
MariaDB isolation levels differ from SQL Server in the following ways:

REPEATABLE READ does not acquire share locks on all read rows, nor a range lock 
on the missing values that match a WHERE clause.
{quote}

Which is not too encouraging. I would expect it to create a lock to prevent 
modifications.
When I was working on this, I used two clients connected to the same server, 
manually started the transactions, and ran queries and updates to check the 
actual behavior.



[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-01 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822711#comment-17822711
 ] 

Peter Vary commented on HIVE-26882:
---

Thanks for your interest in the change.
I have tested it end-to-end with Postgres, so some MariaDB-related changes 
might still be needed.

That said, could you check the number of commit failures in the HMS log, and 
the number of commit failures in your test harness? I would be interested to 
see whether they correlate. If the issue is on the HMS/RDBMS side, then I 
would expect the same number of failures on the HMS side as in the harness.
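One quick way to compare the two numbers is a grep over the respective logs. A sketch, where the file names and the grepped message are assumptions for illustration, not the actual HMS log format:

```shell
# Hypothetical sample logs standing in for the real HMS log and the
# harness log; the 'commit failed' message is also just an assumption
# about what such a failure line might look like.
printf 'ok\ncommit failed\nok\ncommit failed\n' > hms.log
printf 'commit failed\ncommit failed\n' > harness.log

# Count the failures on each side and compare the two numbers.
hms_failures=$(grep -c 'commit failed' hms.log)
harness_failures=$(grep -c 'commit failed' harness.log)
echo "HMS: ${hms_failures}, harness: ${harness_failures}"
```

If the two counts match, the failures are likely surfacing from the HMS/RDBMS side rather than being introduced by the harness itself.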

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.10, 4.0.0-beta-1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27850) Compaction for Iceberg tables

2023-11-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783580#comment-17783580
 ] 

Peter Vary commented on HIVE-27850:
---

Hi [~difin],

Thanks for reaching out!

Indeed, I am planning to work on Iceberg compaction in Flink jobs, so it is 
somewhat related. Our use-case is different, as we do not have the option to 
rewrite the whole table. Our goal is to do an incremental compaction of the 
freshly arrived new files, and maybe convert the equality delete files to 
positional delete files for easier merge-on-read operations. The full table 
rewrite would come as a side benefit, but the main goal would be to provide a 
less resource-intensive compaction for the new (never-before-compacted) files.

I was thinking that maybe Hive would also benefit from refactoring the 
Spark-related compaction code out to some generic place, where Spark, Flink, 
and Hive could all reuse the compaction features already written by the 
Iceberg-Spark team.

Thanks,

Peter

> Compaction for Iceberg tables
> -
>
> Key: HIVE-27850
> URL: https://issues.apache.org/jira/browse/HIVE-27850
> Project: Hive
>  Issue Type: New Feature
>  Components: Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> Hive currently doesn't have compaction functionality for Iceberg tables. It 
> would be highly beneficial for performance to implement this feature, because 
> it would create larger data files and eliminate positional delete files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-16 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677358#comment-17677358
 ] 

Peter Vary commented on HIVE-26882:
---

Thanks [~ayushtkn] for all the help!

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-15 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676999#comment-17676999
 ] 

Peter Vary edited comment on HIVE-26882 at 1/15/23 9:46 AM:


Thanks for the review [~ayushtkn]!

Merged all of them, as in all cases the code compiled and the new tests 
passed. I had to merge the 4 commits (backported to have the testing infra at 
hand), as the GitHub settings did not allow the rebase merge strategy.

Is there a planned timeline for a release from any of these branches?

Thanks,

Peter


was (Author: pvary):
Merged all of them.
Thanks for the review [~ayushtkn]!

Is there a planned timeline for a release from any of these branches?

Thanks,

Petr

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-15 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676999#comment-17676999
 ] 

Peter Vary commented on HIVE-26882:
---

Merged all of them.
Thanks for the review [~ayushtkn]!

Is there a planned timeline for a release from any of these branches?

Thanks,

Peter

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-14 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676946#comment-17676946
 ] 

Peter Vary commented on HIVE-26882:
---

[~ayushtkn]: Could you please approve the PR then?

Did the backports for the other branches as well. If the Hive community is 
ready to accept them, please approve them too.

Thanks,

Peter

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676720#comment-17676720
 ] 

Peter Vary commented on HIVE-26882:
---

[~ayushtkn]: Thanks for the help!

I have created the PR and run the tests.

There are 30 failures, but they are not related.

The tests I have created have been successful:
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyConcurrent[Embedded]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyConcurrent_Embedded_]|
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyConcurrent[Remote]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyConcurrent_Remote_]|
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyDifferent[Embedded]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyDifferent_Embedded_]|
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyDifferent[Remote]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyDifferent_Remote_]|
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyMatch[Embedded]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyMatch_Embedded_]|
|[Testing / split-16 / Archive / 
testAlterTableExpectedPropertyMatch[Remote]|http://ci.hive.apache.org/job/hive-precommit/job/PR-3943/1/testReport/org.apache.hadoop.hive.metastore.client/TestTablesCreateDropAlterTruncate/Testing___split_16___Archive___testAlterTableExpectedPropertyMatch_Remote_]|

[~ayushtkn]: What should be our next steps?

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-12 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676455#comment-17676455
 ] 

Peter Vary commented on HIVE-26882:
---

Hi [~ayushtkn]! Long time, no see. I hope things are going well for you!

Tried to build branch-3 on a new laptop and got this:
{code:java}
$ mvn clean install -DskipTests
[INFO] Scanning for projects...
[..]
[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve 
dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.2.0-SNAPSHOT: 
Failed to collect dependencies at org.apache.hive:hive-exec:jar:2.3.3 -> 
org.apache.calcite:calcite-core:jar:1.10.0 -> 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read 
artifact descriptor for 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not transfer 
artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to 
maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: 
[datanucleus (http://www.datanucleus.org/downloads/maven2, default, releases), 
glassfish-repository (http://maven.glassfish.org/content/groups/glassfish, 
default, disabled), glassfish-repo-archive 
(http://maven.glassfish.org/content/groups/glassfish, default, disabled), 
apache.snapshots (http://repository.apache.org/snapshots, default, snapshots), 
central (http://repo.maven.apache.org/maven2, default, releases), conjars 
(http://conjars.org/repo, default, releases+snapshots)] -> [Help 1]
{code}
Seems like an issue with the removal of the pentaho jar from the repos; 
something similar to [https://github.com/apache/hudi/issues/160].

To try to move forward, I removed the {{upgrade-acid}} module from the pom to 
check if there are any other errors. I used this change:
{code:java}
$ g d
diff --git a/pom.xml b/pom.xml
index e1bbb8193e..2cdc848d8a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -62,7 +62,6 @@
     <module>testutils</module>
     <module>packaging</module>
     <module>standalone-metastore</module>
-    <module>upgrade-acid</module>
   </modules>
 
   <properties>
@@ -996,11 +995,6 @@
       <artifactId>slf4j-api</artifactId>
       <version>${slf4j.version}</version>
     </dependency>
-    <dependency>
-      <groupId>org.apache.hive</groupId>
-      <artifactId>hive-upgrade-acid</artifactId>
-      <version>${project.version}</version>
-    </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty</artifactId>
{code}
And got this error:
{code:java}
$ mvn clean install -DskipTests
[INFO] Scanning for projects...
[..]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project hive-service-rpc: Compilation failure: Compilation failure: 
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetTableTypesReq.java:[32,24]
 package javax.annotation does not exist
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetTableTypesReq.java:[37,2]
 cannot find symbol
[ERROR]   symbol: class Generated
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TSessionHandle.java:[32,24]
 package javax.annotation does not exist
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TSessionHandle.java:[37,2]
 cannot find symbol
[ERROR]   symbol: class Generated
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/THandleIdentifier.java:[32,24]
 package javax.annotation does not exist
[ERROR] 
/Users/petervary/dev/hive/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/THandleIdentifier.java:[37,2]
 cannot find symbol
[ERROR]   symbol: class Generated
[..]
{code}
Could you please help me compile branch-3?

Thanks,
Peter
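(For context: the javax.annotation package was removed from the JDK starting 
with Java 11, so these compilation errors typically appear when building on a 
newer JDK. A commonly used workaround, offered here only as an untested sketch 
and not verified against branch-3, is to add the API jar to the affected 
module's dependencies:)
{code:xml}
<dependency>
  <groupId>javax.annotation</groupId>
  <artifactId>javax.annotation-api</artifactId>
  <version>1.3.2</version>
</dependency>
{code}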

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-08 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26882.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2023-01-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26882:
-

Assignee: Peter Vary

> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter 
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26134) Remove Hive on Spark from the main branch

2022-12-11 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645938#comment-17645938
 ] 

Peter Vary commented on HIVE-26134:
---

That would be a breaking change, and we do not allow breaking changes in minor 
releases.

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-25998) Build iceberg modules without a flag

2022-11-28 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25998:
-

Assignee: (was: Peter Vary)

> Build iceberg modules without a flag
> 
>
> Key: HIVE-25998
> URL: https://issues.apache.org/jira/browse/HIVE-25998
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We originally introduced a -Piceberg flag for building the Iceberg modules.
> Since then the Iceberg modules have stabilised, and as we would like to have 
> a release, we should remove the flag now.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?

2022-10-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614360#comment-17614360
 ] 

Peter Vary commented on HIVE-26611:
---

[~yigress]: In my experience the HS2 WebUI is rarely used in production 
environments; 3rd-party tools are used for audit logging and query history.

> add HiveServer2 History Server?
> ---
>
> Key: HIVE-26611
> URL: https://issues.apache.org/jira/browse/HIVE-26611
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Yi Zhang
>Priority: Major
>
> HiveServer2 Web UI provides query profiles and optional operation logs; 
> however, these are gone when the HS2 server exits. 
> Was there a discussion about adding an HS2 history server before?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26355) Column compare should be case insensitive for name

2022-06-30 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26355.
---
Resolution: Fixed

Pushed to master. 
Thanks for the fix [~wechar]!

> Column compare should be case insensitive for name
> --
>
> Key: HIVE-26355
> URL: https://issues.apache.org/jira/browse/HIVE-26355
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hive stores all name-related values as lower case, such as db_name, 
> tbl_name, col_name, etc. But the comparison of {{FieldSchema}} does not 
> ignore the case of the name, which may cause incorrect comparison results.
> *Bug Description:*
> Some computing engines are case-sensitive for column names. For example, 
> Spark will add a table property to save the column fields when creating a 
> table, and will replace the column fields with this property when fetching 
> table fields.
> When calling {{*ALTER TABLE ... ADD COLUMNS*}}, the comparison of fields 
> between the old and new table will not behave as expected, and the ADD 
> COLUMNS operation will be cascaded to the PARTITIONS, which is unnecessary 
> and time-consuming if the table has many partitions.
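The fix described above boils down to comparing column names case-insensitively. A minimal sketch of such a comparison (Python, purely illustrative; not the actual FieldSchema code):

```python
def same_columns(old, new):
    """Compare (name, type) column lists, ignoring the case of the names."""
    normalize = lambda cols: [(name.lower(), ctype) for name, ctype in cols]
    return normalize(old) == normalize(new)

# An engine may report "ID"/"Name" while the metastore stores "id"/"name";
# the case-insensitive compare treats these schemas as identical, so no
# unnecessary cascade to the partitions is triggered.
print(same_columns([("ID", "int"), ("Name", "string")],
                   [("id", "int"), ("name", "string")]))  # prints: True
```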



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26358) Querying metadata tables does not work for Iceberg tables using HADOOP_TABLE

2022-06-29 Thread Peter Vary (Jira)
 [ 
https://issues.apache.org/jira/browse/HIVE-26358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26358.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks for the review László Pintér!



--
This message was sent by Atlassian Jira
(v8.20.10#820010-sha1:ace47f9)



[jira] [Resolved] (HIVE-26354) Support expiring snapshots on iceberg table

2022-06-28 Thread Peter Vary (Jira)
 [ 
https://issues.apache.org/jira/browse/HIVE-26354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26354.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks for the review László Pintér!



--
This message was sent by Atlassian Jira
(v8.20.10#820010-sha1:ace47f9)



[jira] [Resolved] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-06-28 Thread Peter Vary (Jira)
 [ 
https://issues.apache.org/jira/browse/HIVE-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26265.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks for the PR francis pang!



--
This message was sent by Atlassian Jira
(v8.20.10#820010-sha1:ace47f9)



[jira] [Updated] (HIVE-26358) Querying metadata tables does not work for Iceberg tables using HADOOP_TABLE

2022-06-27 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26358:
--
Description: 
The query returns data for the table instead of the metadata table

{code}
SELECT * FROM default.source.history;
{code}
Returns:
{code}
5 Alice Green_2
3 Alice Green_0
4 Alice Green_1
0 Alice Brown
1 Bob Green
2 Trudy Pink
6 Alice Green_3
{code}

  was:The query returns data for the table instead of the metadata table


> Querying metadata tables does not work for Iceberg tables using HADOOP_TABLE
> 
>
> Key: HIVE-26358
> URL: https://issues.apache.org/jira/browse/HIVE-26358
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The query returns data for the table instead of the metadata table
> {code}
> SELECT * FROM default.source.history;
> {code}
> Returns:
> {code}
> 5 Alice Green_2
> 3 Alice Green_0
> 4 Alice Green_1
> 0 Alice Brown
> 1 Bob Green
> 2 Trudy Pink
> 6 Alice Green_3
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-06-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25980.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the patch @crivani, and @shameersss1 for the review!

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one of the cases where we found slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>    java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient

[jira] [Assigned] (HIVE-26334) Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg tables

2022-06-23 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26334:
-

Assignee: Peter Vary

> Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg 
> tables
> --
>
> Key: HIVE-26334
> URL: https://issues.apache.org/jira/browse/HIVE-26334
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The DESCRIBE FORMATTED output shows this even for bucketed Iceberg tables:
> {code}
> Num Buckets:  0   NULL
> Bucket Columns:   []  NULL
> {code}
> We should remove them, and the user should rely on the information in the {{# 
> Partition Transform Information}} block instead



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26354) Support expiring snapshots on iceberg table

2022-06-23 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26354:
-

Assignee: Peter Vary

> Support expiring snapshots on iceberg table
> ---
>
> Key: HIVE-26354
> URL: https://issues.apache.org/jira/browse/HIVE-26354
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> It would be good to support expiring snapshots for Iceberg tables.
> The syntax could be something like below:
> {code}
> ALTER TABLE test_table EXECUTE expire_snapshots('timestamp');
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26334) Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg tables

2022-06-22 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26334.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~lpinter]!

> Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg 
> tables
> --
>
> Key: HIVE-26334
> URL: https://issues.apache.org/jira/browse/HIVE-26334
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The DESCRIBE FORMATTED output shows this even for bucketed Iceberg tables:
> {code}
> Num Buckets:  0   NULL
> Bucket Columns:   []  NULL
> {code}
> We should remove them, and the user should rely on the information in the {{# 
> Partition Transform Information}} block instead



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26316) Handle dangling open txns on both src & tgt in unplanned failover.

2022-06-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26316.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the PR [~haymant]!

> Handle dangling open txns on both src & tgt in unplanned failover.
> --
>
> Key: HIVE-26316
> URL: https://issues.apache.org/jira/browse/HIVE-26316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-25733) Add check-spelling CI action

2022-06-15 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25733.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Closing this jira, since the PR is merged.

Also I needed to add an addendum since this PR caused continuous failures.

https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45

> Add check-spelling CI action
> 
>
> Key: HIVE-25733
> URL: https://issues.apache.org/jira/browse/HIVE-25733
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Josh Soref
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for 
> information.
> Initially this will only check the {{serde}} directory, but the intention is 
> to expand its coverage as spelling errors in other directories are fixed.
> Note that for this to work the action should be made a required check, 
> otherwise when a typo is added forks from that commit will get complaints.
> If a typo is intentional, the action will provide information about how to 
> add it to {{expect.txt}} such that it will be accepted as an expected item 
> (i.e. not a typo).
> To skip a file/directory entirely, add a matching entry to 
> {{{}excludes.txt{}}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] (HIVE-26311) Incorrect content of array when IN operator is in the filter

2022-06-14 Thread Peter Vary (Jira)


[ https://issues.apache.org/jira/browse/HIVE-26311 ]


Peter Vary deleted comment on HIVE-26311:
---

was (Author: pvary):
I would guess that we have the same issue here - we are comparing arrays in 
Hive, and it is failing. We might have to do a similar exception here for the 
comparison.

> Incorrect content of array when IN operator is in the filter
> 
>
> Key: HIVE-26311
> URL: https://issues.apache.org/jira/browse/HIVE-26311
> Project: Hive
>  Issue Type: Bug
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: correctness
> Attachments: arrays.parq
>
>
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 and id = 5
> {code:java}
> +-+---+---+
> | id  |     arr1      |                 arr2                  |
> +-+---+---+
> | 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+---+---+{code}
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
> {code:java}
> +-+-+---+
> | id  |    arr1     |                 arr2                  |
> +-+-+---+
> | 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+-+---+ {code}
> Note, the first (and correct) example returns 10, null and 12 as the items of 
> an array while the second query for some reason shows 10 instead of the null 
> value. The only difference between the 2 examples is that in the second I 
> added an extra filter (that in fact doesn't filter out anything as 
> functional_parquet.alltypestiny's ID contains numbers from zero to ten)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26311) Incorrect content of array when IN operator is in the filter

2022-06-14 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554016#comment-17554016
 ] 

Peter Vary commented on HIVE-26311:
---

I would guess that we have the same issue here - we are comparing arrays in 
Hive, and it is failing. We might have to do a similar exception here for the 
comparison.

> Incorrect content of array when IN operator is in the filter
> 
>
> Key: HIVE-26311
> URL: https://issues.apache.org/jira/browse/HIVE-26311
> Project: Hive
>  Issue Type: Bug
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: correctness
> Attachments: arrays.parq
>
>
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 and id = 5
> {code:java}
> +-+---+---+
> | id  |     arr1      |                 arr2                  |
> +-+---+---+
> | 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+---+---+{code}
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
> {code:java}
> +-+-+---+
> | id  |    arr1     |                 arr2                  |
> +-+-+---+
> | 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+-+---+ {code}
> Note, the first (and correct) example returns 10, null and 12 as the items of 
> an array while the second query for some reason shows 10 instead of the null 
> value. The only difference between the 2 examples is that in the second I 
> added an extra filter (that in fact doesn't filter out anything as 
> functional_parquet.alltypestiny's ID contains numbers from zero to ten)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-13 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26307.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita]!

> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3
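
The intent can be sketched roughly as follows (hypothetical class and method names for illustration only, not the actual Iceberg/Hive patch): keep only the location string and defer the expensive FileSystem-style initialization until bytes are actually needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a location-only input file: the costly
// path.getFileSystem(conf)-style initialization is deferred until open().
public class LazyLocationInputFile {
    // Stands in for the expensive FileSystem/S3 client initialization.
    static final AtomicInteger FS_INITS = new AtomicInteger();

    private final String location;

    public LazyLocationInputFile(String location) {
        this.location = location;
    }

    // Cheap: planning code that only needs the path never touches the FS.
    public String location() {
        return location;
    }

    // Expensive path, taken only when the content is actually read.
    public String open() {
        FS_INITS.incrementAndGet(); // simulated FileSystem init
        return "stream for " + location;
    }

    public static void main(String[] args) {
        LazyLocationInputFile file =
                new LazyLocationInputFile("s3a://bucket/warehouse/t/data/000.parquet");
        System.out.println(file.location());           // no FS initialized yet
        System.out.println("inits=" + FS_INITS.get()); // prints inits=0
        file.open();
        System.out.println("inits=" + FS_INITS.get()); // prints inits=1
    }
}
```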



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26311) Incorrect content of array when IN operator is in the filter

2022-06-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553572#comment-17553572
 ] 

Peter Vary commented on HIVE-26311:
---

[~gaborkaszab]: Would HIVE-26250 help?

> Incorrect content of array when IN operator is in the filter
> 
>
> Key: HIVE-26311
> URL: https://issues.apache.org/jira/browse/HIVE-26311
> Project: Hive
>  Issue Type: Bug
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: correctness
> Attachments: arrays.parq
>
>
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 and id = 5
> {code:java}
> +-+---+---+
> | id  |     arr1      |                 arr2                  |
> +-+---+---+
> | 5   | [10,null,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+---+---+{code}
> select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 
> 2 = 1 *and id in (select id from functional_parquet.alltypestiny)* and id = 5;
> {code:java}
> +-+-+---+
> | id  |    arr1     |                 arr2                  |
> +-+-+---+
> | 5   | [10,10,12]  | ["ten","eleven","twelve","thirteen"]  |
> +-+-+---+ {code}
> Note, the first (and correct) example returns 10, null and 12 as the items of 
> an array while the second query for some reason shows 10 instead of the null 
> value. The only difference between the 2 examples is that in the second I 
> added an extra filter (that in fact doesn't filter out anything as 
> functional_parquet.alltypestiny's ID contains numbers from zero to ten)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26301) Fix ACID tables bootstrap during reverse replication in unplanned failover.

2022-06-10 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26301.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the PR [~haymant]!

> Fix ACID tables bootstrap during reverse replication in unplanned failover.
> ---
>
> Key: HIVE-26301
> URL: https://issues.apache.org/jira/browse/HIVE-26301
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26307) Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads

2022-06-09 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26307:
-


> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects 
> just to store the location of the files. If we can avoid this, then we can 
> improve the performance, since the {{path.getFileSystem(conf)}} calls can 
> become costly, especially for S3



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26285) Overwrite database metadata on original source in optimised failover.

2022-06-09 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26285.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the PR [~haymant] and [~dkuzmenko] for the review!

> Overwrite database metadata on original source in optimised failover.
> -
>
> Key: HIVE-26285
> URL: https://issues.apache.org/jira/browse/HIVE-26285
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-25790) Make managed table copies handle updates (FileUtils)

2022-06-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551963#comment-17551963
 ] 

Peter Vary commented on HIVE-25790:
---

[~haymant]: Could you please link the PR?

Thanks,
Peter 

> Make managed table copies handle updates (FileUtils)
> 
>
> Key: HIVE-25790
> URL: https://issues.apache.org/jira/browse/HIVE-25790
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-06-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25907.
---
Resolution: Fixed

Pushed to master.
Thanks for the fix [~srahman]!

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After execution of the above query, it is expected that the destination 
> directory contains data from the table iowd, but due to HIVE-21386 it is not 
> happening anymore.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-22670) ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file

2022-05-31 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-22670:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the fix [~ganeshas] and [~achennagiri]!

> ArrayIndexOutOfBoundsException when vectorized reader is used for reading a 
> parquet file
> 
>
> Key: HIVE-22670
> URL: https://issues.apache.org/jira/browse/HIVE-22670
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, Vectorization
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22670.1.patch, HIVE-22670.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ArrayIndexOutOfBoundsException is getting thrown while decoding dictionaryIds 
> of a row group in parquet file with vectorization enabled. 
> *Exception stack trace:*
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>  ... 24 more{code}
>  
> This issue seems to be caused by re-using the same dictionary column vector 
> while reading consecutive row groups. This looks like a corner-case bug which 
> occurs for a certain distribution of dictionary/plain encoded data 
> while we read/populate the underlying bit packed dictionary data into a 
> column-vector based data structure. 
> A similar issue was reported in Spark (Ref: 
> https://issues.apache.org/jira/browse/SPARK-16334)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26233.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks [~zabetak] for the review!

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26261) Fix some issues with Spark engine removal

2022-05-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26261.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~kgyrtkirk]!

> Fix some issues with Spark engine removal
> -
>
> Key: HIVE-26261
> URL: https://issues.apache.org/jira/browse/HIVE-26261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I made some mistakes when removing the Spark code:
>  * CommandAuthorizerV2.java - should check the properties. At that stage the 
> authorizer was referring to tables created by Spark as a HMS client, and not 
> as an engine
>  * There is one unused method left in MapJoinTableContainerSerDe.java



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26235) OR Condition on binary column is returning empty result

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26235.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~abstractdog]

> OR Condition on binary column is returning empty result
> ---
>
> Key: HIVE-26235
> URL: https://issues.apache.org/jira/browse/HIVE-26235
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro steps
> {code:java}
> create table test_binary(data_col timestamp, binary_col binary) partitioned 
> by (ts string);
> insert into test_binary partition(ts='20220420') values ('2022-04-20 
> 00:00:00.0', 'a'),('2022-04-20 00:00:00.0', 'b'), ('2022-04-20 00:00:00.0', 
> 'c');
> // Works
> select * from test_binary where ts='20220420' and binary_col = 
> unhex('61');
> select * from test_binary where ts='20220420' and binary_col between 
> unhex('61') and unhex('62');
> //Returns empty result
> select * from test_binary where binary_col = unhex('61') or binary_col = 
> unhex('62');
> select * from test_binary where ts='20220420' and (binary_col = 
> unhex('61') or binary_col = unhex('62'));
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26136.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, but forgot to close the jira

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26092.
---
Resolution: Fixed

Thanks [~slachiewicz]!

Forgot to close the jira.

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26258) Provide an option for enable locking of external tables

2022-05-24 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26258.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the review [~dkuzmenko]!

> Provide an option for enable locking of external tables
> ---
>
> Key: HIVE-26258
> URL: https://issues.apache.org/jira/browse/HIVE-26258
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During migration to Hive 3 the MANAGED tables are migrated to EXTERNAL tables.
> To provide backward compatibility we need an option to keep the original 
> strict locking for these external tables.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26261) Fix some issues with Spark engine removal

2022-05-24 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26261:
-


> Fix some issues with Spark engine removal
> -
>
> Key: HIVE-26261
> URL: https://issues.apache.org/jira/browse/HIVE-26261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> I made some mistakes when removing the Spark code:
>  * CommandAuthorizerV2.java - should check the properties. At that stage the 
> authorizer was referring to tables created by Spark as a HMS client, and not 
> as an engine
>  * There is one unused method left in MapJoinTableContainerSerDe.java



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26258) Provide an option for enable locking of external tables

2022-05-24 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26258:
--
Summary: Provide an option for enable locking of external tables  (was: 
Provide an option for strict locking of external tables)

> Provide an option for enable locking of external tables
> ---
>
> Key: HIVE-26258
> URL: https://issues.apache.org/jira/browse/HIVE-26258
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During migration to Hive 3 the MANAGED tables are migrated to EXTERNAL tables.
> To provide backward compatibility we need an option to keep the original 
> strict locking for these external tables.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-24 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541313#comment-17541313
 ] 

Peter Vary commented on HIVE-26233:
---

I checked the specific TS (9999-12-31 23:59:59.999) generated with Hive2.
The TS is converted to UTC and the resulting NanoTime has been written to 
Parquet. Since the local TZ was EST, the actual date written to Parquet is 
10000-01-01 04:59:59.999.

Since the TS was written with old Parquet/Hive, we correctly want to use the 
legacy timestamp conversion:
{code}
if (legacyConversion) {
  try {
DateFormat formatter = getLegacyDateFormatter();
formatter.setTimeZone(TimeZone.getTimeZone(fromZone));
java.util.Date date = formatter.parse(ts.toString());
// Set the formatter to use a different timezone
formatter.setTimeZone(TimeZone.getTimeZone(toZone));
Timestamp result = Timestamp.valueOf(formatter.format(date));
result.setNanos(ts.getNanos());
return result;
  } catch (ParseException e) {
throw new RuntimeException(e);
  }
}
{code}

This fails because the {{toString()}} prints {{\+10000-01-01 04:59:59.999}} 
- notice the {{\+}} sign at the beginning of the String. Then we try to parse 
it and we fail. So we have to either print it without the {{\+}} or parse it 
correctly. After my proposed changes we write it out without a {{\+}} sign, and 
we can read it back correctly. This way - after applying the conversion - we 
are able to get back the original (9999-12-31 23:59:59.999) TS.
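
The failure mode can be reproduced with the JDK's legacy formatter alone (a minimal sketch; the {{America/New_York}} zone and the exact timestamp values are assumptions for illustration, not code from the Hive patch):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class LegacyTimestampParseDemo {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        // Without a '+' prefix, a 5-digit year parses fine with the legacy formatter.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date written = fmt.parse("10000-01-01 04:59:59");

        // Formatting the same instant in the original (EST) zone recovers the
        // pre-migration wall-clock value.
        fmt.setTimeZone(TimeZone.getTimeZone("America/New_York"));
        System.out.println(fmt.format(written)); // 9999-12-31 23:59:59

        // A '+' sign in front of the year breaks the very same parse.
        try {
            fmt.parse("+10000-01-01 04:59:59");
            System.out.println("unexpectedly parsed");
        } catch (ParseException e) {
            System.out.println("parse failed at offset " + e.getErrorOffset());
        }
    }
}
```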

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26258) Provide an option for strict locking of external tables

2022-05-23 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26258:
-


> Provide an option for strict locking of external tables
> ---
>
> Key: HIVE-26258
> URL: https://issues.apache.org/jira/browse/HIVE-26258
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> During migration to Hive 3 the MANAGED tables are migrated to EXTERNAL tables.
> To provide backward compatibility we need an option to keep the original 
> strict locking for these external tables.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26255) Drop custom code for String deduplication inside URI class

2022-05-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540827#comment-17540827
 ] 

Peter Vary commented on HIVE-26255:
---

[~slachiewicz]: Which version of Java have you tried? We have a Jira for 
upgrading to Java 11: HIVE-22415

> Drop custom code for String deduplication inside URI class
> --
>
> Key: HIVE-26255
> URL: https://issues.apache.org/jira/browse/HIVE-26255
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sylwester Lachiewicz
>Priority: Minor
>
> We have class 
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java]
>  to do String.intern() inside the URI class via reflection.
> Instead, we can recommend using newer java and enabling String deduplication 
> https://localcoder.org/is-string-deduplication-feature-of-the-g1-garbage-collector-enabled-by-default
> -XX:+UseG1GC -XX:+UseStringDeduplication
>  
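As a side note, manual interning and GC-level deduplication are not quite equivalent: {{String.intern()}} canonicalizes references (one shared String object), while {{-XX:+UseStringDeduplication}} only shares the backing character arrays lazily at GC time. A small standalone sketch of the difference (the URI string is just an example value):

```java
public class InternDemo {
  public static void main(String[] args) {
    // Two distinct String objects with equal contents, as URI fields typically are
    String a = new String("hdfs://nn:8020/warehouse/db/tbl");
    String b = new String("hdfs://nn:8020/warehouse/db/tbl");
    System.out.println(a == b);                   // false: separate objects
    System.out.println(a.intern() == b.intern()); // true: intern() yields one canonical instance
    // G1 string deduplication would instead share only the underlying
    // char[]/byte[] at GC time, keeping both String objects alive.
  }
}
```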



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26255) Drop custom code for String deduplication inside URI class

2022-05-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540829#comment-17540829
 ] 

Peter Vary commented on HIVE-26255:
---

Maybe related to the Java upgrade

> Drop custom code for String deduplication inside URI class
> --
>
> Key: HIVE-26255
> URL: https://issues.apache.org/jira/browse/HIVE-26255
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sylwester Lachiewicz
>Priority: Minor
>
> We have class 
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java]
>  to do String.intern() inside the URI class via reflection.
> Instead, we can recommend using newer java and enabling String deduplication 
> https://localcoder.org/is-string-deduplication-feature-of-the-g1-garbage-collector-enabled-by-default
> -XX:+UseG1GC -XX:+UseStringDeduplication
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26112) Missing scripts for metastore

2022-05-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540812#comment-17540812
 ] 

Peter Vary commented on HIVE-26112:
---

[~achennagiri]: In theory it could, but the picture is much cleaner/easier 
to follow if we stick to the same versions and use empty upgrade files.

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone_metastore counterpart(s):
> * hive-schema-3.2.0.*.sql
> * upgrade-3.1.0-to-3.2.0.*.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.*.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.*.sql



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-22670) ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file

2022-05-20 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540279#comment-17540279
 ] 

Peter Vary commented on HIVE-22670:
---

[~achennagiri]: Sounds like a good plan!

> ArrayIndexOutOfBoundsException when vectorized reader is used for reading a 
> parquet file
> 
>
> Key: HIVE-22670
> URL: https://issues.apache.org/jira/browse/HIVE-22670
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, Vectorization
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-22670.1.patch, HIVE-22670.2.patch
>
>
> ArrayIndexOutOfBoundsException is getting thrown while decoding dictionaryIds 
> of a row group in parquet file with vectorization enabled. 
> *Exception stack trace:*
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>  ... 24 more{code}
>  
> This issue seems to be caused by re-using the same dictionary column vector 
> while reading consecutive row groups. This looks like a corner-case bug which 
> occurs for a certain distribution of dictionary/plain encoded data while we 
> read/populate the underlying bit-packed dictionary data into a 
> column-vector based data structure. 
> A similar issue was reported in Spark (Ref: 
> https://issues.apache.org/jira/browse/SPARK-16334)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26250) Fix GenericUDFIn for Binary type

2022-05-20 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26250.
---
Resolution: Duplicate

Found another jira

> Fix GenericUDFIn for Binary type
> 
>
> Key: HIVE-26250
> URL: https://issues.apache.org/jira/browse/HIVE-26250
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>
> When we use IN, or the query optimizer converts our OR statements to IN, 
> we get empty results.
> One example is:
> {code}
> create table test_binary(datet timestamp, dip binary);
> insert into test_binary values ('2022-04-20 00:00:00.0', 'a'),('2022-04-20 
> 00:00:00.0', 'b'), ('2022-04-20 00:00:00.0', 'c') ;
> select * from test_binary where dip = unhex('61') or dip = unhex('62') ; 
> --empty result
> select * from test_binary where dip = unhex('61'); -- correct result
> {code} 
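Not the actual GenericUDFIn internals, but the classic Java pitfall that produces exactly this symptom: {{byte[]}} inherits identity-based {{equals()}}/{{hashCode()}}, so any set-based IN lookup on raw byte arrays never matches a freshly decoded value. A hedged sketch:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BinaryInDemo {
  public static void main(String[] args) {
    // A set-based IN lookup on raw byte[] misses, because byte[] compares by identity
    Set<byte[]> inList = new HashSet<>();
    inList.add(new byte[] {0x61});                            // like unhex('61')
    System.out.println(inList.contains(new byte[] {0x61}));   // false

    // Comparing contents (Arrays.equals) or wrapping the bytes in a value
    // type with proper equals/hashCode fixes the comparison
    System.out.println(Arrays.equals(new byte[] {0x61}, new byte[] {0x61})); // true
  }
}
```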



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26235) OR Condition on binary column is returning empty result

2022-05-20 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26235:
-

Assignee: Peter Vary

> OR Condition on binary column is returning empty result
> ---
>
> Key: HIVE-26235
> URL: https://issues.apache.org/jira/browse/HIVE-26235
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Peter Vary
>Priority: Major
>
> Repro steps
> {code:java}
> create table test_binary(data_col timestamp, binary_col binary) partitioned 
> by (ts string);
> insert into test_binary partition(ts='20220420') values ('2022-04-20 
> 00:00:00.0', 'a'),('2022-04-20 00:00:00.0', 'b'), ('2022-04-20 00:00:00.0', 
> 'c');
> // Works
> select * from test_binary where ts='20220420' and binary_col = 
> unhex('61');
> select * from test_binary where ts='20220420' and binary_col between 
> unhex('61') and unhex('62');
> //Returns empty result
> select * from test_binary where binary_col = unhex('61') or binary_col = 
> unhex('62');
> select * from test_binary where ts='20220420' and (binary_col = 
> unhex('61') or binary_col = unhex('62'));
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-18 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538744#comment-17538744
 ] 

Peter Vary commented on HIVE-26233:
---

 They are getting an exception:
{code}
Caused by: java.text.ParseException: Unparseable date: "+10001-01-01 
04:59:59.999"
at java.text.DateFormat.parse(DateFormat.java:366) ~[?:1.8.0_232]
at 
org.apache.hadoop.hive.common.type.TimestampTZUtil.convertTimestampToZone(TimestampTZUtil.java:180)
 
at 
org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTimeUtils.getTimestamp(NanoTimeUtils.java:122)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$9$2.convert(ETypeConverter.java:710)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$9$2.convert(ETypeConverter.java:692)
 
at 
org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.setDictionary(ETypeConverter.java:933)
at 
org.apache.parquet.column.impl.ColumnReaderBase.<init>(ColumnReaderBase.java:385)
at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:46)
at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147) 
at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109) 
at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 
at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109) 
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
 
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
 
at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
 
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:810)
 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:365)
 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576) 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) 
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) 
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:912) 
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) 
at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:476)
... 13 more
{code}

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-18 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538685#comment-17538685
 ] 

Peter Vary commented on HIVE-26233:
---

[~zabetak]: The bug did not cause problems with Hive 2.x, and the data is 
written out for several tables there. After migration to 3.x the user is not 
able to read the old data anymore. With this fix, they can at least read the 
data and rewrite the tables. Without this the table is not readable.

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-17 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26233:
-


> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26202) Refactor Iceberg Writers

2022-05-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26202.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~lpinter]!

> Refactor Iceberg Writers
> 
>
> Key: HIVE-26202
> URL: https://issues.apache.org/jira/browse/HIVE-26202
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26202) Refactor Iceberg Writers

2022-05-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26202:
--
Summary: Refactor Iceberg Writers  (was: Refactor Iceberg 
HiveFileWriterFactory)

> Refactor Iceberg Writers
> 
>
> Key: HIVE-26202
> URL: https://issues.apache.org/jira/browse/HIVE-26202
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26200) Add tests for Iceberg DELETE statements for every supported type

2022-05-05 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26200.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~lpinter]!

> Add tests for Iceberg DELETE statements for every supported type
> 
>
> Key: HIVE-26200
> URL: https://issues.apache.org/jira/browse/HIVE-26200
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be good to check if we are able to delete every supported column 
> type.
> I have found issues with updates, and I think it would be good to have 
> additional tests with delete as well (even though they are working correctly 
> ATM)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26183) Create delete writer for the UPDATE statements

2022-05-04 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26183.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita] and [~Marton Bod]!

> Create delete writer for the UPDATE statements
> -
>
> Key: HIVE-26183
> URL: https://issues.apache.org/jira/browse/HIVE-26183
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> During the investigation of updates on partitioned tables we ran into the 
> following issue:
> - Iceberg inserts need to be sorted by the new partition keys
> - Iceberg deletes need to be sorted by the old partition keys and 
> filenames
> These requirements can contradict each other. OTOH Hive updates create a 
> single query and write out the insert/delete record for every row. This 
> would mean plenty of open writers.
> We might want to create something like 
> https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java,
>  but we do not want to keep the whole rows in memory.
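The buffering idea can be sketched independently of the Iceberg API: keep only lightweight (file, position) pairs per delete instead of whole rows, and sort once at flush time. The class and field names below are illustrative, not Iceberg's:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PosDeleteSketch {
  // Illustrative only: a positional delete is just a file path plus a row offset
  static final class PosDelete {
    final String file;
    final long pos;
    PosDelete(String file, long pos) { this.file = file; this.pos = pos; }
  }

  public static void main(String[] args) {
    List<PosDelete> buffer = new ArrayList<>();
    buffer.add(new PosDelete("part-00001.parquet", 7));
    buffer.add(new PosDelete("part-00000.parquet", 3));
    // Sort by (file, position) once at flush time, rather than keeping
    // whole rows ordered in memory the way an insert writer would
    buffer.sort(Comparator.comparing((PosDelete d) -> d.file)
                          .thenComparingLong(d -> d.pos));
    System.out.println(buffer.get(0).file); // part-00000.parquet
  }
}
```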



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26193) Fix Iceberg partitioned tables null bucket handling

2022-05-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26193.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita] and [~Marton Bod]!

> Fix Iceberg partitioned tables null bucket handling
> ---
>
> Key: HIVE-26193
> URL: https://issues.apache.org/jira/browse/HIVE-26193
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Inserting null values into a partition column should write the rows into null 
> partitions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26114) Fix jdbc connection hiveserver2 using dfs command with prefix space will cause exception

2022-04-26 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528582#comment-17528582
 ] 

Peter Vary commented on HIVE-26114:
---

Pushed to master.
Thanks for the patch [~Zing]!

The fix will be in the next 4.0.0-x releases. If you need this in any of the 
other releases we need to backport this fix to the relevant branches, like 
branch-3, branch-3.1 etc.

> Fix jdbc connection hiveserver2 using dfs command with prefix space will 
> cause exception
> 
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute sql with prefix "\n" it works fine
> {code:java}
> ResultSet res = stmt.executeQuery("\n select 1"); {code}
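Without looking at the actual patch, the shape of the fix is presumably to normalize leading whitespace before the dispatcher decides whether the statement is a Hive command like {{dfs}}. A hypothetical sketch ({{firstToken}} is an invented helper, not HiveServer2 code):

```java
public class CommandTrimDemo {
  // Hypothetical helper: mimic a dispatcher that inspects the first token
  static String firstToken(String statement) {
    return statement.trim().split("\\s+")[0].toLowerCase();
  }

  public static void main(String[] args) {
    // Leading spaces/newlines no longer hide the "dfs" command word
    System.out.println(firstToken(" dfs -ls /"));   // dfs
    System.out.println(firstToken("\ndfs -ls /"));  // dfs
  }
}
```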



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26114) Fix jdbc connection hiveserver2 using dfs command with prefix space will cause exception

2022-04-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26114:
--
Summary: Fix jdbc connection hiveserver2 using dfs command with prefix 
space will cause exception  (was: jdbc connection hivesrerver2 using dfs 
command with prefix space will cause exception)

> Fix jdbc connection hiveserver2 using dfs command with prefix space will 
> cause exception
> 
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute sql with prefix "\n" it works fine
> {code:java}
> ResultSet res = stmt.executeQuery("\n select 1"); {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26114) Fix jdbc connection hiveserver2 using dfs command with prefix space will cause exception

2022-04-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26114.
---
Fix Version/s: 4.0.0-alpha-2
   (was: 3.2.0)
   Resolution: Fixed

> Fix jdbc connection hiveserver2 using dfs command with prefix space will 
> cause exception
> 
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute SQL with a "\n" prefix, it works fine:
> {code:java}
> ResultSet res = stmt.executeQuery("\n select 1"); {code}
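Until the server-side parsing is fixed, a client can defend itself by trimming statements before submission. A minimal sketch (the helper name and placement are made up for illustration; only `String.trim()` is doing the work):

```java
public class StatementNormalizer {
    // HiveServer2 dispatches "dfs"-style commands by inspecting the first
    // token of the statement, so leading whitespace can push a command onto
    // the SQL-compiler path, while plain SQL tolerates the whitespace.
    // Trimming on the client side sidesteps the asymmetry.
    static String normalize(String statement) {
        return statement == null ? null : statement.trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" dfs -ls /"));   // "dfs -ls /"
        System.out.println(normalize("\n select 1"));  // "select 1"
    }
}
```

With this applied, `stmt.executeQuery(normalize(" dfs -ls /"))` behaves the same as the whitespace-free call.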



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26134.
---
Resolution: Fixed

Pushed to master.
Thanks for the review [~kgyrtkirk]

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26171) HMSHandler get_all_tables method can not retrieve tables from remote database

2022-04-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26171.
---
Resolution: Fixed

Pushed to master.
Thanks for the patch [~zhangbutao]!

> HMSHandler get_all_tables method can not retrieve tables from remote database
> -
>
> Key: HIVE-26171
> URL: https://issues.apache.org/jira/browse/HIVE-26171
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> At present, the get_all_tables method in HMSHandler does not return tables 
> from a remote database. However, other components like Presto, and some jobs 
> we developed, use this API instead of _get_tables_, which can retrieve 
> tables from both native and remote databases.
> {code:java}
> // get_all_tables only can get tables from native database
> public List get_all_tables(final String dbname) throws MetaException 
> {{code}
> {code:java}
> // get_tables can get tables from both native and remote database
> public List get_tables(final String dbname, final String 
> pattern){code}
> I think we should fix get_all_tables to make it retrieve tables from remote 
> databases as well.
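The asymmetry between the two calls can be illustrated with a toy catalog model (all names here are hypothetical, this is not HMS code): one lookup consults only the native mapping, while the fixed variant also falls back to the remote one, mirroring what get_tables already does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CatalogSketch {
    // Hypothetical toy catalog: table names per database name.
    final Map<String, List<String>> nativeDbs = new LinkedHashMap<>();
    final Map<String, List<String>> remoteDbs = new LinkedHashMap<>();

    // Buggy behaviour: only native databases are consulted.
    List<String> getAllTablesBuggy(String db) {
        return nativeDbs.getOrDefault(db, List.of());
    }

    // Fixed behaviour: also consult the remote mapping, as get_tables does.
    List<String> getAllTablesFixed(String db) {
        List<String> out = new ArrayList<>(nativeDbs.getOrDefault(db, List.of()));
        out.addAll(remoteDbs.getOrDefault(db, List.of()));
        return out;
    }
}
```

A database backed by a remote source would return an empty list from the buggy variant but its full table list from the fixed one.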



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26150) OrcRawRecordMerger reads each row twice

2022-04-26 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527968#comment-17527968
 ] 

Peter Vary commented on HIVE-26150:
---

[~asolimando]: I am trying to find out if fixing this issue would cause 
performance improvements when reading tables where we have delete deltas 
present.

We have two ways to read the delete deltas:
- [SortMergedDeleteEventRegistry 
|https://github.com/apache/hive/blob/a29810ce97a726fc70aecb53ebd648c3237106c4/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1228]
- [ColumnizedDeleteEventRegistry 
|https://github.com/apache/hive/blob/a29810ce97a726fc70aecb53ebd648c3237106c4/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1383]

IIRC the ColumnizedDeleteEventRegistry creates its own readers and 
SortMergedDeleteEventRegistry uses OrcRawRecordMerger, so I would guess that 
normal reads would be affected by this inefficiency when 
SortMergedDeleteEventRegistry is used, but I would like this to be confirmed.

Thanks,
Peter

> OrcRawRecordMerger reads each row twice
> ---
>
> Key: HIVE-26150
> URL: https://issues.apache.org/jira/browse/HIVE-26150
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> OrcRawRecordMerger reads each row twice. The issue does not surface because 
> the merger is only used with the "collapseEvents" parameter set to true, 
> which filters out one of the two rows.
> collapseEvents true and false should produce the same result: in the current 
> ACID implementation each event has a distinct rowid, so two identical rows 
> can only appear because of this bug.
> In order to reproduce the issue, it is sufficient to set the second parameter 
> to false 
> [here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106],
>  and run tests in TestOrcRawRecordMerger and observe two tests failing:
> {code:bash}
> mvn test -Dtest=TestOrcRawRecordMerger -pl ql
> {code}
> {noformat}
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta:1332 Found 
> unexpected row: (0,ignore.1)
> [ERROR]   TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta:1208 Found 
> unexpected row: (0,ignore.1)
> {noformat}
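The masking effect described above (collapseEvents=true filtering out the duplicate) can be sketched as a simple filter over (rowid, event) pairs. This is a deliberately simplified toy model, not Hive's actual collapse logic:

```java
import java.util.ArrayList;
import java.util.List;

public class CollapseSketch {
    // Toy model of the collapse step: drop consecutive events that share a
    // rowid, keeping only the first. Because collapseEvents=true always runs
    // in practice, the duplicate row produced by the bug gets filtered out,
    // which is why the bug stays hidden in normal use.
    static List<long[]> collapse(List<long[]> events) {
        List<long[]> out = new ArrayList<>();
        long lastRowId = Long.MIN_VALUE;
        boolean first = true;
        for (long[] e : events) {
            if (first || e[0] != lastRowId) {
                out.add(e);
                lastRowId = e[0];
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> events = List.of(
            new long[]{1, 0}, new long[]{1, 0}, // rowid 1 read twice (the bug)
            new long[]{2, 0});
        System.out.println(collapse(events).size()); // 2 rows survive
    }
}
```

Disabling the collapse (as the reproduction steps do) would let both copies of rowid 1 through, matching the "Found unexpected row" failures above.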



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26162) Documentation upgrade

2022-04-25 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527431#comment-17527431
 ] 

Peter Vary commented on HIVE-26162:
---

[~florianc]: I am not aware of any such documentation.
OTOH, I think the possible serdeproperties depend heavily on the SerDe used. 
So I would check the documentation of the specific SerDe, if any, or derive 
the list from the SerDe code if I cannot find one.

> Documentation upgrade
> -
>
> Key: HIVE-26162
> URL: https://issues.apache.org/jira/browse/HIVE-26162
> Project: Hive
>  Issue Type: Wish
>Reporter: Florian CASTELAIN
>Priority: Major
>
> Hello.
>  
> I have been looking for specific elements in the documentation, more 
> specifically, the list of serdeproperties.
> So I was looking for an exhaustive list of serdeproperties and I cannot find 
> one at all. 
> This is very surprising as one would expect a tool to describe all of its 
> features.
> Is it planned to create such a list? If it already exists, where is it? 
> Because the official docs do not contain it (or it is well hidden, in which 
> case it should be made more accessible).
>  
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26150) OrcRawRecordMerger reads each row twice

2022-04-25 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527423#comment-17527423
 ] 

Peter Vary commented on HIVE-26150:
---

[~asolimando]: Does this happen during normal read of deleted deltas?

> OrcRawRecordMerger reads each row twice
> ---
>
> Key: HIVE-26150
> URL: https://issues.apache.org/jira/browse/HIVE-26150
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Major
>
> OrcRawRecordMerger reads each row twice. The issue does not surface because 
> the merger is only used with the "collapseEvents" parameter set to true, 
> which filters out one of the two rows.
> collapseEvents true and false should produce the same result: in the current 
> ACID implementation each event has a distinct rowid, so two identical rows 
> can only appear because of this bug.
> In order to reproduce the issue, it is sufficient to set the second parameter 
> to false 
> [here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106],
>  and run tests in TestOrcRawRecordMerger and observe two tests failing:
> {code:bash}
> mvn test -Dtest=TestOrcRawRecordMerger -pl ql
> {code}
> {noformat}
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta:1332 Found 
> unexpected row: (0,ignore.1)
> [ERROR]   TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta:1208 Found 
> unexpected row: (0,ignore.1)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-23885) Remove Hive on Spark

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23885.
---
Resolution: Duplicate

After the upstream discussion we still see no ongoing development on the 
Hive on Spark engine, so we will remove it.

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-23885) Remove Hive on Spark

2022-04-12 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521179#comment-17521179
 ] 

Peter Vary commented on HIVE-23885:
---

Made a mistake and created a new jira :(
HIVE-26134

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26136:
-


> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26134:
--
Summary: Remove Hive on Spark from the main branch  (was: Remove Hive on 
Spark from the main branch branch)

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-11 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26093.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~zabetak]!

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc
>  @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/pvary/dev/upstream/hive/target/site/apidocs' dir.
> {code}
> We should fix this by removing one of the above



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26124.
---
Resolution: Won't Fix

HBase 2 and Hadoop 3 are incompatible.
We might have to move forward to HBase 3 if it becomes available.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518978#comment-17518978
 ] 

Peter Vary commented on HIVE-26124:
---

Talked to [~stoty], and he pointed out that he already did this exercise on 
HIVE-24473.

The short story is that HBase 2.x is compiled against Hadoop 2, and it could 
not be used for testing with any Hadoop 3 artifacts. The root cause is 
HBASE-22394 BTW.

Thanks [~stoty] for the pointers!

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518896#comment-17518896
 ] 

Peter Vary commented on HIVE-26124:
---

Now I am back at the first step:
{code}
[ERROR] Please refer to 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire-reports for the 
individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, 
[date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler 
&& 
/usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java
 -Xmx2048m -jar 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar
 /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 
2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp 
surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/pvary/dev/upstream/hive/hbase-handler 
&& 
/usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/bin/java
 -Xmx2048m -jar 
/Users/pvary/dev/upstream/hive/hbase-handler/target/surefire/surefirebooter1320893522602873596.jar
 /Users/pvary/dev/upstream/hive/hbase-handler/target/surefire 
2022-04-07T15-55-06_090-jvmRun1 surefire4212888302150641194tmp 
surefire_04095119596947982877tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.hbase.TestHBaseQueries
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:513)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:460)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:301)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:249)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:972)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.ja

[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518870#comment-17518870
 ] 

Peter Vary commented on HIVE-26124:
---

That would be nice.
There are some config changes in the test utils.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518866#comment-17518866
 ] 

Peter Vary commented on HIVE-26124:
---

I think I am struggling with the same test failures on the PR.
{code}
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.(InetSocketAddress.java:224)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.(RSRpcServices.java:1217)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.(RSRpcServices.java:1184)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:561)
at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.(MiniHBaseCluster.java:147)
{code}

I was expecting some issues, so I was trying to be conservative. If we can fix 
the issues, I would be happy to move as high as possible with the dependency.

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26124:
-

Assignee: Peter Vary

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0

2022-04-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26124 started by Peter Vary.
-
> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
> 
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26092:
-

Assignee: Peter Vary

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26092 started by Peter Vary.
-
> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26112) Missing scripts for metastore

2022-04-04 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516933#comment-17516933
 ] 

Peter Vary commented on HIVE-26112:
---

Yeah, I wanted to reply to your last comment: you are absolutely right about 
the versions, so go ahead and create the scripts.

As for the BIT_VECTORs, I think those are intentional omissions. It is not 
really useful for users to be able to access the internal bytes (and it may 
not work in a backend-DB-independent way anyway).

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone_metastore counterpart(s):
> * hive-schema-3.2.0.*.sql
> * upgrade-3.1.0-to-3.2.0.*.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.*.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.*.sql



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26112) Missing scripts for metastore

2022-04-04 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516820#comment-17516820
 ] 

Peter Vary commented on HIVE-26112:
---

metastore/scripts/upgrade/hive should contain only the Hive sysdb upgrade 
scripts.
upgrade-3.2.0-to-4.0.0-alpha-1.derby.sql is a Derby HMS DB upgrade script.

It is perfectly legal :) to have different upgrade paths for the sysdb and 
for the HMS DB init scripts.

The goal is that the tables generated by the sysdb scripts should allow access 
to the data generated by the HMS DB scripts.
Also there might be some omissions here and there, but they should be 
intentional and reserved for cases where we do not want to expose internal 
data to the end users.

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone-metastore counterparts:
> * upgrade-3.1.0-to-3.2.0.derby.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.derby.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.hive.sql





[jira] [Resolved] (HIVE-26103) Port Iceberg fixes to the iceberg module

2022-04-04 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26103.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~Marton Bod]

> Port Iceberg fixes to the iceberg module
> 
>
> Key: HIVE-26103
> URL: https://issues.apache.org/jira/browse/HIVE-26103
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should synchronise the Iceberg hive-metastore, mr modules with the Hive 
> codebase





[jira] [Assigned] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26093:
-

Assignee: Peter Vary

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc
>  @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/pvary/dev/upstream/hive/target/site/apidocs' dir.
> {code}
> We should fix this by removing one of the above
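
The duplicate that breaks javadoc here could also be detected mechanically. The sketch below is illustrative (the source-root layout is an assumption, not the exact Hive module structure): it flags any package that ships a package-info.java from more than one source root, which is exactly the condition javadoc aggregation rejects above.

```python
# Sketch: map each package to the source roots that contain its
# package-info.java, and report packages declared in more than one root.
# The source-root paths passed in are hypothetical examples.
from collections import defaultdict
from pathlib import Path


def duplicated_package_infos(source_roots):
    """Return {package_path: [roots]} for packages with multiple package-info.java files."""
    seen = defaultdict(list)
    for root in source_roots:
        root = Path(root)
        for info in root.rglob("package-info.java"):
            # The package is the directory path relative to its source root.
            package = info.parent.relative_to(root)
            seen[str(package)].append(str(root))
    return {pkg: roots for pkg, roots in seen.items() if len(roots) > 1}
```

Running this over metastore-common and metastore-server src/gen trees would surface the org.apache.hadoop.hive.metastore.annotation duplication directly, without waiting for a javadoc build to fail.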





[jira] [Updated] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26093:
--
Summary: Deduplicate org.apache.hadoop.hive.metastore.annotation 
package-info.java  (was: Duplicate org.apache.hadoop.hive.metastore.annotation 
package-info.java)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc
>  @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/pvary/dev/upstream/hive/target/site/apidocs' dir.
> {code}
> We should fix this by removing one of the above




