[jira] [Created] (HIVE-25132) ReadDatabase event should return HiveOperationType as ShowDatabases
Sai Hemanth Gantasala created HIVE-25132:

Summary: ReadDatabase event should return HiveOperationType as ShowDatabases
Key: HIVE-25132
URL: https://issues.apache.org/jira/browse/HIVE-25132
Project: Hive
Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala

ReadDatabaseEvent should return a HivePrivilegeObject with HiveOperationType ShowDatabases instead of Query. This is useful if we have a default policy in Ranger that grants access to all databases.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25131) PreAlterPartitionEvent should have table owner details that can be authorized in ranger/sentry
Sai Hemanth Gantasala created HIVE-25131:

Summary: PreAlterPartitionEvent should have table owner details that can be authorized in ranger/sentry
Key: HIVE-25131
URL: https://issues.apache.org/jira/browse/HIVE-25131
Project: Hive
Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala

PreAlterPartitionEvent should carry the table object, so that the call can be authorized in Ranger/Sentry using the owner details of that table object.
[jira] [Created] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
Kishen Das created HIVE-25130:

Summary: alter table concat gives NullPointerException, when data is inserted from Spark
Key: HIVE-25130
URL: https://issues.apache.org/jira/browse/HIVE-25130
Project: Hive
Issue Type: Bug
Reporter: Kishen Das

This is the complete stack trace of the NullPointerException:

2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)'
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
    at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
    at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
    at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
    at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
    at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
    at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
    at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[jira] [Created] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
Stamatis Zampetakis created HIVE-25129:

Summary: Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
Key: HIVE-25129
URL: https://issues.apache.org/jira/browse/HIVE-25129
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 3.1.0
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

Timestamp values that fall into the daylight saving time (DST) shift of the system time zone cannot be retrieved as is when they are stored in Parquet/Avro tables. The respective SELECT query shifts those timestamps by +1 hour, reflecting the DST shift.

+Example+
{code:sql}
--! qt:timezone:US/Pacific
create table employee (eid int, birthdate timestamp) stored as parquet;
insert into employee values (0, '2019-03-10 02:00:00');
insert into employee values (1, '2020-03-08 02:00:00');
insert into employee values (2, '2021-03-14 02:00:00');
select eid, birthdate from employee order by eid;
{code}

+Actual results+
|0|2019-03-10 03:00:00|
|1|2020-03-08 03:00:00|
|2|2021-03-14 03:00:00|

+Expected results+
|0|2019-03-10 02:00:00|
|1|2020-03-08 02:00:00|
|2|2021-03-14 02:00:00|

Storing and retrieving values in columns using the [timestamp data type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] (equivalent to the Java LocalDateTime API) should not alter in any way the value that the user sees. The results are correct for {{TEXTFILE}} and {{ORC}} tables.
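The three timestamps in the example all sit at 02:00 on the day US/Pacific moves its clocks forward, a wall-clock time that does not exist in that zone. The sketch below only illustrates that property with the plain java.time API; it is not Hive's Parquet/Avro reader code, and it makes no claim about where in Hive the conversion happens. The class name DstGapDemo and the method throughZone are made up for this illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustration only: hypothetical names, not part of Hive.
class DstGapDemo {

    // Interprets a wall-clock value in the given zone, then reads the wall clock back.
    static LocalDateTime throughZone(LocalDateTime wallClock, ZoneId zone) {
        // atZone() moves a value that falls inside a DST gap later by the length of the gap.
        ZonedDateTime zoned = wallClock.atZone(zone);
        return zoned.toInstant().atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId pacific = ZoneId.of("US/Pacific");
        LocalDateTime inGap = LocalDateTime.of(2019, 3, 10, 2, 0);
        LocalDateTime beforeGap = LocalDateTime.of(2019, 3, 10, 1, 0);
        // prints "2019-03-10T02:00 -> 2019-03-10T03:00"
        System.out.println(inGap + " -> " + throughZone(inGap, pacific));
        // prints "2019-03-10T01:00 -> 2019-03-10T01:00"
        System.out.println(beforeGap + " -> " + throughZone(beforeGap, pacific));
    }
}
```

Any code path that routes a LocalDateTime-like value through the system time zone shows this kind of shift, which matches the +1 hour seen in the actual results above.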
[jira] [Created] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog
David Mollitor created HIVE-25128:

Summary: Remove Thrift Exceptions From RawStore alterCatalog
Key: HIVE-25128
URL: https://issues.apache.org/jira/browse/HIVE-25128
Project: Hive
Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor
Fwd: Hive HMS RawStore-ObjectStore Design
And here is a JIRA for continued discussion: https://issues.apache.org/jira/browse/HIVE-25126

-- Forwarded message -
From: David
Date: Mon, May 17, 2021 at 1:42 PM
Subject: Hive HMS RawStore-ObjectStore Design
To: dev
[jira] [Created] (HIVE-25127) Update getCatalogs
David Mollitor created HIVE-25127:

Summary: Update getCatalogs
Key: HIVE-25127
URL: https://issues.apache.org/jira/browse/HIVE-25127
Project: Hive
Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor
[jira] [Created] (HIVE-25126) Remove Thrift Exceptions From RawStore
David Mollitor created HIVE-25126:

Summary: Remove Thrift Exceptions From RawStore
Key: HIVE-25126
URL: https://issues.apache.org/jira/browse/HIVE-25126
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor

Remove all references to NoSuchObjectException/InvalidOperationException/MetaException from the method signatures of RawStore. These Exceptions are generated by Thrift and are used to communicate error conditions across the wire. They are not designed for use as part of the underlying stack, yet in Hive, they have been pushed down into these data access operators. The RawStore should not have to be this tightly coupled to the transport layer.

Remove all checked Exceptions from RawStore in favor of Hive runtime exceptions. This is a popular format and is used by (and therefore dovetails nicely with) the underlying database access library DataNucleus.

All of the logging of unchecked Exceptions, and transforming them into Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
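As a rough sketch of the layering proposed above: the storage layer throws only unchecked exceptions, and a single handler method logs them and converts them into wire-level exceptions. Every type below is a hypothetical stand-in written for this illustration (CatalogStore, StoreException, ObjectNotFoundException, WireNoSuchObjectException, WireMetaException); none of them is a Hive or Thrift class, and the real RawStore and HMSHandler signatures are not reproduced here.

```java
// Hypothetical stand-ins only; not Hive or Thrift classes.
class BoundaryTranslationDemo {

    // Unchecked exceptions the storage layer could throw instead of Thrift types.
    static class StoreException extends RuntimeException {
        StoreException(String message) { super(message); }
    }

    static class ObjectNotFoundException extends StoreException {
        ObjectNotFoundException(String message) { super(message); }
    }

    // Checked stand-ins for the exceptions that travel across the wire.
    static class WireNoSuchObjectException extends Exception {
        WireNoSuchObjectException(String message) { super(message); }
    }

    static class WireMetaException extends Exception {
        WireMetaException(String message) { super(message); }
    }

    // Storage interface with no checked exceptions in its signature.
    interface CatalogStore {
        String getCatalog(String name);
    }

    // Handler-side method: the only place that knows about wire exceptions.
    static String getCatalog(CatalogStore store, String name)
            throws WireNoSuchObjectException, WireMetaException {
        try {
            return store.getCatalog(name);
        } catch (ObjectNotFoundException e) {
            throw new WireNoSuchObjectException(e.getMessage());
        } catch (RuntimeException e) {
            // Unexpected failures are logged once, here, then translated.
            System.err.println("getCatalog failed: " + e);
            throw new WireMetaException(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) throws Exception {
        CatalogStore missing = name -> {
            throw new ObjectNotFoundException("no catalog named " + name);
        };
        try {
            getCatalog(missing, "spark");
        } catch (WireNoSuchObjectException e) {
            System.out.println("translated: " + e.getMessage());
        }
    }
}
```

With this shape the storage interface no longer depends on the transport layer, and adding a new wire exception touches only the handler.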
Hive HMS RawStore-ObjectStore Design
Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into Hive via the HMSHandler class. This class lists all of the services (RPCs) that the Hive Metastore provides. This class's methods do some amount of validation and listener notification, but they ultimately call one or more RawStore/ObjectStore methods to interact with the database.

This entire orchestration needs some work to make this code easier to work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to NoSuchObjectException/InvalidOperationException/MetaException from the method signatures of RawStore. These Exceptions are generated by Thrift and are used to communicate error conditions across the wire. They are not designed for use as part of the underlying stack, yet in Hive, they have been pushed down into these data access operators. The RawStore should not have to be this tightly coupled to the transport layer.

My preference here would be to remove all checked Exceptions from RawStore in favor of runtime exceptions. This is a popular format and is used by (and therefore dovetails nicely with) the underlying database access library DataNucleus.

All of the logging of unchecked Exceptions, and transforming them into Thrift exceptions, should happen in the HMSHandler code.

2// Move Transaction Management

The ObjectStore has a pretty crazy path of handling transactions. There seems to be a lot of extra code around transaction tracking that was put in probably because it's so hard to track transaction management within Hive. All of the boiler-plate transaction management code should be removed from ObjectStore and instead brought up into HMSHandler as well. This allows the handler to create a single transaction per request and call the necessary ObjectStore methods. This is not currently possible because each ObjectStore method handles transactions in its own special way.

When you include all of the open/commit/roll-back, and "transactional listeners," I'm not certain all code paths are correct. For example, I suspect some listeners are being alerted outside of a transaction. I also suspect some actions are occurring in multiple transactions that should really be occurring within a single transaction.

I have locally created some helper-code (TransactionOperations) to do this from HMSHandler:

    TransactionOperations.newOperation(rawstore).execute(new TransactionCallback() {
      // This method is called after openTransaction is called on the RawStore
      // Runtime Exceptions are caught and cause the transaction to roll back
      // The RawStore method commitTransaction is called if method completes OK
      @Override
      public void doInTransaction(RawStore rawstore) throws MetaException {
        // These RawStore methods happen in one transaction
        rawstore.methodABC();
        rawstore.methodXXX();
        rawstore.methodXYZ();
        if (!transactionalListeners.isEmpty()) {
          transactionalListenersResponses =
              MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                  EventType.CREATE_XXX, new CreateXxxEvent(true, HMSHandler.this, xxx));
        }
      }
    });

Re-architecting the method signatures to remove the MetaExceptions is a large-ish task, but trying to unwind all this transaction code is going to be a bear; it's what prompted me to write this email.

Thanks.
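The email shows only the call site of the TransactionOperations helper, not its body. A minimal sketch of how such a helper could behave, following the three comments in the snippet (open before the callback, roll back on a runtime exception, commit on normal completion), is given below. The Store interface, RecordingStore, and the static execute method are assumptions made for this illustration; they are not the author's helper and not Hive's RawStore, and unlike the email's callback this one declares no checked exception.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch under stated assumptions; all names here are hypothetical.
class TransactionSketch {

    // Stand-in for the transaction methods a store would expose.
    interface Store {
        void openTransaction();
        void commitTransaction();
        void rollbackTransaction();
    }

    // Work to run inside a single transaction.
    interface TransactionCallback {
        void doInTransaction(Store store);
    }

    // Opens a transaction, runs the callback, commits on success,
    // rolls back and rethrows if the callback throws a runtime exception.
    static void execute(Store store, TransactionCallback callback) {
        store.openTransaction();
        try {
            callback.doInTransaction(store);
        } catch (RuntimeException e) {
            store.rollbackTransaction();
            throw e;
        }
        store.commitTransaction();
    }

    // Test double that records which transaction methods were called.
    static class RecordingStore implements Store {
        final List<String> calls = new ArrayList<>();
        @Override public void openTransaction() { calls.add("open"); }
        @Override public void commitTransaction() { calls.add("commit"); }
        @Override public void rollbackTransaction() { calls.add("rollback"); }
    }

    public static void main(String[] args) {
        RecordingStore ok = new RecordingStore();
        execute(ok, store -> { /* several store calls would go here */ });
        System.out.println(ok.calls); // prints "[open, commit]"

        RecordingStore failing = new RecordingStore();
        try {
            execute(failing, store -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException e) {
            System.out.println(failing.calls); // prints "[open, rollback]"
        }
    }
}
```

Keeping open/commit/rollback in one place like this is what lets a handler guarantee one transaction per request; a failure inside commitTransaction itself is deliberately left unhandled in this sketch.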
[jira] [Created] (HIVE-25124) PTF: Vectorize cume_dist function
László Bodor created HIVE-25124:

Summary: PTF: Vectorize cume_dist function
Key: HIVE-25124
URL: https://issues.apache.org/jira/browse/HIVE-25124
Project: Hive
Issue Type: Sub-task
Reporter: László Bodor
[jira] [Created] (HIVE-25125) PTF: Vectorize percent_rank function
László Bodor created HIVE-25125:

Summary: PTF: Vectorize percent_rank function
Key: HIVE-25125
URL: https://issues.apache.org/jira/browse/HIVE-25125
Project: Hive
Issue Type: Sub-task
Reporter: László Bodor
[jira] [Created] (HIVE-25123) Implement vectorized streaming lead/lag
László Bodor created HIVE-25123:

Summary: Implement vectorized streaming lead/lag
Key: HIVE-25123
URL: https://issues.apache.org/jira/browse/HIVE-25123
Project: Hive
Issue Type: Improvement
Reporter: László Bodor