[jira] [Created] (HIVE-25132) ReadDatabase event should return HiveOperationType as ShowDatabases

2021-05-17 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25132:


 Summary: ReadDatabase event should return HiveOperationType as ShowDatabases
 Key: HIVE-25132
 URL: https://issues.apache.org/jira/browse/HIVE-25132
 Project: Hive
  Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Currently, ReadDatabaseEvent returns a HivePrivilegeObject with HiveOperationType 
Query; it should return ShowDatabases instead. This is useful if we have a 
default policy in Ranger that grants access to all databases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25131) PreAlterPartitionEvent should have table owner details that can be authorized in ranger/sentry

2021-05-17 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25131:


 Summary: PreAlterPartitionEvent should have table owner details that can be authorized in ranger/sentry
 Key: HIVE-25131
 URL: https://issues.apache.org/jira/browse/HIVE-25131
 Project: Hive
  Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


PreAlterPartitionEvent should carry a table object so that the call can be 
authorized in Ranger/Sentry using the owner details of that table object.





[jira] [Created] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread Kishen Das (Jira)
Kishen Das created HIVE-25130:

 Summary: alter table concat gives NullPointerException, when data is inserted from Spark
 Key: HIVE-25130
 URL: https://issues.apache.org/jira/browse/HIVE-25130
 Project: Hive
  Issue Type: Bug
Reporter: Kishen Das


This is the complete stack trace of the NullPointerException:

2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)'
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
	at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
	at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
	at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
	at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
	at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
	at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
	at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
	at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
	at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
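The report does not explain why getAttemptIdFromFilename dereferences null. One plausible reading, which is only an assumption here, is that files written by Spark do not follow the task/attempt naming convention Hive expects, so the parse yields nothing. Under that assumption, here is a minimal sketch of null-safe parsing. The `<taskId>_<attemptId>` pattern is hypothetical and is not Hive's actual regex. The class name, method name, and the Spark-style file name are also made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptIdSketch {

    // Hypothetical naming convention: "<taskId>_<attemptId>" with an optional
    // suffix, e.g. "000000_0" or "000000_1_copy_2". Not Hive's real pattern.
    private static final Pattern TASK_ATTEMPT = Pattern.compile("^(\\d+)_(\\d+)(?:_.*)?$");

    // Returns the attempt id, or Optional.empty() when the file name does not
    // follow the convention, instead of returning null.
    static Optional<String> attemptIdFromFilename(String fileName) {
        Matcher m = TASK_ATTEMPT.matcher(fileName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(attemptIdFromFilename("000000_1"));
        System.out.println(attemptIdFromFilename("part-00000-1b2c-c000.snappy.orc"));
    }
}
```

With Optional.empty() returned, the caller can decide how to treat foreign file names explicitly. Nothing then dereferences a missing value deep inside the commit path.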





[jira] [Created] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift

2021-05-17 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25129:

 Summary: Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
 Key: HIVE-25129
 URL: https://issues.apache.org/jira/browse/HIVE-25129
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 3.1.0
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


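Timestamp values that fall into the daylight saving time (DST) shift of the 
system time zone cannot be retrieved as-is when they are stored in Parquet/Avro 
tables. An example is 02:00 on a spring-forward date in US/Pacific, a wall-clock 
time that does not exist locally. The corresponding SELECT query shifts those 
timestamps by +1 hour, reflecting the DST shift.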

+Example+
{code:sql}
--! qt:timezone:US/Pacific

create table employee (eid int, birthdate timestamp) stored as parquet;

insert into employee values (0, '2019-03-10 02:00:00');
insert into employee values (1, '2020-03-08 02:00:00');
insert into employee values (2, '2021-03-14 02:00:00');

select eid, birthdate from employee order by eid;{code}

+Actual results+
|0|2019-03-10 03:00:00|
|1|2020-03-08 03:00:00|
|2|2021-03-14 03:00:00|

+Expected results+
|0|2019-03-10 02:00:00|
|1|2020-03-08 02:00:00|
|2|2021-03-14 02:00:00|

Storing and retrieving values in columns of the [timestamp data type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types] 
(equivalent to the Java LocalDateTime API) should not alter in any way the value 
that the user sees. The results are correct for {{TEXTFILE}} and {{ORC}} 
tables.
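One plausible mechanism for the +1 hour shift is a round trip through the system zone. This is offered as an assumption, not a diagnosis of Hive's Parquet/Avro code path. If a wall-clock value is converted to an instant using the system time zone and then converted back, java.time resolves a time inside the spring-forward gap by moving it later by the length of the gap. The sketch below contrasts that round trip with one through UTC, which has no DST. The class and method names are hypothetical. The sketch uses America/Los_Angeles, the canonical zone ID that US/Pacific is an alias for.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DstGapDemo {

    // Hypothetical zone-dependent round trip: interpret the wall-clock value in
    // the given zone, keep only the instant, then read it back in the same zone.
    static LocalDateTime roundTripViaZone(LocalDateTime value, ZoneId zone) {
        // atZone() moves a local time inside a DST gap later by the gap's length
        Instant stored = value.atZone(zone).toInstant();
        return LocalDateTime.ofInstant(stored, zone);
    }

    // Zone-independent round trip: UTC has no DST, so the wall-clock value survives
    static LocalDateTime roundTripViaUtc(LocalDateTime value) {
        Instant stored = value.toInstant(ZoneOffset.UTC);
        return LocalDateTime.ofInstant(stored, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId pacific = ZoneId.of("America/Los_Angeles");
        LocalDateTime[] birthdates = {
            LocalDateTime.of(2019, 3, 10, 2, 0),
            LocalDateTime.of(2020, 3, 8, 2, 0),
            LocalDateTime.of(2021, 3, 14, 2, 0)
        };
        for (LocalDateTime b : birthdates) {
            System.out.println(b + " -> zone: " + roundTripViaZone(b, pacific)
                + ", utc: " + roundTripViaUtc(b));
        }
    }
}
```

For each of the three example values, the zone round trip yields 03:00 and the UTC round trip yields the original 02:00. This matches the actual and expected results in the report.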





[jira] [Created] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25128:

 Summary: Remove Thrift Exceptions From RawStore alterCatalog
 Key: HIVE-25128
 URL: https://issues.apache.org/jira/browse/HIVE-25128
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor








Fwd: Hive HMS RawStore-ObjectStore Design

2021-05-17 Thread David
And here is a JIRA for continued discussion:

https://issues.apache.org/jira/browse/HIVE-25126

-- Forwarded message -
From: David 
Date: Mon, May 17, 2021 at 1:42 PM
Subject: Hive HMS RawStore-ObjectStore Design
To: dev 


Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the
Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into the
Metastore via the HMSHandler class.  This class implements all of the
services (RPCs) that the Hive Metastore provides.

This class's methods do some amount of validation and listener notification,
but ultimately they call one or more RawStore/ObjectStore methods to
interact with the database.

This entire orchestration needs some work to make the code easier to
work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to
NoSuchObjectException/InvalidOperationException/MetaException from the
method signatures of RawStore.  These Exceptions are generated by Thrift and
are used to communicate error conditions across the wire.  They are not
designed for use as part of the underlying stack, yet in Hive, they have
been pushed down into these data access operators.  The RawStore should not
have to be this tightly coupled to the transport layer.  My preference here
would be to remove all checked Exceptions from RawStore in favor of runtime
exceptions.  This is a popular approach, and because it is also the one used
by the underlying database access library, DataNucleus, it dovetails nicely
with it.  All of the logging of unchecked Exceptions, and transforming them
into Thrift exceptions, should happen in the HMSHandler code.
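Below is a minimal sketch of the proposed split. Every name in it is hypothetical. The store signals failure with an unchecked exception, and only the handler logs it and converts it into a Thrift-style checked exception. NoSuchObjectException here is a local stand-in, not the Thrift-generated class, and CatalogStore/Handler are illustrative, not RawStore/HMSHandler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExceptionTranslationSketch {

    // Hypothetical unchecked exception thrown by the data-access layer
    static class ObjectNotFoundException extends RuntimeException {
        ObjectNotFoundException(String message) { super(message); }
    }

    // Local stand-in for a Thrift-generated checked exception
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String message) { super(message); }
    }

    // Hypothetical store: its signatures mention no transport-layer types
    static class CatalogStore {
        private final Map<String, String> catalogs = new HashMap<>();

        void createCatalog(String name, String location) { catalogs.put(name, location); }

        String getCatalogLocation(String name) {
            String location = catalogs.get(name);
            if (location == null) {
                throw new ObjectNotFoundException("Catalog " + name + " not found");
            }
            return location;
        }
    }

    // Hypothetical handler: the single place that logs and translates errors
    static class Handler {
        private static final Logger LOG = Logger.getLogger(Handler.class.getName());
        private final CatalogStore store;

        Handler(CatalogStore store) { this.store = store; }

        String getCatalogLocation(String name) throws NoSuchObjectException {
            try {
                return store.getCatalogLocation(name);
            } catch (ObjectNotFoundException e) {
                LOG.log(Level.FINE, "Catalog lookup failed", e);
                throw new NoSuchObjectException(e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        CatalogStore store = new CatalogStore();
        store.createCatalog("hive", "/warehouse");
        Handler handler = new Handler(store);
        try {
            System.out.println(handler.getCatalogLocation("hive"));
            handler.getCatalogLocation("missing");
        } catch (NoSuchObjectException e) {
            System.out.println("Thrift-facing error: " + e.getMessage());
        }
    }
}
```

The store stays free of wire-level concerns, and the mapping from internal failures to client-visible errors lives in one layer.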


2// Move Transaction Management

The ObjectStore has a pretty crazy way of handling transactions.  There
seems to be a lot of extra code around transaction tracking, probably put in
because it is so hard to track transaction management within Hive.
All of the boilerplate transaction management code should be removed from
ObjectStore and brought up into HMSHandler as well.  This would allow
the handler to create a single transaction per request and call the
necessary ObjectStore methods within it.  This is not currently possible
because each ObjectStore method handles transactions in its own special way.
Given all of the open/commit/rollback calls and the "transactional listeners,"
I'm not certain all code paths are correct.  For example, I suspect some
listeners are being alerted outside of a transaction.  I also suspect some
actions are occurring in multiple transactions that should really be
occurring within a single transaction.

I have created some helper code locally (TransactionOperations) to do this
from HMSHandler:

  TransactionOperations.newOperation(rawstore).execute(new TransactionCallback() {

    // This method is called after openTransaction is called on the RawStore.
    // Runtime Exceptions are caught and cause the transaction to roll back.
    // The RawStore method commitTransaction is called if the method completes OK.
    @Override
    public void doInTransaction(RawStore rawstore) throws MetaException {

      // These RawStore methods run in one transaction
      rawstore.methodABC();
      rawstore.methodXXX();
      rawstore.methodXYZ();

      if (!transactionalListeners.isEmpty()) {
        transactionalListenersResponses =
            MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                EventType.CREATE_XXX,
                new CreateXxxEvent(true, HMSHandler.this, xxx));
      }
    }
  });
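The TransactionOperations helper is described here but its implementation is not shown. A self-contained sketch of how such a template could work follows. It assumes a simplified Store interface with openTransaction/commitTransaction/rollbackTransaction, loosely modeled on RawStore but not its real signatures. Store, TransactionCallback, TransactionOperations, and RecordingStore are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionOperationsSketch {

    // Hypothetical, simplified store interface (not the real RawStore)
    interface Store {
        void openTransaction();
        void commitTransaction();
        void rollbackTransaction();
    }

    @FunctionalInterface
    interface TransactionCallback {
        void doInTransaction(Store store);
    }

    static final class TransactionOperations {
        private final Store store;

        private TransactionOperations(Store store) { this.store = store; }

        static TransactionOperations newOperation(Store store) {
            return new TransactionOperations(store);
        }

        // Opens one transaction, runs the callback, and commits on success;
        // on any RuntimeException it rolls back and rethrows.
        void execute(TransactionCallback callback) {
            store.openTransaction();
            try {
                callback.doInTransaction(store);
                store.commitTransaction();
            } catch (RuntimeException e) {
                store.rollbackTransaction();
                throw e;
            }
        }
    }

    // Fake store that records the transaction calls it receives
    static final class RecordingStore implements Store {
        final List<String> calls = new ArrayList<>();
        public void openTransaction() { calls.add("open"); }
        public void commitTransaction() { calls.add("commit"); }
        public void rollbackTransaction() { calls.add("rollback"); }
    }

    public static void main(String[] args) {
        RecordingStore ok = new RecordingStore();
        TransactionOperations.newOperation(ok).execute(s -> ok.calls.add("work"));
        System.out.println(ok.calls);

        RecordingStore failing = new RecordingStore();
        try {
            TransactionOperations.newOperation(failing).execute(s -> {
                throw new IllegalStateException("boom");
            });
        } catch (IllegalStateException e) {
            System.out.println(failing.calls + " " + e.getMessage());
        }
    }
}
```

The open/commit/rollback sequence lives in one place. That is what would let the handler run several store calls, plus any transactional listener notifications, inside a single transaction per request.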


Re-architecting the method signatures to remove the MetaExceptions is a
large-ish task, but unwinding all of this transaction code is going to
be a bear; that is what prompted me to write this email.

Thanks.


[jira] [Created] (HIVE-25127) Update getCatalogs

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25127:

 Summary: Update getCatalogs
 Key: HIVE-25127
 URL: https://issues.apache.org/jira/browse/HIVE-25127
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25126:

 Summary: Remove Thrift Exceptions From RawStore
 Key: HIVE-25126
 URL: https://issues.apache.org/jira/browse/HIVE-25126
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor


Remove all references to 
NoSuchObjectException/InvalidOperationException/MetaException from the method 
signatures of RawStore.  These Exceptions are generated by Thrift and are used 
to communicate error conditions across the wire.  They are not designed for use 
as part of the underlying stack, yet in Hive, they have been pushed down into 
these data access operators.

The RawStore should not have to be this tightly coupled to the transport layer.

Remove all checked Exceptions from RawStore in favor of Hive runtime 
exceptions.  This is a popular approach, and because it is also the one used by 
the underlying database access library, DataNucleus, it dovetails nicely with it.

All of the logging of unchecked Exceptions, and transforming them into Thrift 
exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).







[jira] [Created] (HIVE-25124) PTF: Vectorize cume_dist function

2021-05-17 Thread Jira
László Bodor created HIVE-25124:

 Summary: PTF: Vectorize cume_dist function
 Key: HIVE-25124
 URL: https://issues.apache.org/jira/browse/HIVE-25124
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor








[jira] [Created] (HIVE-25125) PTF: Vectorize percent_rank function

2021-05-17 Thread Jira
László Bodor created HIVE-25125:

 Summary: PTF: Vectorize percent_rank function
 Key: HIVE-25125
 URL: https://issues.apache.org/jira/browse/HIVE-25125
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor








[jira] [Created] (HIVE-25123) Implement vectorized streaming lead/lag

2021-05-17 Thread Jira
László Bodor created HIVE-25123:


 Summary: Implement vectorized streaming lead/lag
 Key: HIVE-25123
 URL: https://issues.apache.org/jira/browse/HIVE-25123
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor





