[jira] [Created] (HIVE-25137) getAllWriteEventInfo calls RDBMS directly instead of going through the HMS client

2021-05-18 Thread Pratyushotpal Madhukar (Jira)
Pratyushotpal Madhukar created HIVE-25137:
-

 Summary: getAllWriteEventInfo calls RDBMS directly instead of 
going through the HMS client
 Key: HIVE-25137
 URL: https://issues.apache.org/jira/browse/HIVE-25137
 Project: Hive
  Issue Type: Improvement
Reporter: Pratyushotpal Madhukar


{code:java}
private List<WriteEventInfo> getAllWriteEventInfo(Context withinContext) throws Exception {
  String contextDbName = StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
  RawStore rawStore = HiveMetaStore.HMSHandler.getMSForConf(withinContext.hiveConf);
  List<WriteEventInfo> writeEventInfoList =
      rawStore.getAllWriteEventInfo(eventMessage.getTxnId(), contextDbName, null);
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25136) Remove MetaExceptions From RawStore First Cut

2021-05-18 Thread David Mollitor (Jira)
David Mollitor created HIVE-25136:
-

 Summary: Remove MetaExceptions From RawStore First Cut
 Key: HIVE-25136
 URL: https://issues.apache.org/jira/browse/HIVE-25136
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Hive HMS RawStore-ObjectStore Design

2021-05-18 Thread David
Hello  Narayanan,

Thank you so much for your feedback.

As I view it, and have had success in other projects, the RawStore
interface defines the methods for pulling data from some underlying data
source.
I would imagine something like JdbcRawStore, FileRawStore, HBaseRawStore,
MongoRawStore, etc.

I do not believe that CachedStore should be a feature.  Given that there
can be multiple HMS accessing the same data source (RDBMS for example),
there is not a lot of caching within the HMS application.  I think we
should revisit this; we should better leverage the DataNucleus Level 2
Cache, but that is another issue.
The point is that each RawStore instance should be able to perform caching
that makes sense for the underlying data source; I do not think there
should be some sort of cache that is wrapping the RawStore implementation.
My expectation would be that the HMSHandler would call a separate caching
framework and only reach out to the underlying data source (the RawStore),
and start a transaction, if it needs to load something that is not already
in the main cache; thus the cache is not tightly coupled to the RawStore
stuff.  If you're not familiar with this, check out the Guava
LoadingCache.  In this example, the loading code would reach out to the
RawStore.

https://github.com/google/guava/wiki/CachesExplained
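
To make this concrete, here is a rough sketch of the pattern (names and
types are illustrative only, not a concrete API proposal):

  // Hypothetical HMSHandler-level cache; the loader is the only code path
  // that opens a transaction and touches the RawStore
  LoadingCache<String, Table> tableCache = CacheBuilder.newBuilder()
      .maximumSize(10_000)
      .expireAfterWrite(10, TimeUnit.MINUTES)
      .build(new CacheLoader<String, Table>() {
        @Override
        public Table load(String tableName) throws Exception {
          // The loading code reaches out to the RawStore only on a cache miss
          rawStore.openTransaction();
          try {
            Table table = rawStore.getTable("hive", "default", tableName);
            rawStore.commitTransaction();
            return table;
          } catch (RuntimeException e) {
            rawStore.rollbackTransaction();
            throw e;
          }
        }
      });

  // Callers just ask the cache; the RawStore itself stays cache-agnostic
  Table table = tableCache.getUnchecked("my_table");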



InvalidOperationException is a Thrift-generated error.  We really, really,
should not have these Thrift-generated checked exceptions embedded in the
HMS.  They lack some pretty important features, like the ability to specify
a "cause" for the exception, and there are all kinds of weird workarounds
in the code to account for these deficiencies.
For example, there are places in the code that embed stack traces in the
message of the Thrift exceptions and then parse them to re-create a new
Exception.  This is pretty bonkers and added a lot of code/time to add
these capabilities instead of simply using standard POJO exceptions.
Also, if someday we wanted to replace Thrift with another RPC engine
(Protocol Buffers, say), we would have to yank out all of these Exception
classes before we could fully remove the dependency on Thrift.

Any Thrift-generated exceptions should only be created in the HMSHandler
(which should be called HMSThriftHandler) class.  Whatever the RawStore
needs to throw, it should be unchecked Hive project exceptions.
Interestingly, as I've been examining the code, I've noticed that many of
the unchecked Exceptions that DataNucleus throws go un-caught, so the code
claims that the Thrift-generated MetaException is the generic "database
error"; however, that is simply not true.  Most of the methods do not
catch the DataNucleus exceptions and those bubble up.

So yes, any such Thrift-generated Exceptions should be replaced by
something from Hive: "InvalidOperationException" is fine to pass back to
the Thrift client, but it should be something like:

// HMSHandler: translate the unchecked Hive exception into the Thrift one
try {
  rawStore.doThings();
} catch (IllegalArgumentException e) {
  LOG.error("Error doing things", e);
  throw new InvalidOperationException(e.getMessage());
}

https://issues.apache.org/jira/browse/HIVE-25126



To your point about "rollbackAndCleanup," if you look at the RawStore
interface as simply a class that provides persistence, then the
"andCleanup" part is moot. Anything that
is unrelated to data persistence needs to be moved out.  If the calls to
RawStore have weird side effects that do not directly apply to the storage
(like creating a warehouse directory in HDFS),
that code needs to be removed.  Anything related to the transaction is
reverted by the calling class with a call to RawStore#rollbackTransaction.


Thanks.

On Tue, May 18, 2021 at 9:17 AM Narayanan Venkateswaran 
wrote:

> Hi,
>
> Thank you for the email,
>
> please find some replies inline,
>
> On Mon, May 17, 2021 at 11:12 PM David  wrote:
>
> > Hello Gang,
> >
> > I just wanted to put out a few thoughts for anyone interested in the
> > Metastore, and in particular, the connection handling.
> >
> > As I understand it, client requests from the Thrift server come into Hive
> > via the HMSHandler class.  This class lists all of the services (RPCs)
> that
> > the Hive Metastore provides.
> >
> > This class's methods do some amount of validation, listener notification,
> > but it ultimately calls one or more RawStore/ObjectStore methods to
> > interact with the database.
> >
>
> I guess you mean one of the RawStore implementations to interact with the
> database,
>
> for example it could be CachedStore also.
>
>
> >
> > This entire orchestration needs some work to make this code easier to
> > work with and to improve error handling.
> >
> > What I propose is:
> >
> > 1// Remove Thrift Errors from RawStore
> >
> > Remove all references to
> > NoSuchObjectException/InvalidOperationException/MetaException from the
> > method signature of RawStore.  These Exceptions are generated by Thrift
> and
> > are used to communicate error 

Re: Hive HMS RawStore-ObjectStore Design

2021-05-18 Thread Narayanan Venkateswaran
Hi,

Thank you for the email,

please find some replies inline,

On Mon, May 17, 2021 at 11:12 PM David  wrote:

> Hello Gang,
>
> I just wanted to put out a few thoughts for anyone interested in the
> Metastore, and in particular, the connection handling.
>
> As I understand it, client requests from the Thrift server come into Hive
> via the HMSHandler class.  This class lists all of the services (RPCs) that
> the Hive Metastore provides.
>
> This class's methods do some amount of validation, listener notification,
> but it ultimately calls one or more RawStore/ObjectStore methods to
> interact with the database.
>

I guess you mean one of the RawStore implementations to interact with the
database,

for example it could be CachedStore also.


>
> This entire orchestration needs some work to make this code easier to
> work with and to improve error handling.
>
> What I propose is:
>
> 1// Remove Thrift Errors from RawStore
>
> Remove all references to
> NoSuchObjectException/InvalidOperationException/MetaException from the
> method signature of RawStore.  These Exceptions are generated by Thrift and
> are used to communicate error conditions across the wire.  They are not
> designed for use as part of the underlying stack, yet in Hive, they have
> been pushed down into these data access operators.  The RawStore should not
> have to be this tightly coupled to the transport layer.  My preference here
> would be to remove all checked Exceptions from RawStore in favor of runtime
> exceptions.


I guess we will have to look at what to do with InvalidOperationExceptions
such as the following,

if (hasNsChange) {
  throw new InvalidOperationException("Cannot change ns; from "
      + getNsOrDefault(plan.getNs()) + " to " + changes.getNs());
}

So do we retain these?




> This is a popular approach and dovetails nicely with the underlying
> database access library DataNucleus.  All of the logging of un-checked
> Exceptions, and transforming them into Thrift exceptions, should happen
> in the HMSHandler code.
>

I am OK with this, but you might end up amalgamating a lot of RawStore
implementation-specific exception handling code in the HMSHandler, an
already cluttered class. For example, the CachedStore implementation may be
throwing an exception different from the ObjectStore and we will probably
end up wrapping both of these in the HMSHandler.

Similarly, each RawStore implementation may need to perform actions in the
event of an exception from the underlying database,
e.g. rollbackAndCleanup. I guess this is where the moving of transaction
management kicks in?



>
> 2// Move Transaction Management
>
> The ObjectStore has a pretty crazy path of handling transactions.  There
> seems to be a lot of extra code around transaction tracking that was put in
> probably because it's so hard to track transaction management within Hive.
>

Agreed!


> All of the boiler-plate transaction management code should be removed from
> ObjectStore and instead brought up into HMS handler as well.


This will need to be thought through. I agree that refactoring transaction
management is a good idea, but moving it into HMSHandler is just going to
further clutter HMSHandler. HMSHandler is already a huge class with a lot
of clutter that needs to be refactored before we add more into it.

This allows
> the handler to create a single transaction per-request and call the
> necessary ObjectStore methods.  This is not currently possible because each
> ObjectStore handles transactions in its own special way. When you include
> all of the open/commit/roll-back, and "transactional listeners," I'm not
> certain all code paths are correct.  For example, I suspect some listeners
> are being alerted outside of a transaction.  I also suspect some actions
> are occurring in multiple transactions that should really be occurring
> within a single transaction.
>

I think a CachedStore also passes various calls down to the ObjectStore.
Starting a transaction in HMSHandler will keep the transaction open for
longer than necessary. I agree that the refactoring needs to happen, but I
am not convinced that moving all transaction handling to HMSHandler is the
way forward. Of course, I don't have a better idea as of now.


>
> I have locally created some helper-code (TransactionOperations) to do this
> from HMSHandler:
>
>   TransactionOperations.newOperation(rawstore).execute(new TransactionCallback() {
>
>     // This method is called after openTransaction is called on the RawStore
>     // Runtime Exceptions are caught and cause the transaction to roll back
>     // The RawStore method commitTransaction is called if the method completes OK
>     @Override
>     public void doInTransaction(RawStore rawstore) throws MetaException {
>
>       // These RawStore methods happen in one transaction
>       rawstore.methodABC();
>       rawstore.methodXXX();
>   

[jira] [Created] (HIVE-25135) Vectorization: Wrong Results issues in IF expressions about Two-level nested UDF

2021-05-18 Thread ZhangQiDong (Jira)
ZhangQiDong created HIVE-25135:
--

 Summary: Vectorization: Wrong Results issues in IF expressions 
about Two-level nested UDF
 Key: HIVE-25135
 URL: https://issues.apache.org/jira/browse/HIVE-25135
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 3.1.0, 4.0.0
Reporter: ZhangQiDong


After setting hive.vectorized.execution.enabled = true, if an IF expression 
contains a field wrapped in two levels of nested UDFs, the result will be incorrect.

Test case:
create table if_orc (col string, col2 string) stored as orc;
insert into table if_orc values('1', 'abc'),('1', 'abc'),('2', 'def'),('2', 'def');
set hive.vectorized.execution.enabled = true;
select if(col='2', col2, reverse(upper(col2))) from if_orc;


set hive.vectorized.execution.enabled = false;
Correct result:
+--+
| _c0  |
+--+
| CBA  |
| CBA  |
| def  |
| def  |
+--+

set hive.vectorized.execution.enabled = true;
Wrong result:
+--+
| _c0  |
+--+
| CBA  |
| CBA  |
| ABC  |
| ABC  |
+--+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25134) NPE in TestHiveCli.java

2021-05-18 Thread gaozhan ding (Jira)
gaozhan ding created HIVE-25134:
---

 Summary: NPE in TestHiveCli.java
 Key: HIVE-25134
 URL: https://issues.apache.org/jira/browse/HIVE-25134
 Project: Hive
  Issue Type: Test
  Components: Beeline, Test
Reporter: gaozhan ding


{code:java}
@Before
public void setup() throws IOException, URISyntaxException {
  System.setProperty("datanucleus.schema.autoCreateAll", "true");
  cli = new HiveCli();
  initFromFile();
  redirectOutputStream();
}
{code}
In *setup()*, *initFromFile()* may access *err* before it is initialized.

{code:java}
[ERROR] org.apache.hive.beeline.cli.TestHiveCli.testSetPromptValue  Time elapsed: 1.167 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:249)
    at org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:315)
    at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:288)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25133) Allow custom configs for database level paths in external table replication

2021-05-18 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25133:
---

 Summary: Allow custom configs for database level paths in external 
table replication
 Key: HIVE-25133
 URL: https://issues.apache.org/jira/browse/HIVE-25133
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Allow a way to provide configurations that should be used only by the external 
data copy tasks for database-level paths.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25132) ReadDatabase event should return HiveOperationType as ShowDatabases

2021-05-18 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25132:


 Summary: ReadDatabase event should return HiveOperationType as 
ShowDatabases
 Key: HIVE-25132
 URL: https://issues.apache.org/jira/browse/HIVE-25132
 Project: Hive
  Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Currently, ReadDatabaseEvent returns a HivePrivilegeObject with 
HiveOperationType Query; it should return ShowDatabases instead. This is useful if we have 
a default policy in Ranger that grants access to all databases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25131) PreAlterPartitionEvent should have table owner details that can be authorized in ranger/sentry

2021-05-18 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25131:


 Summary: PreAlterPartitionEvent should have table owner details 
that can be authorized in ranger/sentry
 Key: HIVE-25131
 URL: https://issues.apache.org/jira/browse/HIVE-25131
 Project: Hive
  Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


PreAlterPartition event should have a table object, so that the call can be 
authorized in ranger/sentry using the owner details of the table object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-17 Thread Kishen Das (Jira)
Kishen Das created HIVE-25130:
-

 Summary: alter table concat gives NullPointerException, when data 
is inserted from Spark
 Key: HIVE-25130
 URL: https://issues.apache.org/jira/browse/HIVE-25130
 Project: Hive
  Issue Type: Bug
Reporter: Kishen Das


This is the complete stack trace of the NullPointerException

2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)'

java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
    at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
    at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
    at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
    at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
    at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
    at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
    at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63)
    at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift

2021-05-17 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25129:
--

 Summary: Wrong results when timestamps stored in Avro/Parquet fall 
into the DST shift
 Key: HIVE-25129
 URL: https://issues.apache.org/jira/browse/HIVE-25129
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 3.1.0
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Timestamp values falling into the daylight savings time shift of the system timezone 
cannot be retrieved as-is when they are stored in Parquet/Avro tables. The 
respective SELECT query shifts those timestamps by +1 hour, reflecting the DST shift.

+Example+
{code:sql}
--! qt:timezone:US/Pacific

create table employee (eid int, birthdate timestamp) stored as parquet;

insert into employee values (0, '2019-03-10 02:00:00');
insert into employee values (1, '2020-03-08 02:00:00');
insert into employee values (2, '2021-03-14 02:00:00');

select eid, birthdate from employee order by eid;{code}

+Actual results+
|0|2019-03-10 03:00:00|
|1|2020-03-08 03:00:00|
|2|2021-03-14 03:00:00|

+Expected results+
|0|2019-03-10 02:00:00|
|1|2020-03-08 02:00:00|
|2|2021-03-14 02:00:00|

Storing and retrieving values in columns using the [timestamp data 
type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]
 (equivalent to the LocalDateTime java API) should not alter in any way the value 
that the user is seeing. The results are correct for {{TEXTFILE}} and {{ORC}} 
tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25128:
-

 Summary: Remove Thrift Exceptions From RawStore alterCatalog
 Key: HIVE-25128
 URL: https://issues.apache.org/jira/browse/HIVE-25128
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Fwd: Hive HMS RawStore-ObjectStore Design

2021-05-17 Thread David
And here is a JIRA for continued discussion:

https://issues.apache.org/jira/browse/HIVE-25126

-- Forwarded message -
From: David 
Date: Mon, May 17, 2021 at 1:42 PM
Subject: Hive HMS RawStore-ObjectStore Design
To: dev 


Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the
Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into Hive
via the HMSHandler class.  This class lists all of the services (RPCs) that
the Hive Metastore provides.

This class's methods do some amount of validation, listener notification,
but it ultimately calls one or more RawStore/ObjectStore methods to
interact with the database.

This entire orchestration needs some work to make this code easier to
work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to
NoSuchObjectException/InvalidOperationException/MetaException from the
method signature of RawStore.  These Exceptions are generated by Thrift and
are used to communicate error conditions across the wire.  They are not
designed for use as part of the underlying stack, yet in Hive, they have
been pushed down into these data access operators.  The RawStore should not
have to be this tightly coupled to the transport layer.  My preference here
would be to remove all checked Exceptions from RawStore in favor of runtime
exceptions.  This is a popular approach and dovetails nicely with the
underlying database access library DataNucleus.  All of the logging of
un-checked Exceptions, and transforming them into Thrift exceptions,
should happen in the HMSHandler code.


2// Move Transaction Management

The ObjectStore has a pretty crazy path of handling transactions.  There
seems to be a lot of extra code around transaction tracking that was put in
probably because it's so hard to track transaction management within Hive.
All of the boiler-plate transaction management code should be removed from
ObjectStore and instead brought up into HMS handler as well.  This allows
the handler to create a single transaction per-request and call the
necessary ObjectStore methods.  This is not currently possible because each
ObjectStore handles transactions in its own special way. When you include
all of the open/commit/roll-back, and "transactional listeners," I'm not
certain all code paths are correct.  For example, I suspect some listeners
are being alerted outside of a transaction.  I also suspect some actions
are occurring in multiple transactions that should really be occurring
within a single transaction.

I have locally created some helper-code (TransactionOperations) to do this
from HMSHandler:

  TransactionOperations.newOperation(rawstore).execute(new TransactionCallback() {

    // This method is called after openTransaction is called on the RawStore
    // Runtime Exceptions are caught and cause the transaction to roll back
    // The RawStore method commitTransaction is called if the method completes OK
    @Override
    public void doInTransaction(RawStore rawstore) throws MetaException {

      // These RawStore methods happen in one transaction
      rawstore.methodABC();
      rawstore.methodXXX();
      rawstore.methodXYZ();

      if (!transactionalListeners.isEmpty()) {
        transactionalListenersResponses =
            MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                EventType.CREATE_XXX,
                new CreateXxxEvent(true, HMSHandler.this, xxx));
      }
    }
  });
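
For completeness, a minimal sketch of what the TransactionOperations helper
itself could look like, inferred from the comments above (not the actual
implementation):

  public final class TransactionOperations {

    private final RawStore rawStore;

    private TransactionOperations(RawStore rawStore) {
      this.rawStore = rawStore;
    }

    public static TransactionOperations newOperation(RawStore rawStore) {
      return new TransactionOperations(rawStore);
    }

    public void execute(TransactionCallback callback) throws MetaException {
      // Open the transaction before handing control to the callback
      rawStore.openTransaction();
      try {
        callback.doInTransaction(rawStore);
        // Commit only if the callback completed OK
        rawStore.commitTransaction();
      } catch (RuntimeException e) {
        // Runtime Exceptions cause the transaction to roll back
        rawStore.rollbackTransaction();
        throw e;
      }
    }
  }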


Re-architecting the method signatures to remove the MetaExceptions is a
large-ish task, but trying to unwind all this transaction code is going to
be a bear; that's what prompted me to write this email.

Thanks.


[jira] [Created] (HIVE-25127) Update getCatalogs

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25127:
-

 Summary: Update getCatalogs
 Key: HIVE-25127
 URL: https://issues.apache.org/jira/browse/HIVE-25127
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-25126:
-

 Summary: Remove Thrift Exceptions From RawStore
 Key: HIVE-25126
 URL: https://issues.apache.org/jira/browse/HIVE-25126
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor


Remove all references to 
NoSuchObjectException/InvalidOperationException/MetaException from the method 
signature of RawStore.  These Exceptions are generated by Thrift and are used 
to communicate error conditions across the wire.  They are not designed for use 
as part of the underlying stack, yet in Hive, they have been pushed down into 
these data access operators. 

 

The RawStore should not have to be this tightly coupled to the transport layer.

 

Remove all checked Exceptions from RawStore in favor of Hive runtime 
exceptions.  This is a popular approach and dovetails nicely with the 
underlying database access library DataNucleus.

All of the logging of un-checked Exceptions, and transforming them into Thrift 
exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Hive HMS RawStore-ObjectStore Design

2021-05-17 Thread David
Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the
Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into Hive
via the HMSHandler class.  This class lists all of the services (RPCs) that
the Hive Metastore provides.

This class's methods do some amount of validation, listener notification,
but it ultimately calls one or more RawStore/ObjectStore methods to
interact with the database.

This entire orchestration needs some work to make this code easier to
work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to
NoSuchObjectException/InvalidOperationException/MetaException from the
method signature of RawStore.  These Exceptions are generated by Thrift and
are used to communicate error conditions across the wire.  They are not
designed for use as part of the underlying stack, yet in Hive, they have
been pushed down into these data access operators.  The RawStore should not
have to be this tightly coupled to the transport layer.  My preference here
would be to remove all checked Exceptions from RawStore in favor of runtime
exceptions.  This is a popular approach and dovetails nicely with the
underlying database access library DataNucleus.  All of the logging of
un-checked Exceptions, and transforming them into Thrift exceptions,
should happen in the HMSHandler code.


2// Move Transaction Management

The ObjectStore has a pretty crazy path of handling transactions.  There
seems to be a lot of extra code around transaction tracking that was put in
probably because it's so hard to track transaction management within Hive.
All of the boiler-plate transaction management code should be removed from
ObjectStore and instead brought up into HMS handler as well.  This allows
the handler to create a single transaction per-request and call the
necessary ObjectStore methods.  This is not currently possible because each
ObjectStore handles transactions in its own special way. When you include
all of the open/commit/roll-back, and "transactional listeners," I'm not
certain all code paths are correct.  For example, I suspect some listeners
are being alerted outside of a transaction.  I also suspect some actions
are occurring in multiple transactions that should really be occurring
within a single transaction.

I have locally created some helper-code (TransactionOperations) to do this
from HMSHandler:

  TransactionOperations.newOperation(rawstore).execute(new TransactionCallback() {

    // This method is called after openTransaction is called on the RawStore
    // Runtime Exceptions are caught and cause the transaction to roll back
    // The RawStore method commitTransaction is called if the method completes OK
    @Override
    public void doInTransaction(RawStore rawstore) throws MetaException {

      // These RawStore methods happen in one transaction
      rawstore.methodABC();
      rawstore.methodXXX();
      rawstore.methodXYZ();

      if (!transactionalListeners.isEmpty()) {
        transactionalListenersResponses =
            MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                EventType.CREATE_XXX,
                new CreateXxxEvent(true, HMSHandler.this, xxx));
      }
    }
  });


Re-architecting the method signatures to remove the MetaExceptions is a
large-ish task, but trying to unwind all this transaction code is going to
be a bear; that's what prompted me to write this email.

Thanks.


[jira] [Created] (HIVE-25124) PTF: Vectorize cume_dist function

2021-05-17 Thread Jira
László Bodor created HIVE-25124:
---

 Summary: PTF: Vectorize cume_dist function
 Key: HIVE-25124
 URL: https://issues.apache.org/jira/browse/HIVE-25124
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25125) PTF: Vectorize percent_rank function

2021-05-17 Thread Jira
László Bodor created HIVE-25125:
---

 Summary: PTF: Vectorize percent_rank function
 Key: HIVE-25125
 URL: https://issues.apache.org/jira/browse/HIVE-25125
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25123) Implement vectorized streaming lead/lag

2021-05-17 Thread Jira
László Bodor created HIVE-25123:
---

 Summary: Implement vectorized streaming lead/lag
 Key: HIVE-25123
 URL: https://issues.apache.org/jira/browse/HIVE-25123
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver

2021-05-17 Thread Harish JP (Jira)
Harish JP created HIVE-25122:


 Summary: Intermittent test failures in 
org.apache.hadoop.hive.cli.TestBeeLineDriver
 Key: HIVE-25122
 URL: https://issues.apache.org/jira/browse/HIVE-25122
 Project: Hive
  Issue Type: Bug
Reporter: Harish JP
 Attachments: org.apache.hadoop.hive.cli.TestBeeLineDriver.txt

A Hive test is failing intermittently with an error, so the test is being disabled. The build 
link where it failed: 
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2120/4/tests/]

Error info: [^org.apache.hadoop.hive.cli.TestBeeLineDriver.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25121) Fix qfile results due to disabling discovery.partitions

2021-05-16 Thread Yu-Wen Lai (Jira)
Yu-Wen Lai created HIVE-25121:
-

 Summary: Fix qfile results due to disabling discovery.partitions
 Key: HIVE-25121
 URL: https://issues.apache.org/jira/browse/HIVE-25121
 Project: Hive
  Issue Type: Bug
Reporter: Yu-Wen Lai
Assignee: Yu-Wen Lai


After the patch for HIVE-25039 is merged, some other tests should be updated as 
well.

There are three qfile tests failing now:
 # testCliDriver[alter_multi_part_table_to_iceberg] – 
org.apache.hadoop.hive.cli.TestIcebergCliDriver
 # testCliDriver[alter_part_table_to_iceberg] – 
org.apache.hadoop.hive.cli.TestIcebergCliDriver
 # testCliDriver[create_table_explain_ddl] – 
org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25120) VectorizedParquetRecordReader can't read parquet file with encrypted footer

2021-05-16 Thread George Song (Jira)
George Song created HIVE-25120:
--

 Summary: VectorizedParquetRecordReader can't read parquet file 
with encrypted footer
 Key: HIVE-25120
 URL: https://issues.apache.org/jira/browse/HIVE-25120
 Project: Hive
  Issue Type: Bug
  Components: Parquet
Reporter: George Song
Assignee: Ganesha Shreedhara
 Fix For: 4.0.0


Taking an example of a parquet table having an array of integers as below: 
{code:java}
CREATE EXTERNAL TABLE (`list_of_ints` array<int>)
STORED AS PARQUET 
LOCATION '{location}';
{code}
A parquet file generated using hive will have the schema for the Type as below:
{code:java}
group list_of_ints (LIST) {
  repeated group bag {
    optional int32 array;
  };
}
{code}
A parquet file generated using thrift or any custom tool (using 
org.apache.parquet.io.api.RecordConsumer) may have the schema for the Type as below:
{code:java}
required group list_of_ints (LIST) {
  repeated int32 list_of_ints_tuple
}
{code}
VectorizedParquetRecordReader handles only parquet files generated using hive. 
It throws the following exception when a parquet file generated using thrift is 
read, because of the changes done as part of HIVE-18553.
{code:java}
Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is not a group
 at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
 

I have done a small change to handle the case where the child type of the group 
type can be a PrimitiveType, as sketched below.
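
The change could look roughly like the following sketch (hypothetical, not the 
actual patch; listType is an illustrative local name for the LIST group):
{code:java}
// In getElementType: if the repeated child of the LIST group is itself a
// primitive (thrift-style schema), use it directly as the element type
// instead of unconditionally calling asGroupType() on it.
Type repeatedType = listType.getType(0);
if (repeatedType.isPrimitive()) {
  return repeatedType;
}
return repeatedType.asGroupType().getType(0);
{code}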



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25119) Upgrade Parquet to 1.12.0

2021-05-16 Thread George Song (Jira)
George Song created HIVE-25119:
--

 Summary: Upgrade Parquet to 1.12.0
 Key: HIVE-25119
 URL: https://issues.apache.org/jira/browse/HIVE-25119
 Project: Hive
  Issue Type: Improvement
Reporter: George Song
Assignee: Chao Sun
 Fix For: 4.0.0


Parquet 1.11.1 has some bug fixes, so Hive should consider upgrading to it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25118) CTAS accepts columns with dot (.) if CBO fails

2021-05-14 Thread Naresh P R (Jira)
Naresh P R created HIVE-25118:
-

 Summary: CTAS accepts columns with dot (.) if CBO fails
 Key: HIVE-25118
 URL: https://issues.apache.org/jira/browse/HIVE-25118
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


create table t1(id int);

create table t2(id int);

create table t3 as select t1.id, t2.id from t1 join t2;

CBO fails if "hive.stats.column.autogather=true" with "SemanticException 
Ambiguous column reference: id", and the CTAS then succeeds with the following table schema:
{code:java}
desc t3;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| t1.id     | int        |          |
| t2.id     | int        |          |
+-----------+------------+----------+{code}
By contrast, create table t3(`t1.id` int, `t2.id` int); fails because of the dot (.) in the column names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25117) Vector PTF ClassCastException with Decimal64

2021-05-14 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25117:
-

 Summary: Vector PTF ClassCastException with Decimal64
 Key: HIVE-25117
 URL: https://issues.apache.org/jira/browse/HIVE-25117
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis


This only reproduces when there is at least 1 buffered batch, so it needs 2 rows with 
1 row per batch:

{code:java}
set hive.vectorized.testing.reducer.batch.size=1;
{code}

{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.copyNonSelectedColumnVector(VectorizedBatchUtil.java:664)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.forwardBufferedBatches(VectorPTFGroupBatches.java:228)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.fillGroupResultsAndForward(VectorPTFGroupBatches.java:318)
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:403)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25116) Exclude slf4j from hive-exec uber Jar included by avatica

2021-05-14 Thread Andras Katona (Jira)
Andras Katona created HIVE-25116:


 Summary: Exclude slf4j from hive-exec uber Jar included by avatica
 Key: HIVE-25116
 URL: https://issues.apache.org/jira/browse/HIVE-25116
 Project: Hive
  Issue Type: Bug
Reporter: Andras Katona
Assignee: Andras Katona


org.apache.calcite.avatica:avatica includes slf4j-api in itself, hence hive-exec 
packages it into its uber jar too.
This causes issues where hive-exec is pulled in and causes a classloader linkage 
error (it happened in the Ranger plugin, where there is a separate classloader to 
load the plugin).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25115) Compaction queue entries may accumulate in "ready for cleaning" state

2021-05-14 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-25115:


 Summary: Compaction queue entries may accumulate in "ready for 
cleaning" state
 Key: HIVE-25115
 URL: https://issues.apache.org/jira/browse/HIVE-25115
 Project: Hive
  Issue Type: Improvement
Reporter: Karen Coppage
Assignee: Karen Coppage


If the Cleaner does not delete any files, the compaction queue entry is thrown 
back to the queue and remains in the "ready for cleaning" state.

Problem: if 2 compactions run on the same table and enter the "ready for cleaning" 
state at the same time, only one "cleaning" will remove obsolete files; the 
other entry will remain in the queue in the "ready for cleaning" state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Hive 2.3.9 release

2021-05-14 Thread Chao Sun
Hi all,

It's been four months since the 2.3.8 release and a few commits have
accumulated in branch-2.3, including fixes for interoperability
with Avro 1.10.1, as well as fixes for backward compatibility with HMS <
2.3. Therefore, if there is no objection, I'll start preparing the 2.3.9
release, followed by a vote. Please let me know if you plan to add more
items to this release. Thanks.

Chao


[jira] [Created] (HIVE-25114) Optimize get_tables() api call in HMS

2021-05-13 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25114:


 Summary: Optimize get_tables() api call in HMS
 Key: HIVE-25114
 URL: https://issues.apache.org/jira/browse/HIVE-25114
 Project: Hive
  Issue Type: Improvement
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Optimize the get_tables() call in the HMS API. There should be only one call to the object 
store, instead of 2 calls, to return the table objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread Yu-Wen Lai (Jira)
Yu-Wen Lai created HIVE-25113:
-

 Summary: Connection starvation in TxnHandler.getValidWriteIds
 Key: HIVE-25113
 URL: https://issues.apache.org/jira/browse/HIVE-25113
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Yu-Wen Lai
Assignee: Yu-Wen Lai


 

The current code looks like below.
{code:java}
dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
{code}
In the function getOpenTxns, it requests another connection from the pool. That 
is, this thread already holds a connection, yet it requests 
another one. When there are more than 10 (the default connection pool size) 
simultaneous getValidWriteIds requests, this can cause a starvation problem: 
each thread holds a connection and waits for another one. 
Then, we will see the following exception after the timeout.
{code:java}
metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 - 
Connection is not available, request timed out after 3ms.{code}
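
A possible fix is to reuse the connection the thread already holds instead of 
borrowing a second one, along the lines of this sketch (the getOpenTxns(dbConn) 
overload is hypothetical):
{code:java}
dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
// Pass the already-held connection down instead of taking another from the pool
validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(dbConn), 0);
{code}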
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-25112:
-

 Summary: Simplify TXN Compactor Heartbeat Thread
 Key: HIVE-25112
 URL: https://issues.apache.org/jira/browse/HIVE-25112
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


Simplify the Thread structure.  Threads do not need a separate "start"/"stop" 
state; they already have one: running/interrupted.  Threads are designed to work 
this way with thread pools and forced exits.
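
For illustration, the interrupt-driven shape this describes might look like the 
following sketch (sendHeartbeat is a stand-in for the compactor's heartbeat call):
{code:java}
Runnable heartbeater = () -> {
  // No separate start/stop flags: the thread's interrupted status is the state
  while (!Thread.currentThread().isInterrupted()) {
    try {
      sendHeartbeat(); // stand-in for the actual heartbeat call
      TimeUnit.SECONDS.sleep(30);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag; the loop exits
    }
  }
};
{code}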



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-25111:
-

 Summary: Metastore Catalog Methods JDO Persistence
 Key: HIVE-25111
 URL: https://issues.apache.org/jira/browse/HIVE-25111
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25110) Upgrade JDO Persistence to Use DN5 Features

2021-05-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-25110:
-

 Summary: Upgrade JDO Persistence to Use DN5 Features
 Key: HIVE-25110
 URL: https://issues.apache.org/jira/browse/HIVE-25110
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Standalone Metastore
Reporter: David Mollitor
Assignee: David Mollitor


Hive has updated DataNucleus for Hive v4 but is not taking advantage of new 
features and paradigms.  There's a ton of code in Hive that can be removed in 
favor of relying on the underlying libraries and their best practices.

 

https://www.datanucleus.org/products/accessplatform_5_2/index.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-13 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25109:
-

 Summary: CBO fails when updating table has constraints defined
 Key: HIVE-25109
 URL: https://issues.apache.org/jira/browse/HIVE-25109
 Project: Hive
  Issue Type: Bug
  Components: CBO, Logical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
create table acid_uami_n0(i int,
  de decimal(5,2) constraint nn1 not null enforced,
  vc varchar(128) constraint ch2 CHECK (de >= cast(i as decimal(5,2))) enforced)
  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

-- update
explain cbo
update acid_uami_n0 set de = 893.14 where de = 103.00;
{code}

hive.log
{code}
2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result Schema didn't match Optimized Op Tree Schema
    at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) [hive-cli-4.0.0-SNAPSHOT.jar:?]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) [hive-cli-4.0.0-SNAPSHOT.jar:?]
    at 

[jira] [Created] (HIVE-25108) Do Not Log and Throw MetaExceptions

2021-05-12 Thread David Mollitor (Jira)
David Mollitor created HIVE-25108:
-

 Summary: Do Not Log and Throw MetaExceptions
 Key: HIVE-25108
 URL: https://issues.apache.org/jira/browse/HIVE-25108
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


"Log and throw" is a bad pattern and leads to logging the same error multiple 
times.

There is code in Hive that explicitly implements this behavior and should 
therefore be removed.
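
For illustration, the pattern and its replacement (a sketch; the method and 
message names are generic):
{code:java}
// Anti-pattern: the same failure is logged here and logged again by whoever
// eventually catches the MetaException further up the stack
catch (SQLException e) {
  LOG.error("Unable to update table", e);
  throw new MetaException("Unable to update table: " + e.getMessage());
}

// Preferred: throw once and let a single handler at the top do the logging
catch (SQLException e) {
  throw new MetaException("Unable to update table: " + e.getMessage());
}
{code}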



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25107) Thread classpath logging should be optional

2021-05-12 Thread Jira
László Bodor created HIVE-25107:
---

 Summary: Thread classpath logging should be optional
 Key: HIVE-25107
 URL: https://issues.apache.org/jira/browse/HIVE-25107
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25106) Do not exclude avatica and protobuf for Iceberg

2021-05-12 Thread Marton Bod (Jira)
Marton Bod created HIVE-25106:
-

 Summary: Do not exclude avatica and protobuf for Iceberg
 Key: HIVE-25106
 URL: https://issues.apache.org/jira/browse/HIVE-25106
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When running tests from the IDE, the current dependency exclusions in the 
hive-iceberg pom can result in:
{code:java}
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 62 more
{code}
and
{code:java}
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.calcite.avatica.ConnectionPropertiesImpl
    at org.apache.calcite.avatica.MetaImpl.<init>(MetaImpl.java:72)
    at org.apache.calcite.jdbc.CalciteMetaImpl.<init>(CalciteMetaImpl.java:85)
    at org.apache.calcite.jdbc.Driver.createMeta(Driver.java:169)
    at org.apache.calcite.avatica.AvaticaConnection.<init>(AvaticaConnection.java:121)
{code}
when running {{testCBOWithSelectedColumnsOverlapJoin}} and 
{{testCBOWithSelectedColumnsNonOverlapJoin}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-25105:
--

 Summary: Support Parquet as MV storage format
 Key: HIVE-25105
 URL: https://issues.apache.org/jira/browse/HIVE-25105
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently the supported storage formats do not include Parquet:

{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
"ORC",
new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}
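
Presumably the fix is simply to extend the allowed set, along these lines (a 
sketch, not the committed change):
{code}
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat",
    "ORC",
    new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC", "Parquet"),
{code}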



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-11 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25104:
--

 Summary: Backward incompatible timestamp serialization in Parquet 
for certain timezones
 Key: HIVE-25104
 URL: https://issues.apache.org/jira/browse/HIVE-25104
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 3.1.2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
performed and, to some extent, how timestamps are serialized and deserialized in 
files (Parquet, Avro, Orc).

In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet 
files is not backwards compatible. In other words writing timestamps with a 
version of Hive that includes HIVE-12192/HIVE-20007 and reading them with 
another (not including the previous issues) may lead to different results 
depending on the default timezone of the system.

Consider the following scenario where the default system timezone is set to 
US/Pacific.

At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25103) Update row.serde excludes defaults

2021-05-11 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25103:
-

 Summary: Update row.serde excludes defaults
 Key: HIVE-25103
 URL: https://issues.apache.org/jira/browse/HIVE-25103
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


HIVE-16222 introduced the row.serde.inputformat.excludes setting to disable 
row.serde for specific NON-vectorized formats.
Since MapredParquetInputFormat is now natively vectorized, it should be 
removed from that list.

Even when hive.vectorized.use.vectorized.input.format is DISABLED, the 
Vectorizer will not vectorize in row-deserialize mode if the input format 
is natively vectorized, so it is safe to remove.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-11 Thread Jira
László Pintér created HIVE-25102:


 Summary: Cache Iceberg table objects within same query
 Key: HIVE-25102
 URL: https://issues.apache.org/jira/browse/HIVE-25102
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér


We run Catalogs.loadTable(configuration, props) plenty of times, which is costly.
We should:
 - Cache it, maybe even globally, based on the queryId (see the sketch below)
 - Make sure that the query uses one snapshot during the whole execution of a 
single query
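
A minimal sketch of such a cache, keyed on queryId (the loader type and the 
eviction hook are placeholders, not the actual Iceberg/Hive API):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// One cached value per queryId, loaded at most once per query.
public class QueryScopedTableCache<T> {
  private final Map<String, T> tablesByQueryId = new ConcurrentHashMap<>();

  // loader would wrap Catalogs.loadTable(configuration, props)
  public T getOrLoad(String queryId, Function<String, T> loader) {
    return tablesByQueryId.computeIfAbsent(queryId, loader);
  }

  // Must be called when the query finishes, otherwise entries leak.
  public void invalidate(String queryId) {
    tablesByQueryId.remove(queryId);
  }
}
{code}

Pinning the loaded table for the lifetime of the queryId would also give the 
second point for free: every part of the query sees the same snapshot.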



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25101) Remove HBase libraries from Hive distribution

2021-05-11 Thread Istvan Toth (Jira)
Istvan Toth created HIVE-25101:
--

 Summary: Remove HBase libraries from Hive distribution
 Key: HIVE-25101
 URL: https://issues.apache.org/jira/browse/HIVE-25101
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler, Hive
Affects Versions: 4.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth


Hive currently packages HBase libraries into its lib directory.
It also adds the HBase libraries separately to its classpath in the hive 
startup script.

Having both mechanisms is redundant, and it also causes errors, as the standard 
HBase libraries packaged into Hive are unshaded, while the libraries added by 
_hbase mapredcp_
are shaded, and the two are NOT compatible when custom coprocessors are used, 
and in some cases the classpaths during local execution and for MR/TEZ jobs are 
mutually incompatible.

I propose removing all HBase libraries from the distribution, and pulling them 
via the hbase mapredcp mechanism.

This also solves the old problem of including ancient HBase alpha versions in Hive.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25100) Use default values of Iceberg client pool configuration

2021-05-11 Thread Jira
László Pintér created HIVE-25100:


 Summary: Use default values of Iceberg client pool configuration
 Key: HIVE-25100
 URL: https://issues.apache.org/jira/browse/HIVE-25100
 Project: Hive
  Issue Type: Bug
Reporter: László Pintér
Assignee: László Pintér






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25099) Support for spark 3.x execution engine

2021-05-11 Thread Ajith Kumar (Jira)
Ajith Kumar created HIVE-25099:
--

 Summary: Support for spark 3.x execution engine
 Key: HIVE-25099
 URL: https://issues.apache.org/jira/browse/HIVE-25099
 Project: Hive
  Issue Type: New Feature
Reporter: Ajith Kumar


Currently Hive does not support newer versions of Spark (3.x+).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25098) [CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.0 due

2021-05-10 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-25098:


 Summary: [CVE-2020-13949] Upgrade thrift from 0.13.0 to 0.14.0 due
 Key: HIVE-25098
 URL: https://issues.apache.org/jira/browse/HIVE-25098
 Project: Hive
  Issue Type: Bug
Affects Versions: All Versions
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Upgrading thrift from 0.13.0 to 0.14.0 due to 

https://nvd.nist.gov/vuln/detail/CVE-2020-13949




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25097) Make some unit tests run on tez

2021-05-10 Thread Jira
László Bodor created HIVE-25097:
---

 Summary: Make some unit tests run on tez
 Key: HIVE-25097
 URL: https://issues.apache.org/jira/browse/HIVE-25097
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25096) beeline can't get the correct hiveserver2 using zookeeper with serviceDiscoveryMode=zooKeeper.

2021-05-07 Thread xiaozhongcheng (Jira)
xiaozhongcheng created HIVE-25096:
-

 Summary: beeline can't get the correct hiveserver2 using 
zookeeper with serviceDiscoveryMode=zooKeeper.
 Key: HIVE-25096
 URL: https://issues.apache.org/jira/browse/HIVE-25096
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 3.1.2
 Environment: centos7.4

x86_64
Reporter: xiaozhongcheng
 Fix For: 4.0.0


beeline can't get the correct hiveserver2 using zookeeper with 
serviceDiscoveryMode=zooKeeper.
 
You know, HiveServer2#startPrivilegeSynchronizer will create the namespace 
/hiveserver2/leader in zookeeper.
However, if you connect to hiveserver2 using beeline with a command like 
"jdbc:hive2://vhost-120-26:2181,vhost-120-27:2181,vhost-120-28:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2",
beeline will randomly choose a sub-namespace under /hiveserver2.
This raises a problem: beeline may pick the /hiveserver2/leader namespace, 
where it is impossible to find any hiveserver2 connection information.
 
 
That is to say, the sub-namespaces of /hiveserver2 look like this:
 
[zk: vhost-120-28:2181,vhost-120-27:2181,vhost-120-26:2181(CONNECTED) 1] ls 
/hiveserver2
[leader, serverUri=vhost-120-26:1;version=3.1.2;sequence=10, 
serverUri=vhost-120-28:1;version=3.1.2;sequence=11]
 
The code listed below shows HiveServer2#startPrivilegeSynchronizer and how 
beeline connects to hiveserver2.
 
HiveServer2#startPrivilegeSynchronizer:
 
{code:java}
  public void startPrivilegeSynchronizer(HiveConf hiveConf) throws Exception {

    if (!HiveConf.getBoolVar(hiveConf, ConfVars.HIVE_PRIVILEGE_SYNCHRONIZER)) {
      return;
    }
    PolicyProviderContainer policyContainer = new PolicyProviderContainer();
    HiveAuthorizer authorizer = SessionState.get().getAuthorizerV2();
    if (authorizer.getHivePolicyProvider() != null) {
      policyContainer.addAuthorizer(authorizer);
    }
    if (MetastoreConf.getVar(hiveConf, MetastoreConf.ConfVars.PRE_EVENT_LISTENERS) != null &&
        MetastoreConf.getVar(hiveConf, MetastoreConf.ConfVars.PRE_EVENT_LISTENERS).contains(
        "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener") &&
        MetastoreConf.getVar(hiveConf, MetastoreConf.ConfVars.HIVE_AUTHORIZATION_MANAGER) != null) {
      List providers = HiveUtils.getMetaStoreAuthorizeProviderManagers(
          hiveConf, HiveConf.ConfVars.HIVE_METASTORE_AUTHORIZATION_MANAGER, SessionState.get().getAuthenticator());
      for (HiveMetastoreAuthorizationProvider provider : providers) {
        if (provider.getHivePolicyProvider() != null) {
          policyContainer.addAuthorizationProvider(provider);
        }
      }
    }

    if (policyContainer.size() > 0) {
      setUpZooKeeperAuth(hiveConf);
      zKClientForPrivSync = hiveConf.getZKConfig().startZookeeperClient(zooKeeperAclProvider, true);
      String rootNamespace = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_ZOOKEEPER_NAMESPACE);
      String path = ZooKeeperHiveHelper.ZOOKEEPER_PATH_SEPARATOR + rootNamespace
          + ZooKeeperHiveHelper.ZOOKEEPER_PATH_SEPARATOR + "leader";
      LeaderLatch privilegeSynchronizerLatch = new LeaderLatch(zKClientForPrivSync, path);
      privilegeSynchronizerLatch.start();
      LOG.info("Find " + policyContainer.size() + " policy to synchronize, start PrivilegeSynchronizer");
      Thread privilegeSynchronizerThread = new Thread(
          new PrivilegeSynchronizer(privilegeSynchronizerLatch, policyContainer, hiveConf), "PrivilegeSynchronizer");
      privilegeSynchronizerThread.setDaemon(true);
      privilegeSynchronizerThread.start();
    } else {
      LOG.warn(
          "No policy provider found, skip creating PrivilegeSynchronizer");
    }
  }
{code}
 
ZooKeeperHiveClientHelper#configureConnParams:
 
{code:java}
  static

[jira] [Created] (HIVE-25095) Beeline/hive -e command can't deal with query with trailing quote

2021-05-06 Thread Robbie Zhang (Jira)
Robbie Zhang created HIVE-25095:
---

 Summary: Beeline/hive -e command can't deal with query with 
trailing quote
 Key: HIVE-25095
 URL: https://issues.apache.org/jira/browse/HIVE-25095
 Project: Hive
  Issue Type: Bug
Reporter: Robbie Zhang


The command 
{code:java}
hive -e 'select "hive"'{code}
and
{code:java}
beeline -e 'select "hive"'{code}
fail with this error:
{code:java}
Error: Error while compiling statement: FAILED: ParseException line 1:12 
character '' not supported here (state=42000,code=4){code}
The reason is that org.apache.commons.cli.Util.stripLeadingAndTrailingQuotes in 
commons-cli-1.2.jar strips the trailing quote, so the query string is changed to
{code:java}
select "hive{code}
This bug is fixed in commons-cli-1.3.1 and commons-cli-1.4. The workaround 
is to replace commons-cli-1.2.jar with commons-cli-1.3.1.jar or 
commons-cli-1.4.jar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25094) when easily insert row with hive, got ERROR exec.StatsTask: Failed to run stats task

2021-05-06 Thread Lei Su (Jira)
Lei Su created HIVE-25094:
-

 Summary: when easily insert row with hive, got ERROR 
exec.StatsTask: Failed to run stats task
 Key: HIVE-25094
 URL: https://issues.apache.org/jira/browse/HIVE-25094
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.2
 Environment: OS: CentOS 7 based on VMWare 16

MEM:4G

DISK:40G

CPU: hostcpu: interl i5-10gen

HADOOP 3.1.3, one namenode, two datanode

HIVE 3.1.2 with metadata changed to mysql
Reporter: Lei Su


Dear professors,

As described, after installing Hadoop 3.1.3 and Hive 3.1.2, I inserted one row 
into a simple table with the SQL

Hive> insert into test2 values ('aaa') ;

I got FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.StatsTask

Then I checked the detailed log and found that something seems to be wrong with 
the Hive stats task at the code level. Part of the detailed log follows for 
your reference.

Would you please kindly take a look and advise?

thanks in advance and kind regards

Lei

"

2021-05-06T14:30:32,609 INFO [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
metastore.RetryingMetaStoreClient: RetryingMetaStoreClient trying reconnect as 
root (auth:SIMPLE)
2021-05-06T14:30:32,609 INFO [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
metastore.HiveMetaStoreClient: Closed a connection to metastore, current 
connections: 1
2021-05-06T14:30:32,610 INFO [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
metastore.HiveMetaStoreClient: Trying to connect to metastore with URI 
thrift://master:9083
2021-05-06T14:30:32,610 INFO [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
metastore.HiveMetaStoreClient: Opened a connection to metastore, current 
connections: 2
2021-05-06T14:30:32,611 INFO [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
metastore.HiveMetaStoreClient: Connected to metastore.
2021-05-06T14:30:32,657 ERROR [38c1f98d-9d5a-4bb6-b052-e63eda036c08 main] 
exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.transport.TTransportException
 at 
org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4423)
 ~[hive-exec-3.1.2.jar:3.1.2]
 at 
org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:179)
 ~[hive-exec-3.1.2.jar:3.1.2]
 at 
org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:83)
 ~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:108) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) 
~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) 
~[hive-cli-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) 
~[hive-cli-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) 
~[hive-cli-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) 
~[hive-cli-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) 
~[hive-cli-3.1.2.jar:3.1.2]
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) 
~[hive-cli-3.1.2.jar:3.1.2]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_212]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_212]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
 at org.apache.hadoop.util.RunJar.run(RunJar.java:318) 
~[hadoop-common-3.1.3.jar:?]
 at org.apache.hadoop.util.RunJar.main(RunJar.java:232) 
~[hadoop-common-3.1.3.jar:?]
Caused by: org.apache.thrift.transport.TTransportException
 at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 ~[hive-exec-3.1.2.jar:3.1.2]
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
~[hive-exec-3.1.2.jar:3.1.2]
 at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) 

Sqoop integration with Hive 3.x

2021-05-06 Thread Ankur Khanna
Hi Hive Community,

I’m trying to upgrade the hive dependency in the sqoop project, and have a 
related query.
Sqoop depends on hive-exec (the shaded jar, not hive-exec-core, which is a thin 
jar). I notice that internally, hive-standalone-metastore is being used in 
sqoop (this is after I’ve built the sqoop project with hive-3.1.2).

I wish to know the high-level differences between hive-metastore and 
hive-standalone-metastore, and whether it is expected for sqoop to use 
hive-standalone-metastore as opposed to hive-metastore?

Best,
Ankur Khanna


[jira] [Created] (HIVE-25093) date_format() UDF is returning values in UTC time zone only

2021-05-05 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-25093:


 Summary: date_format() UDF is returning values in UTC time zone 
only 
 Key: HIVE-25093
 URL: https://issues.apache.org/jira/browse/HIVE-25093
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 3.1.2
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Query - select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z")

Result - 2021-05-04 13:00:33.358 UTC

https://issues.apache.org/jira/browse/HIVE-12192
As part of the above JIRA it was decided to have a common time zone, UTC, for 
all computation. As a result, the date_format() function was hard-coded to 
"UTC".

https://issues.apache.org/jira/browse/HIVE-21039
But later it was decided that the user session time zone value should be the 
default, not UTC.

date_format() was not fixed as part of HIVE-21039.

Dropping this mail to understand what should be the ideal time zone value of 
date_format().
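
For illustration, a small java.time snippet showing how the same instant 
renders differently when the formatter is pinned to UTC versus a session zone 
(US/Pacific stands in here for the user's session time zone):

{code:java}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class DateFormatZoneDemo {
  public static void main(String[] args) {
    Instant now = Instant.now();
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS z");
    // Hard-coded UTC, as date_format() behaves today:
    System.out.println(fmt.withZone(ZoneId.of("UTC")).format(now));
    // Pinned to a session zone instead, as HIVE-21039 did for other functions:
    System.out.println(fmt.withZone(ZoneId.of("US/Pacific")).format(now));
  }
}
{code}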



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25092) Add a shell script to fetch the statistics of replication data copy tasks

2021-05-04 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25092:
---

 Summary: Add a shell script to fetch the statistics of replication 
data copy tasks
 Key: HIVE-25092
 URL: https://issues.apache.org/jira/browse/HIVE-25092
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add a shell script which can fetch the statistics of the Mapred(Distcp) jobs 
launched as part of replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-04 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25091:


 Summary: Implement connector provider for MSSQL and Oracle
 Key: HIVE-25091
 URL: https://issues.apache.org/jira/browse/HIVE-25091
 Project: Hive
  Issue Type: Sub-task
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Provide an implementation of Connector provider for MSSQL and Oracle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25090) Join condition parsing error in subquery

2021-05-04 Thread Soumyakanti Das (Jira)
Soumyakanti Das created HIVE-25090:
--

 Summary: Join condition parsing error in subquery
 Key: HIVE-25090
 URL: https://issues.apache.org/jira/browse/HIVE-25090
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Soumyakanti Das
Assignee: Soumyakanti Das


 

The following query fails
{code:java}
select *
from alltypesagg t1
where t1.id not in
(select tt1.id
 from alltypesagg tt1 LEFT JOIN alltypestiny tt2
 on t1.int_col = tt2.int_col){code}
Stack trace:
{code:java}
 org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
Line 8:8 Invalid table alias or column reference 't1': (possible column names 
are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
tt2.bool_col) 
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
Line 8:8 Invalid table alias or column reference 't1': (possible column names 
are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
tt2.bool_col) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3886)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3899)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3927)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5489)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:2018)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1964)
 at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at 
org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1725)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:565)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12486)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:458)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
 at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) 
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) 
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) 
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 

Default time zone value for date_format() UDF

2021-05-04 Thread Ashish Sharma
Hi all,

Query - select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z")

Result - 2021-05-04 13:00:33.358 UTC

https://issues.apache.org/jira/browse/HIVE-12192
As part of the above JIRA it was decided to have a common time zone, UTC, for
all computation. As a result, the date_format() function was hard-coded
to "UTC".

https://issues.apache.org/jira/browse/HIVE-21039
But later it was decided that the user session time zone value should be the
default, not UTC.

date_format() was not fixed as part of HIVE-21039.

Dropping this mail to understand what should be the ideal time zone value
of date_format().

Thanks
Ashish Sharma


[jira] [Created] (HIVE-25089) Move Materialized View rebuild code to AlterMaterializedViewRebuildAnalyzer

2021-05-03 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25089:
-

 Summary: Move Materialized View rebuild code to 
AlterMaterializedViewRebuildAnalyzer
 Key: HIVE-25089
 URL: https://issues.apache.org/jira/browse/HIVE-25089
 Project: Hive
  Issue Type: Task
  Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25088) Hide iceberg dependencies behind a profile in itests/qtest module

2021-05-03 Thread Jira
László Pintér created HIVE-25088:


 Summary: Hide iceberg dependencies behind a profile in itests/qtest 
module
 Key: HIVE-25088
 URL: https://issues.apache.org/jira/browse/HIVE-25088
 Project: Hive
  Issue Type: Task
Reporter: László Pintér
Assignee: László Pintér






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25087) Introduce a reusable and configurable periodic/logarithmic logger

2021-05-03 Thread Jira
László Bodor created HIVE-25087:
---

 Summary: Introduce a reusable and configurable 
periodic/logarithmic logger
 Key: HIVE-25087
 URL: https://issues.apache.org/jira/browse/HIVE-25087
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-03 Thread Haymant Mangla (Jira)
Haymant Mangla created HIVE-25086:
-

 Summary: Create Ranger Deny Policy for replication db in all cases 
if hive.repl.ranger.target.deny.policy is set to true.
 Key: HIVE-25086
 URL: https://issues.apache.org/jira/browse/HIVE-25086
 Project: Hive
  Issue Type: Improvement
Reporter: Haymant Mangla
Assignee: Haymant Mangla






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25085) MetaStore Clients are being shared across different sessions

2021-05-02 Thread Steve Carlin (Jira)
Steve Carlin created HIVE-25085:
---

 Summary: MetaStore Clients are being shared across different 
sessions
 Key: HIVE-25085
 URL: https://issues.apache.org/jira/browse/HIVE-25085
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Steve Carlin


The Hive object (and the underlying MetaStoreClient object) seems to be getting 
shared across different sessions.  While most operations work, there can be 
occasional glitches.  

One such noted glitch is that when session 1 ends, it closes the connection.  
If session 2 then tries an operation, the first try will fail.  Normally this 
can proceed because the RetryingMetaStoreClient will re-establish a new 
connection, but in some operations, the retrying logic will not kick in (by 
design).

It seems there was an attempt to fix this issue in HIVE-20682.  However, this 
implementation seems to be flawed.  The HiveSessionImpl object creates a Hive 
object and makes sure all query threads belonging to the same session run with 
the same Hive object.  The flaw is that the initial Hive object within 
HiveSessionImpl is created in thread-local storage, and the thread running at 
that moment is not session specific: it belongs to a thread pool that merely 
happens to be handling this specific session.
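
A tiny standalone demo of that pitfall (all names are illustrative): a value 
initialized through a ThreadLocal on one pool thread is a different object on 
every other pool thread, even if all of them serve the same session:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadLocalPitfallDemo {
  // Stand-in for the per-thread Hive object.
  private static final ThreadLocal<String> HIVE =
      ThreadLocal.withInitial(() -> "Hive@" + Thread.currentThread().getName());

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // Four operations of the "same session" land on whichever pool thread
    // is free, so they may observe two different Hive objects.
    for (int i = 0; i < 4; i++) {
      pool.submit(() -> System.out.println(HIVE.get()));
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}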

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25084) Incorrect aggregate results on bucketed table

2021-04-30 Thread Naresh P R (Jira)
Naresh P R created HIVE-25084:
-

 Summary: Incorrect aggregate results on bucketed table
 Key: HIVE-25084
 URL: https://issues.apache.org/jira/browse/HIVE-25084
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


Steps to repro
{code:java}
CREATE TABLE test_table(
col1 int,
col2 char(32),
col3 varchar(3))
CLUSTERED BY (col2)
 SORTED BY (
   col2 ASC,
   col3 ASC,
   col1 ASC)
 INTO 32 BUCKETS stored as orc;

set hive.query.results.cache.enabled=false;
insert into test_table values(2, "123456", "15");
insert into test_table values(1, "123456", "15");

SELECT col2, col3, max(col1) AS max_sequence FROM test_table GROUP BY col2, 
col3;
==> LocalFetch correct result <==
123456 15 2 

==> Wrong result with Tez/Llap <==
set hive.fetch.task.conversion=none;
123456 15 2 
123456 15 1 

==> Correct result with Tez/Llap disabling map aggregation <==
set hive.map.aggr=false;
123456 15 2 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25083) Extra reviewer pattern

2021-04-30 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25083:
-

 Summary: Extra reviewer pattern
 Key: HIVE-25083
 URL: https://issues.apache.org/jira/browse/HIVE-25083
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25082) Make SettableTreeReader updateTimezone a default method

2021-04-30 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25082:
-

 Summary: Make SettableTreeReader updateTimezone a default method
 Key: HIVE-25082
 URL: https://issues.apache.org/jira/browse/HIVE-25082
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Avoid useless TimestampStreamReader instance checks by making updateTimezone() 
a default method in SettableTreeReader
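
A hedged sketch of the change (the interface and class names follow the 
description above; the bodies are illustrative):

{code:java}
// With a no-op default, callers can invoke updateTimezone() on any reader
// without an instanceof TimestampStreamReader check first.
interface SettableTreeReader {
  default void updateTimezone(String writerTimezone) {
    // no-op for readers that do not care about time zones
  }
}

class TimestampStreamReader implements SettableTreeReader {
  private String writerTimezone = "UTC";

  @Override
  public void updateTimezone(String writerTimezone) {
    this.writerTimezone = writerTimezone; // only this reader overrides the default
  }
}
{code}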



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25081) Put metrics collection behind a feature flag

2021-04-30 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-25081:
--

 Summary: Put metrics collection behind a feature flag
 Key: HIVE-25081
 URL: https://issues.apache.org/jira/browse/HIVE-25081
 Project: Hive
  Issue Type: Bug
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits


Most metrics we're creating are collected in AcidMetricsService, which is 
behind a feature flag. However, there are some metrics that are collected 
outside of the service. These should be behind a feature flag in addition to 
hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state

2021-04-30 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-25080:
--

 Summary: Create metric about oldest entry in "ready for cleaning" 
state
 Key: HIVE-25080
 URL: https://issues.apache.org/jira/browse/HIVE-25080
 Project: Hive
  Issue Type: Bug
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits


When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated with 
the current time. Then the compaction state is set to "ready for cleaning". 
(... and then the Cleaner runs and the state is set to "succeeded" hopefully)

Based on this we know (roughly) how long a compaction has been in state "ready 
for cleaning".

We should create a metric similar to compaction_oldest_enqueue_age_in_sec that 
would show whether the cleaner is blocked by something, i.e. find the 
compaction in "ready for cleaning" that has the oldest commit time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-04-30 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-25079:
--

 Summary: Create new metric about number of writes to tables with 
manually disabled compaction
 Key: HIVE-25079
 URL: https://issues.apache.org/jira/browse/HIVE-25079
 Project: Hive
  Issue Type: Bug
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits


Create a new metric that measures the number of writes to tables that have 
compaction turned off manually. It does not matter whether the write is 
committed or aborted (both are bad...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25078) [cachedstore]

2021-04-30 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-25078:


 Summary: [cachedstore]
 Key: HIVE-25078
 URL: https://issues.apache.org/jira/browse/HIVE-25078
 Project: Hive
  Issue Type: Sub-task
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Description

Add a table id check to the following when extracting (i.e. on get calls) a 
cached table from the cached store:
1. Table
2. Partitions
3. Constraints



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25077) Direct SQL to fetch column privileges in refreshPrivileges may be broken in postgres

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25077:
---

 Summary: Direct SQL to fetch column privileges in 
refreshPrivileges may be broken in postgres
 Key: HIVE-25077
 URL: https://issues.apache.org/jira/browse/HIVE-25077
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


HIVE-22512 tried to fix direct-sql for col privileges.

 

However, "GRANT_OPTION" field in "TBL_COL_PRIVS" is marked as smallint in 
postgres. In code, it is retrieved as boolean.

Ref: 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1533]

 
{code:java}
boolean grantOption = 
MetastoreDirectSqlUtils.extractSqlBoolean(privLine[grantOptionIndex]);
{code}
 

[https://github.com/apache/hive/blob/048336bd0c21163920557a60c88135b1d5b42d3d/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L530]

 

MetastoreDirectSqlUtils::extractSqlBoolean should handle integers to support 
directSQL in postgres.
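
A hedged sketch of a more tolerant extractor (the real MetastoreDirectSqlUtils 
method may differ in naming and error handling):

{code:java}
// Accept whatever JDBC type the backing database hands back: Boolean,
// Number (postgres smallint), or a "Y"/"N" string from some drivers.
static Boolean extractSqlBoolean(Object value) {
  if (value == null) {
    return null;
  }
  if (value instanceof Boolean) {
    return (Boolean) value;
  }
  if (value instanceof Number) {
    return ((Number) value).intValue() != 0;
  }
  if (value instanceof String) {
    return "Y".equalsIgnoreCase((String) value);
  }
  throw new IllegalArgumentException("Cannot extract boolean from " + value.getClass());
}
{code}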



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits

2021-04-29 Thread Marton Bod (Jira)
Marton Bod created HIVE-25076:
-

 Summary: Get number of write tasks from jobConf for Iceberg commits
 Key: HIVE-25076
 URL: https://issues.apache.org/jira/browse/HIVE-25076
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When writing empty data into Iceberg tables, we can end up with a succeeded 
task count of 0. With the current logic, we might then erroneously end up 
taking the number of mapper tasks in the commit logic, which would result in 
failures. We should instead save the succeeded task count into the JobConf 
under a specified key and retrieve it from there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25075:
---

 Summary: Hive::loadPartitionInternal establishes HMS connection 
for every partition for external tables
 Key: HIVE-25075
 URL: https://issues.apache.org/jira/browse/HIVE-25075
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522

{code}
boolean needRecycle = !tbl.isTemporary()
  && 
ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), 
tbl.getTTable());
{code}

Hive.get() breaks the current connection with HMS. Due to this, for external 
table partition loads, it establishes a new HMS connection for every partition.
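
A sketch of the fix, assuming the Database only needs to be resolved once per 
load (the loop shape is illustrative, not the actual Hive.java code):

{code:java}
// Resolve the Database a single time instead of calling Hive.get()
// (which tears down and re-creates the HMS connection) per partition.
Database db = Hive.get().getDatabase(tbl.getDbName());
for (Partition partition : partitionsToLoad) { // hypothetical loop variable
  boolean needRecycle = !tbl.isTemporary()
      && ReplChangeManager.shouldEnableCm(db, tbl.getTTable());
  // ... load the partition using needRecycle ...
}
{code}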



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25074) Remove Metastore flushCache usage

2021-04-29 Thread Miklos Szurap (Jira)
Miklos Szurap created HIVE-25074:


 Summary: Remove Metastore flushCache usage
 Key: HIVE-25074
 URL: https://issues.apache.org/jira/browse/HIVE-25074
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Miklos Szurap


The "flushCache" in HiveMetaStore with the ObjectStore implementation is 
currently a NOOP:
{code:java}
  public void flushCache() {
// NOP as there's no caching
  } {code}
The HBaseStore (HBaseReadWrite) had some logic in it, however it has been 
removed in HIVE-17234.

As I see the calls are going like this:

HiveMetaStoreClient.flushCache() -> CachedStore.flushCache() -> 
ObjectStore.flushCache()

A significant number of calls (about 10% of all calls) are made from the 
client to the server - to do nothing. We could spare the call to the server 
completely, including getting a DB connection, which can take 1+ seconds under 
high-load scenarios and slows down Hive queries unnecessarily.

Can we:
 # Deprecate the RawStore.flushCache (if there are other implementations)
 # Deprecate the HiveMetaStoreClient.flushCache()
 # Do the NOOP on the client side in HiveMetaStoreClient.flushCache() while it 
is not yet removed in a next version (see the sketch below)
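
For item 3, a minimal sketch of the client-side change (the method shape is 
illustrative, not the final patch):

{code:java}
// HiveMetaStoreClient (sketch): skip the RPC entirely, since the server-side
// ObjectStore.flushCache() is a NOP anyway.
@Deprecated
@Override
public void flushCache() {
  // intentionally a client-side no-op: no Thrift call, no DB connection
}
{code}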



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25073) Optimise HiveAlterHandler::alterPartitions

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25073:
---

 Summary: Optimise HiveAlterHandler::alterPartitions
 Key: HIVE-25073
 URL: https://issues.apache.org/jira/browse/HIVE-25073
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Rajesh Balamohan


Table details are populated again and again for each partition, which can be 
avoided.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L5892

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L808



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25072) Optimise ObjectStore::alterPartitions

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25072:
---

 Summary: Optimise ObjectStore::alterPartitions
 Key: HIVE-25072
 URL: https://issues.apache.org/jira/browse/HIVE-25072
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Rajesh Balamohan


Avoid fetching table details for every partition in the table.

Ref:

 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986
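
A hedged sketch of the intended shape, assuming alterPartitionNoTxn can accept 
the already-fetched table (the signatures are illustrative):

{code:java}
// Fetch the MTable once per alterPartitions() call...
MTable table = getMTable(catName, dbname, name);
for (Partition newPart : newParts) {
  // ...and reuse it for every partition instead of re-querying it inside
  // alterPartitionNoTxn() each time (see the stack trace below).
  alterPartitionNoTxn(catName, dbname, name, table, newPart);
}
{code}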



Following stacktrace may be relevant for apache master as well.
{noformat}

at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:255)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2113)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2152)
at 
org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4951)
at 
org.apache.hadoop.hive.metastore.ObjectStore.alterPartitions(ObjectStore.java:5057)
at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy27.alterPartitions(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:798)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:5695)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_req(HiveMetaStore.java:5647)
at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy28.alter_partitions_req(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18557)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18541)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting

2021-04-29 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25071:
-

 Summary: Number of reducers limited to fixed 1 when 
updating/deleting
 Key: HIVE-25071
 URL: https://issues.apache.org/jira/browse/HIVE-25071
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


When updating/deleting bucketed tables, an extra ReduceSink operator is created 
to enforce bucketing. After HIVE-22538, the number of reducers is limited to a 
fixed 1 in these RS operators.

This can lead to performance degradation.

Prior to HIVE-22538, multiple reducers were available in such cases. The reason 
for limiting the number of reducers is to ensure ascending RowId order in the 
delete delta files produced by update/delete statements.

This is the plan of delete statement like:

{code}
DELETE FROM t1 WHERE a = 1;
{code}
{code}
TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
{code}

RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number 
of reducers was limited to the number of buckets in the table or 
hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the 
plan above may generate unsorted delete deltas, which leads to corrupted data 
reads.

Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication 
and the resulting RS kept the ordering and enabled multiple reducers. It could 
do so because ReduceSinkDeduplication was prepared for ACID writes. This was 
removed by HIVE-22538 to get a more generic ReduceSinkDeduplication.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25070) SessionHiveMetaStoreClient.getValidWriteIdList should handle "Retrying" exceptions

2021-04-28 Thread Steve Carlin (Jira)
Steve Carlin created HIVE-25070:
---

 Summary: SessionHiveMetaStoreClient.getValidWriteIdList should 
handle "Retrying" exceptions
 Key: HIVE-25070
 URL: https://issues.apache.org/jira/browse/HIVE-25070
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Steve Carlin


The method SessionHiveMetaStoreClient.getValidWriteIdList() currently catches 
all exceptions and rethrows a RuntimeException.  This bypasses the logic that 
is in RetryingMetaStoreClient.

Instead, this method should rethrow whatever exceptions the retrying class can 
handle (and get in line with the other methods in SessionHiveMetaStoreClient).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25069) Hive Distributed Tracing

2021-04-28 Thread Matt McCline (Jira)
Matt McCline created HIVE-25069:
---

 Summary: Hive Distributed Tracing
 Key: HIVE-25069
 URL: https://issues.apache.org/jira/browse/HIVE-25069
 Project: Hive
  Issue Type: New Feature
Reporter: Matt McCline


Instrument Hive code to gather distributed traces and export trace data to a 
configurable collector.

Distributed tracing is a revolutionary tool for debugging issues.

We will use the new OpenTelemetry open-source standard that our industry has 
aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
projects, OpenTracing and OpenCensus.

Next step: add a design document that goes into distributed tracing in more 
detail and describes how Hive will be enhanced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25068) I need Hive access please

2021-04-28 Thread Cassie Moorhead (Jira)
Cassie Moorhead created HIVE-25068:
--

 Summary: I need Hive access please
 Key: HIVE-25068
 URL: https://issues.apache.org/jira/browse/HIVE-25068
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Cassie Moorhead
 Attachments: Screen Shot 2021-04-28 at 10.41.36 AM.png

I am a TIP employee and just got an Upwork email and now can't access Hive when 
I try to sign in using SSO. I keep getting the message: Sorry, you can't access 
Hive because you are not assigned this app in Okta. If you're wondering why 
this is happening, please contact your administrator. If it's any consolation, 
we can take you to your Okta home page (https://upworkcorp.okta.com/).

 

I believe my email needs access. Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25067) Add more tests to Iceberg partition pruning

2021-04-28 Thread Peter Vary (Jira)
Peter Vary created HIVE-25067:
-

 Summary: Add more tests to Iceberg partition pruning
 Key: HIVE-25067
 URL: https://issues.apache.org/jira/browse/HIVE-25067
 Project: Hive
  Issue Type: Test
Reporter: Peter Vary
Assignee: Peter Vary


As we have qtests for Iceberg now, it would be good to add some partition 
pruning qtests for better coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25066) Show whether a materialized view supports incremental rebuild or not

2021-04-28 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25066:
-

 Summary: Show whether a materialized view supports incremental 
rebuild or not
 Key: HIVE-25066
 URL: https://issues.apache.org/jira/browse/HIVE-25066
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


Add information about whether a materialized view supports incremental rebuild 
or not in an additional column in
{code:java}
SHOW MATERIALIZED VIEWS
{code}
statement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25065) Implement ALTER TABLE for setting iceberg table properties

2021-04-28 Thread Jira
László Pintér created HIVE-25065:


 Summary: Implement ALTER TABLE for setting iceberg table properties
 Key: HIVE-25065
 URL: https://issues.apache.org/jira/browse/HIVE-25065
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér


Currently, only the Iceberg -> HMS table property synchronization is 
implemented.

We would like to make sure that {{ALTER TABLE SET TBLPROPERTIES()}} will update 
the Iceberg table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25064) Create separate shader maven module for Iceberg libraries

2021-04-27 Thread Jira
Ádám Szita created HIVE-25064:
-

 Summary: Create separate shader maven module for Iceberg libraries
 Key: HIVE-25064
 URL: https://issues.apache.org/jira/browse/HIVE-25064
 Project: Hive
  Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25063) Enforce hive.default.nulls.last when enforce bucketing

2021-04-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25063:
-

 Summary: Enforce hive.default.nulls.last when enforce bucketing
 Key: HIVE-25063
 URL: https://issues.apache.org/jira/browse/HIVE-25063
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


When creating the ReduceSink operator for bucketing, the sort key's null sort 
order is hardcoded:
{code}
  for (int sortOrder : sortOrders) {
order.append(DirectionUtils.codeToSign(sortOrder));
nullOrder.append(sortOrder == DirectionUtils.ASCENDING_CODE ? 'a' : 
'z');
  }
{code}

It should depend on both the setting hive.default.nulls.last and the order 
direction.
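
A hedged sketch of what the loop could look like once it honors the setting 
(the config lookup and flag name are illustrative):

{code:java}
  boolean nullsLast = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_DEFAULT_NULLS_LAST);
  for (int sortOrder : sortOrders) {
    order.append(DirectionUtils.codeToSign(sortOrder));
    boolean ascending = sortOrder == DirectionUtils.ASCENDING_CODE;
    // keep the old behaviour when nulls-last is off; flip the null side
    // relative to the sort direction when it is on
    nullOrder.append(ascending == nullsLast ? 'z' : 'a');
  }
{code}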



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25062) Iceberg: Fix date partition transform insert issue

2021-04-27 Thread Peter Vary (Jira)
Peter Vary created HIVE-25062:
-

 Summary: Iceberg: Fix date partition transform insert issue
 Key: HIVE-25062
 URL: https://issues.apache.org/jira/browse/HIVE-25062
 Project: Hive
  Issue Type: Bug
Reporter: Peter Vary
Assignee: Peter Vary


{{Repro steps:}}
{code:java}
CREATE EXTERNAL TABLE iceberg_hive_part (id int, part_field date) STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES (
'iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"part_year","transform":"year","source-id":1,"field-id":1001}]}'
,'write.format.default'='PARQUET'){code}
{code:java}
INSERT INTO iceberg_hive_part values(1, cast('2021-04-20' as date))
{code}
 throws:
{code:java}
(Not an instance of java.lang.Integer: 2021-04-20){code}
Add unit tests covering partition transform reads/writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25061) Improve BoundaryCache

2021-04-27 Thread Jira
László Bodor created HIVE-25061:
---

 Summary: Improve BoundaryCache
 Key: HIVE-25061
 URL: https://issues.apache.org/jira/browse/HIVE-25061
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25060) Hive Compactor doesn't launch cleaner

2021-04-27 Thread Fran Gonzalez (Jira)
Fran Gonzalez created HIVE-25060:


 Summary: Hive Compactor doesn't launch cleaner
 Key: HIVE-25060
 URL: https://issues.apache.org/jira/browse/HIVE-25060
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.1
 Environment: Hive 3.1.0

Hadoop 3.1.1
Reporter: Fran Gonzalez


Hello,

there are problems with the Hive Compactor. We can see in hivemetastore.log the 
message "Max block location exceeded for split", and it's appearing more and 
more often.
After that, the "compactor.Cleaner" is not launched.

We observed that after a Hive Metastore restart, the "compactor.Cleaner" was 
never launched again, but the logs don't display any message about it.

Could this be a degradation of the Hive Compactor when delta files grow in the 
partitions?

Regards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25059) Alter event is converted to rename during replication

2021-04-27 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25059:
---

 Summary: Alter event is converted to rename during replication
 Key: HIVE-25059
 URL: https://issues.apache.org/jira/browse/HIVE-25059
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


If the database/table names differ only in case, the alter event is treated as 
a name change and a RENAME event is created rather than an ALTER event.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25058) PTF: TimestampValueBoundaryScanner can be optimised during range computation pt2

2021-04-27 Thread Jira
László Bodor created HIVE-25058:
---

 Summary: PTF: TimestampValueBoundaryScanner can be optimised 
during range computation pt2
 Key: HIVE-25058
 URL: https://issues.apache.org/jira/browse/HIVE-25058
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread Jira
László Pintér created HIVE-25057:


 Summary: Implement rollback for hive to iceberg migration
 Key: HIVE-25057
 URL: https://issues.apache.org/jira/browse/HIVE-25057
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér


This is a follow-up Jira of HIVE-25008.

In case of an error during hive to iceberg migration, the original hive table 
must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25056) cast ('000-00-00 00:00:00' as timestamp/datetime) results in wrong conversion

2021-04-26 Thread Anurag Shekhar (Jira)
Anurag Shekhar created HIVE-25056:
-

 Summary: cast ('000-00-00 00:00:00' as timestamp/datetime) results 
in wrong conversion 
 Key: HIVE-25056
 URL: https://issues.apache.org/jira/browse/HIVE-25056
 Project: Hive
  Issue Type: Bug
Reporter: Anurag Shekhar
Assignee: Anurag Shekhar


select cast ('000-00-00' as date), cast ('000-00-00 00:00:00' as timestamp)

+-------------+------------------------+
|     _c0     |          _c1           |
+-------------+------------------------+
| 0002-11-30  | 0002-11-30 00:00:00.0  |
+-------------+------------------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25055) Improve the exception handling in HMSHandler

2021-04-23 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-25055:
--

 Summary: Improve the exception handling in HMSHandler
 Key: HIVE-25055
 URL: https://issues.apache.org/jira/browse/HIVE-25055
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Zhihua Deng
Assignee: Zhihua Deng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25054) Upgrade jodd-core due to CVE-2018-21234

2021-04-23 Thread Abhay (Jira)
Abhay created HIVE-25054:


 Summary: Upgrade jodd-core due to CVE-2018-21234
 Key: HIVE-25054
 URL: https://issues.apache.org/jira/browse/HIVE-25054
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 3.1.2
Reporter: Abhay
Assignee: Abhay


Hive makes use of version 3.5.2 of the `jodd-core` library, which is 
susceptible to CVE-2018-21234. Below is a description of that vulnerability.
CVE-2018-21234

Jodd before 5.0.4 performs Deserialization of Untrusted JSON Data when 
setClassMetadataName is set.
CWE-502 Deserialization of Untrusted Data

CVSSv2:
Base Score: HIGH (7.5)
Vector: /AV:N/AC:L/Au:N/C:P/I:P/A:P
CVSSv3:
Base Score: CRITICAL (9.8)
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References:
MISC - https://github.com/oblac/jodd/commit/9bffc3913aeb8472c11bb543243004b4b4376f16
MISC - https://github.com/oblac/jodd/compare/v5.0.3...v5.0.4
MISC - https://github.com/oblac/jodd/issues/628

Vulnerable Software & Versions:
cpe:2.3:a:jodd:jodd:*:*:*:*:*:*:*:* versions up to (excluding) 5.0.4
 

This library needs to be upgraded. We use a couple of classes 
`JDateTime`([https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java]
 ) and `HtmlEncoder`, which have either been deprecated and/or have been moved 
to a different package called jodd-util.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25053) Support explicit ROW value constructor in SQL statements

2021-04-23 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25053:
--

 Summary: Support explicit ROW value constructor in SQL statements
 Key: HIVE-25053
 URL: https://issues.apache.org/jira/browse/HIVE-25053
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Stamatis Zampetakis


Currently, it is possible to create ROW type values by using the implicit 
syntax with parentheses. However, when the explicit ROW constructor is used a 
{{ParseException}} is raised.

+Example+
{code:sql}
CREATE TABLE person (id int, name string, age int);

EXPLAIN CBO SELECT (id, name), (name, age) FROM person; 
EXPLAIN CBO SELECT ROW(id, name), ROW(name, age) FROM person; 
{code}

The first select statement succeeds and returns the CBO plan while the second 
fails with the exception below:

{noformat}
org.apache.hadoop.hive.ql.parse.ParseException: line 3:19 cannot recognize 
input near 'ROW' '(' 'id' in select clause
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85)
at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


CFP for ApacheCon 2021 closes in ONE WEEK

2021-04-23 Thread Rich Bowen

[You are receiving this because you're subscribed to one or more dev@
mailing lists for an Apache project, or the ApacheCon Announce list.]

Time is running out to submit your talk for ApacheCon 2021.

The Call for Presentations for ApacheCon @Home 2021, focused on Europe
and North America time zones, closes May 3rd, and is at
https://www.apachecon.com/acah2021/cfp.html

The CFP for ApacheCon Asia, focused on Asia/Pacific time zones, is at
https://apachecon.com/acasia2021/cfp.html and also closes on May 3rd.

ApacheCon is our main event, featuring content from any and all of our
projects, and is your best opportunity to get your project in front of
the largest audience of enthusiasts.

Please don't wait until the last minute. Get your talks in today!

--
Rich Bowen, VP Conferences
The Apache Software Foundation
https://apachecon.com/
@apachecon


[jira] [Created] (HIVE-25052) Writing to Iceberg tables can fail when inserting empty result set

2021-04-23 Thread Marton Bod (Jira)
Marton Bod created HIVE-25052:
-

 Summary: Writing to Iceberg tables can fail when inserting empty 
result set
 Key: HIVE-25052
 URL: https://issues.apache.org/jira/browse/HIVE-25052
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25051) Callers can access uninitialized MessageBuilder instance causing NPE

2021-04-23 Thread Jira
Csaba Juhász created HIVE-25051:
---

 Summary: Callers can access uninitialized MessageBuilder instance 
causing NPE
 Key: HIVE-25051
 URL: https://issues.apache.org/jira/browse/HIVE-25051
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Csaba Juhász
Assignee: Csaba Juhász


The creation of the singleton MessageBuilder instance is not thread-safe; 
concurrent threads can observe the instance before it is fully initialized.

https://github.com/apache/hive/blob/326abf9685de39cf4f1b3222d84fe9cbc465710a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/messaging/MessageBuilder.java#L154
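
A minimal sketch of one common fix, the initialization-on-demand holder idiom; 
the class name below is an illustrative stand-in, not the actual Hive patch:

{code:java}
public class SafeSingletonExample {
  private SafeSingletonExample() { }

  private static class Holder {
    // The JVM guarantees this field is assigned exactly once, with full
    // visibility to all threads, when Holder is first referenced.
    static final SafeSingletonExample INSTANCE = new SafeSingletonExample();
  }

  public static SafeSingletonExample getInstance() {
    // No locking needed: class initialization provides the happens-before
    // ordering, so callers can never observe a partially built instance.
    return Holder.INSTANCE;
  }
}
{code}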



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25050) Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing backward incompatible change

2021-04-23 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-25050:
-

 Summary: Disable 'hive.metastore.acid.truncate.usebase' config as 
it's introducing backward incompatible change
 Key: HIVE-25050
 URL: https://issues.apache.org/jira/browse/HIVE-25050
 Project: Hive
  Issue Type: Bug
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25049) LlapDaemon preemption should not be triggered for same Vertex tasks

2021-04-22 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25049:
-

 Summary: LlapDaemon preemption should not be triggered for same 
Vertex tasks
 Key: HIVE-25049
 URL: https://issues.apache.org/jira/browse/HIVE-25049
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Due to the asynchronous nature of QueryInfo$FinishableState notifications, we 
usually receive finishable-state updates for different tasks/query fragments 
with some time difference.

Imagine a vertex Map 8 that depends on Map 7.
If Map 8 is already running with some tasks still pending, then on Map 7's 
completion some of the pending Map 8 tasks can receive the finishable-state 
update BEFORE the already-running Map 8 tasks do, needlessly preempting tasks 
of the very same vertex! A sketch of a possible guard follows the log excerpt 
below.


{code:java}
2021-04-22T15:30:45.124Z source:Map 7 updated, notifying: [Map 8] 

2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received 
finishable state update for attempt_1619105382691_0001_1_05_14_0, state=true
2021-04-22T15:30:45.125Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
2021-04-22T15:30:45.125Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to execute 
TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in


 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Task 
TaskWrapper{task=attempt_1619105382691_0001_1_05_14_0, Vertex=Map 8, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
firstAttemptStartTime=1619105437749, dagStartTime=1619105422608, 
withinDagPriority=74, vertexParallelism= 232, selfAndUpstreamParallelism= 256, 
selfAndUpstreamComplete= 17} managed to preempt task 
TaskWrapper{task=attempt_1619105382691_0001_1_05_06_0, Vertex=Map 8, 
inWaitQueue=false, inPreemptionQueue=true, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=false, isGuaranteed=false, 
firstAttemptStartTime=1619105437737, dagStartTime=1619105422608, 
withinDagPriority=74, vertexParallelism= 232, selfAndUpstreamParallelism= 256, 
selfAndUpstreamComplete= 15}

 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Invoking kill task for 
attempt_1619105382691_0001_1_05_06_0 due to pre-emption to run 
attempt_1619105382691_0001_1_05_14_0
 2021-04-22T15:30:45.126Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
 2021-04-22T15:30:45.126Z query-executor-0 class="impl.TaskExecutorService" 
level="INFO" thread="IPC Server handler 3 on 25000"] vertex: Map 8 Received 
finishable state update for attempt_1619105382691_0001_1_05_11_0, state=true
 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Kill task requested for 
id=attempt_1619105382691_0001_1_05_06_0, taskRunnerSetup=true
 2021-04-22T15:30:45.127Z query-executor-0 class="impl.TaskRunnerCallable" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Issuing kill to task 
attempt_1619105382691_0001_1_05_06_0
 2021-04-22T15:30:45.127Z query-executor-0 
class="impl.QueryInfo$FinishableStateTracker" level="INFO" thread="IPC Server 
handler 3 on 25000"] Now notifying: Map 8
 2021-04-22T15:30:45.127Z query-executor-0 class="task.TezTaskRunner2" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Attempting to abort 
attempt_1619105382691_0001_1_05_06_0 due to an invocation of killTask
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Received abort
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.TezProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to 
RecordProcessor
 2021-04-22T15:30:45.128Z query-executor-0 
class="runtime.LogicalIOProcessorRuntimeTask" dagId="dag_1619105382691_0001_1" 
fragmentId="1619105382691_0001_1_05_11_0" level="INFO" 
queryId="hive_20210422153013_397b96bf-d5a6-493a-9c51-9446f64eeed4" 
thread="TezTR-382691_1_1_5_11_0"] Waiting for 1 initializers to finish
 2021-04-22T15:30:45.128Z query-executor-0 class="tez.MapRecordProcessor" 
level="INFO" thread="Wait-Queue-Scheduler-0"] Forwarding abort to mapOp: {} MAP
 2021-04-22T15:30:45.128Z query-executor-0 class="vector.VectorMapOperator" 
level="INFO" thread="Wait-Queue-Scheduler-0"] 

[jira] [Created] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-04-22 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-25048:
--

 Summary: Refine the start/end functions in HMSHandler
 Key: HIVE-25048
 URL: https://issues.apache.org/jira/browse/HIVE-25048
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Zhihua Deng
Assignee: Zhihua Deng


Some start/end functions in HMSHandler are incomplete. These functions audit 
user actions, monitor performance, and notify the registered listeners, so 
every handler method should invoke them consistently. A sketch of the pattern 
follows.
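
A minimal, self-contained sketch of that bracketing pattern, with simplified 
startFunction/endFunction signatures; all names are illustrative, not the 
actual HMSHandler code:

{code:java}
public class StartEndPatternExample {

  private void startFunction(String function, String extraInfo) {
    // audit the incoming call and start a performance timer
    System.out.println("audit: " + function + extraInfo);
  }

  private void endFunction(String function, boolean success, Exception ex) {
    // stop the timer, record metrics, and notify end-function listeners
    System.out.println("end: " + function + " success=" + success
        + (ex != null ? " error=" + ex.getMessage() : ""));
  }

  public String getTable(String db, String tbl) throws Exception {
    startFunction("get_table", ": db=" + db + " tbl=" + tbl);
    String ret = null;
    Exception ex = null;
    try {
      ret = db + "." + tbl; // stand-in for the real metastore lookup
    } catch (Exception e) {
      ex = e;
      throw e;
    } finally {
      // the finally block guarantees the end-of-function bookkeeping runs
      // on both the success and the failure path
      endFunction("get_table", ret != null, ex);
    }
    return ret;
  }
}
{code}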



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25047) Remove unused fields/methods and deprecated calls in HiveProject

2021-04-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25047:
--

 Summary: Remove unused fields/methods and deprecated calls in 
HiveProject
 Key: HIVE-25047
 URL: https://issues.apache.org/jira/browse/HIVE-25047
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
 Fix For: 4.0.0


Small refactoring of 
[HiveProject|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java]
 operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25046) Log CBO plans right after major transformations

2021-04-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25046:
--

 Summary: Log CBO plans right after major transformations
 Key: HIVE-25046
 URL: https://issues.apache.org/jira/browse/HIVE-25046
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Currently, the results of various CBO transformations are logged (in DEBUG 
mode) at the end of the optimization 
[phase|https://github.com/apache/hive/blob/9f5bd72e908244b2fe915e8dc39f55afa94bbffa/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2106]
 and only if we are not in test mode. This has some disadvantages:
* If there is a failure (exception) in some intermediate step, we miss all the 
intermediate plans and may lose track of which plan led to the problem.
* Intermediate logs are very useful for identifying plan problems while working 
on a patch; unfortunately, the logs are explicitly disabled in test mode, so 
the respective code has to be changed every time we want to see them.
* Logging only at the end necessitates keeping additional local variables that 
make the code harder to read.

The goal of this issue is to place DEBUG logging right after each major 
transformation, independently of whether we are running in test mode, to 
alleviate the shortcomings mentioned above. A sketch of the intended pattern 
follows.
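
A minimal sketch of the intended pattern, assuming an SLF4J-style logger; the 
names are illustrative, not the actual CalcitePlanner code:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PlanLoggingExample {
  private static final Logger LOG = LoggerFactory.getLogger(PlanLoggingExample.class);

  // Hypothetical helper: run one transformation and dump the plan right
  // afterwards, regardless of test mode. The isDebugEnabled() guard keeps
  // the overhead negligible when DEBUG logging is off.
  void applyAndLog(String phaseName, Runnable transformation) {
    transformation.run();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Plan after {}:\n{}", phaseName, currentPlanAsString());
    }
  }

  private String currentPlanAsString() {
    // stand-in for something like RelOptUtil.toString(calcitePlan)
    return "<plan dump placeholder>";
  }
}
{code}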



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

