[jira] [Created] (HIVE-24325) Cardinality preserving join optimization may fail when column is a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-24325:
--

 Summary: Cardinality preserving join optimization may fail when 
column is a constant
 Key: HIVE-24325
 URL: https://issues.apache.org/jira/browse/HIVE-24325
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


More info to come.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24324) Remove deprecated API usage from Avro

2020-10-28 Thread Chao Sun (Jira)
Chao Sun created HIVE-24324:
---

 Summary: Remove deprecated API usage from Avro
 Key: HIVE-24324
 URL: https://issues.apache.org/jira/browse/HIVE-24324
 Project: Hive
  Issue Type: Improvement
  Components: Avro
Reporter: Chao Sun
Assignee: Chao Sun


{{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
removed since Avro 1.9. This replaces that API usage with {{getObjectProp}}, 
which doesn't leak Jackson's JsonNode. This will help downstream apps depend on 
Hive while using a higher version of Avro, and also help Hive upgrade its own 
Avro version.





[jira] [Created] (HIVE-24323) JDBC driver fails when using Kerberos due to missing dependencies

2020-10-28 Thread N Campbell (Jira)
N Campbell created HIVE-24323:
-

 Summary: JDBC driver fails when using Kerberos due to missing 
dependencies
 Key: HIVE-24323
 URL: https://issues.apache.org/jira/browse/HIVE-24323
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 3.1.0
Reporter: N Campbell


The Apache Hive web pages have historically implied that only three JAR files 
are required:

 hadoop-auth
 hadoop-common
 hive-jdbc

If a connection is attempted using Kerberos authentication, it will fail due to 
several missing dependencies:

 hadoop-auth-3.1.1.3.1.5.0-152.jar
 hadoop-common-3.1.1.3.1.5.0-152.jar
 hive-jdbc-3.1.0.3.1.5.0-152-standalone.jar

It is unclear whether the intent of the standalone JAR is to include these 
dependencies or not, but there does not seem to be any documentation either way. 

It also appears that dependencies are not being shaded, which can result in 
conflicts with guava or wstx JAR files in the class path, such as noted by 
Oracle Doc ID 2650046.1:

 commons-collections-3.2.2.jar
 commons-configuration2.jar
 commons-lang-2.6.jar
 guava-29.0-jre.jar
 log4j-1.2.17.jar
 slf4j-api-1.7.25.jar





[jira] [Created] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-10-28 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24322:


 Summary: In case of direct insert, the attempt ID has to be 
checked when reading the manifest files
 Key: HIVE-24322
 URL: https://issues.apache.org/jira/browse/HIVE-24322
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0


In [IMPALA-10247|https://issues.apache.org/jira/browse/IMPALA-10247] there was 
an exception from Hive when trying to load the data:
{noformat}
2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
exec.Task: Job Commit failed with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
 at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
 at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
 at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
 at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at 
org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
 ... 29 more
{noformat}

The reason for the exception was that Hive was trying to read an empty manifest 
file. Manifest files are used in case of direct insert to determine which files 
need to be kept and which need to be cleaned up. They are created by the tasks 
and use the task attempt ID as a postfix. In this particular test, one of the 
containers ran out of memory, so Tez decided to kill it right after the manifest 
file got created but before the paths got written into it. This was the manifest 
file for task attempt 0. Tez then assigned a new container to the task, so a new 
attempt was made with attemptId=1. This attempt was successful and wrote the 
manifest file correctly. But Hive didn't know about this, since the 
out-of-memory issue was handled by Tez under the hood, so there was no exception 
in Hive and therefore no clean-up in the manifest folder. When Hive reads the 
manifest files, it simply reads every file from the defined folder, so it tried 
to read the manifest files for both attempt 0 and attempt 1.
If there are multiple manifest files with the same name but different attempt 
IDs, Hive should only read the one with the highest attempt ID.
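The selection rule in the last sentence can be sketched as follows. This is a hypothetical helper, not Hive's actual manifest handling, and the file-naming convention (attempt ID as a numeric suffix after the final dot) is an assumption for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: among manifest files that share a base name and differ
// only in their attempt-ID suffix (e.g. "000000.manifest.0",
// "000000.manifest.1"), keep only the one with the highest attempt ID.
class ManifestPicker {
    static List<String> latestAttempts(List<String> manifestFiles) {
        Map<String, Integer> bestAttempt = new HashMap<>();
        Map<String, String> bestFile = new HashMap<>();
        for (String f : manifestFiles) {
            int dot = f.lastIndexOf('.');
            String base = f.substring(0, dot);
            int attempt = Integer.parseInt(f.substring(dot + 1));
            // Remember this file only if it beats the best attempt seen so far.
            if (attempt >= bestAttempt.getOrDefault(base, -1)) {
                bestAttempt.put(base, attempt);
                bestFile.put(base, f);
            }
        }
        List<String> result = new ArrayList<>(bestFile.values());
        Collections.sort(result);
        return result;
    }
}
```

With the OOM scenario above, the empty attempt-0 manifest would simply be shadowed by the attempt-1 file and never read.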





[jira] [Created] (HIVE-24321) Implement Default getSerDeStats in AbstractSerDe

2020-10-28 Thread David Mollitor (Jira)
David Mollitor created HIVE-24321:
-

 Summary: Implement Default getSerDeStats in AbstractSerDe
 Key: HIVE-24321
 URL: https://issues.apache.org/jira/browse/HIVE-24321
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: David Mollitor
Assignee: David Mollitor


Seems like very few SerDes implement the getSerDeStats feature.  Add a default 
implementation and remove all of the superfluous overrides in the implementing 
classes.
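A minimal sketch of the idea, using a stand-in SerDeStats type rather than Hive's actual org.apache.hadoop.hive.serde2 classes:

```java
// Stand-in for org.apache.hadoop.hive.serde2.SerDeStats.
class SerDeStats {
    long rawDataSize;
    long rowCount;
}

// Hypothetical sketch: AbstractSerDe gains a default getSerDeStats()
// (here returning null, meaning "no stats support"), so SerDes that do
// not track stats need no override of their own.
abstract class AbstractSerDeSketch {
    public SerDeStats getSerDeStats() {
        return null;
    }
}

// A SerDe with nothing to report just inherits the default, instead of
// carrying a superfluous override.
class SimpleSerDe extends AbstractSerDeSketch {
}
```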





[jira] [Created] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24320:
---

 Summary: TestMiniLlapLocal sometimes hangs because of some derby 
issues
 Key: HIVE-24320
 URL: https://issues.apache.org/jira/browse/HIVE-24320
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


The code in question is a slightly modified version of branch-3.

Opening this ticket to make notes about the investigation.

{code}
"dcce5fec-2365-4697-8a8f-04a4dfa5d9f5 main" #1 prio=5 os_prio=0 
tid=0x7fd7c000a800 nid=0x1de23 waiting on condition [0x7fd7c4b7]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xc61635f0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1981)
at 
org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown 
Source)
at 
org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
Source)
at 
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown 
Source)
at 
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown 
Source)
at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown 
Source)
at 
org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown 
Source)
at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
at 
org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
Source)
at 
org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndexMinion(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSubKeyConstraint(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorViaIndex(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorsScan(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptors(Unknown
 Source)
- locked <0xc615c9a8> (a 
org.apache.derby.iapi.sql.dictionary.ConstraintDescriptorList)
at 
org.apache.derby.iapi.sql.dictionary.TableDescriptor.getAllRelevantConstraints(Unknown
 Source)
at 
org.apache.derby.impl.sql.compile.DMLModStatementNode.getAllRelevantConstraints(Unknown
 Source)
at 
org.apache.derby.impl.sql.compile.DMLModStatementNode.bindConstraints(Unknown 
Source)
at org.apache.derby.impl.sql.compile.DeleteNode.bindStatement(Unknown 
Source)
at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
at 
org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
 Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
- locked <0xc4bb5fd0> (a 
org.apache.derby.impl.jdbc.EmbedConnection)
at 
org.apache.derby.impl.jdbc.EmbedStatement.executeBatchElement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeLargeBatch(Unknown 
Source)
- locked <0xc4bb5fd0> (a 
org.apache.derby.impl.jdbc.EmbedConnection)
at org.apache.derby.impl.jdbc.EmbedStatement.executeBatch(Unknown 
Source)
at 
com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
at 
com.zaxxer.hikari.pool.HikariProxyStatement.executeBatch(HikariProxyStatement.java)
at 
org.apache.hadoop.hive.metastore.txn.TxnDbUtil.executeQueriesInBatch(TxnDbUtil.java:658)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.updateCommitIdAndCleanUpMetadata(TxnHandler.java:1338)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1236)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.commit_txn(HiveMetaStore.java:8315)
at sun.reflect.GeneratedMethodAccessor252.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 at 
org.apache.hadoop.hive.m
{code}

[jira] [Created] (HIVE-24319) unnecessary characters in hive-site.xml file in version 3.1.2

2020-10-28 Thread Revanth (Jira)
Revanth created HIVE-24319:
--

 Summary: unnecessary characters in hive-site.xml file in version 
3.1.2
 Key: HIVE-24319
 URL: https://issues.apache.org/jira/browse/HIVE-24319
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Revanth
 Fix For: 3.1.2


Unnecessary characters in hive-site.xml file in version 3.1.2.

In the description part at line 3215 there are some unnecessary characters 
which are blocking schematool operations.





[jira] [Created] (HIVE-24318) When GlobalLimit is efficient, query will run twice with "Retry query with a different approach..."

2020-10-28 Thread libo (Jira)
libo created HIVE-24318:
---

 Summary: When GlobalLimit is efficient, query will run twice with 
"Retry query with a different approach..."
 Key: HIVE-24318
 URL: https://issues.apache.org/jira/browse/HIVE-24318
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.1
 Environment: Hadoop 2.6.0

Hive-2.0.1
Reporter: libo


hive.limit.optimize.enable=true

hive.limit.row.max.size=1000

hive.limit.optimize.fetch.max=1000

hive.fetch.task.conversion.threshold=256

hive.fetch.task.conversion=more

 

*SQL example:*

select db_name,concat(tb_name,'test') from (select * from test1.t3 where 
dt='0909' limit 10)t1;

(occurs only with a partitioned table)

*console information:*

Retry query with a different approach...

 

*exception stack:*

org.apache.hadoop.hive.ql.CommandNeedRetryException
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2022)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:317)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:475)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:855)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:794)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:236)





[jira] [Created] (HIVE-24317) External Table is not replicated for Cloud store (e.g. Microsoft ADLS Gen2)

2020-10-28 Thread Nikhil Gupta (Jira)
Nikhil Gupta created HIVE-24317:
---

 Summary: External Table is not replicated for Cloud store (e.g. 
Microsoft ADLS Gen2)
 Key: HIVE-24317
 URL: https://issues.apache.org/jira/browse/HIVE-24317
 Project: Hive
  Issue Type: Bug
  Components: repl
Affects Versions: 4.0.0
Reporter: Nikhil Gupta
Assignee: Nikhil Gupta


External Table is not replicated properly because of distcp options. 





[jira] [Created] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2020-10-27 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created HIVE-24316:


 Summary: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
 Key: HIVE-24316
 URL: https://issues.apache.org/jira/browse/HIVE-24316
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 3.1.3
Reporter: Dongjoon Hyun








[jira] [Created] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL

2020-10-27 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24315:


 Summary: Improve validation and semantic analysis in HPL/SQL 
 Key: HIVE-24315
 URL: https://issues.apache.org/jira/browse/HIVE-24315
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar


There are some known issues that need to be fixed. For example, it seems that 
the arity of a function is not checked when calling it, and the same is true 
for parameter types. Calling an undefined function evaluates to null, and 
sometimes incorrect syntax seems to be silently ignored. 

In cases like this a helpful error message would be expected, though we should 
also consider how PL/SQL works and maintain compatibility.





[jira] [Created] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files

2020-10-27 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-24314:


 Summary: compactor.Cleaner should not set state "mark cleaned" if 
it didn't remove any files
 Key: HIVE-24314
 URL: https://issues.apache.org/jira/browse/HIVE-24314
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage








[jira] [Created] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2020-10-27 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24313:
---

 Summary: Optimise stats collection for file sizes on cloud storage
 Key: HIVE-24313
 URL: https://issues.apache.org/jira/browse/HIVE-24313
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan


When stats information is not present (e.g. external tables), RelOptHiveTable 
computes basic stats at runtime.

Following is the codepath.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
{code:java}
Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
hiveTblMetadata, hiveNonPartitionCols, 
nonPartColNamesThatRqrStats, colStatsCached,
nonPartColNamesThatRqrStats, true);
 {code}
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
{code:java}
for (Partition p : partList.getNotDeniedPartns()) {
BasicStats basicStats = basicStatsFactory.build(Partish.buildFor(table, 
p));
partStats.add(basicStats);
  }
 {code}
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]

 
{code:java}
try {
ds = getFileSizeForPath(path);
  } catch (IOException e) {
ds = 0L;
  }
 {code}
 

For a table and query with a large number of partitions, this takes a long time 
to compute statistics and increases compilation time. It would be good to fix 
it with a "ForkJoinPool", e.g. 
partList.getNotDeniedPartns().parallelStream().forEach((p) -> ...)
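A hedged sketch of that suggestion, with the per-partition file-system call simulated by a stand-in; the pool size and the helper names are assumptions, not Hive's actual code:

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

// Hypothetical sketch: compute per-partition file sizes in parallel with a
// bounded ForkJoinPool instead of the sequential loop shown above.
class ParallelPartitionStats {
    // Stand-in for getFileSizeForPath(); in Hive this would be a FileSystem
    // call per partition directory (the slow part on cloud storage).
    static long fileSizeForPath(String path) {
        return path.length();
    }

    static List<Long> collectSizes(List<String> partitionPaths) {
        // Submitting the parallel stream into a dedicated pool bounds its
        // parallelism instead of using the shared common pool.
        ForkJoinPool pool = new ForkJoinPool(8);
        try {
            return pool.submit(() ->
                partitionPaths.parallelStream()
                              .map(ParallelPartitionStats::fileSizeForPath)
                              .collect(Collectors.toList())
            ).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```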

 

 





[jira] [Created] (HIVE-24312) Use column stats to remove "x is not null" filter conditions if they are redundant

2020-10-26 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24312:
---

 Summary: Use column stats to remove "x is not null" filter 
conditions if they are redundant
 Key: HIVE-24312
 URL: https://issues.apache.org/jira/browse/HIVE-24312
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


With HIVE-24241, SharedWorkOptimizer could further merge branches for some 
queries (e.g. 
[query32|https://github.com/apache/hive/blob/db895f374bf63b77b683574fdf678bfac91a5ac6/ql/src/test/results/clientpositive/perf/tez/query32.q.out#L118-L163]
 )

...but a little `is not null` difference prevents it from proceeding.






[jira] [Created] (HIVE-24311) When we clear rows in a Rowcontainer, readBlocks should also be reset to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)
Qiang.Kang created HIVE-24311:
-

 Summary: When we clear rows in a Rowcontainer,  readBlocks should 
also be reset to prevent OOM.
 Key: HIVE-24311
 URL: https://issues.apache.org/jira/browse/HIVE-24311
 Project: Hive
  Issue Type: Bug
Affects Versions: All Versions
Reporter: Qiang.Kang
Assignee: Qiang.Kang








[jira] [Created] (HIVE-24310) Allow specified number of deserialize errors to be ignored

2020-10-23 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-24310:
--

 Summary: Allow specified number of deserialize errors to be ignored
 Key: HIVE-24310
 URL: https://issues.apache.org/jira/browse/HIVE-24310
 Project: Hive
  Issue Type: Improvement
  Components: Operators
Reporter: Zhihua Deng
Assignee: Zhihua Deng


Sometimes we see corrupted records in a user's raw data, e.g. one corrupted 
record in a file that contains thousands of records. The user has to either 
give up all the records or replay the whole data set in order to run 
successfully on Hive. We should provide a way to ignore such corrupted records. 
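One possible shape for such a knob, as a sketch; the limit and the tracker are assumptions for illustration, not an existing Hive configuration:

```java
// Hypothetical sketch: tolerate up to a configured number of deserialize
// errors per task, skipping the bad records, and fail only once the limit
// is exceeded. The "maxAllowedErrors" knob is assumed, not a real Hive
// configuration property.
class DeserializeErrorTracker {
    private final long maxAllowedErrors;
    private long errorCount = 0;

    DeserializeErrorTracker(long maxAllowedErrors) {
        this.maxAllowedErrors = maxAllowedErrors;
    }

    // Called for each record that fails to deserialize. Returns true when
    // the record should be silently skipped; throws once the limit is hit.
    boolean recordError(String reason) {
        errorCount++;
        if (errorCount > maxAllowedErrors) {
            throw new IllegalStateException(
                "exceeded " + maxAllowedErrors + " deserialize errors: " + reason);
        }
        return true;
    }

    long errorsSoFar() {
        return errorCount;
    }
}
```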
 





[jira] [Created] (HIVE-24309) Simplify ConvertJoinMapJoin logic

2020-10-23 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-24309:
-

 Summary: Simplify ConvertJoinMapJoin logic 
 Key: HIVE-24309
 URL: https://issues.apache.org/jira/browse/HIVE-24309
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The ConvertJoinMapJoin logic can be further simplified:

[https://github.com/pgaref/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L92]





[jira] [Created] (HIVE-24308) FIX conditions used for DPHJ conversion

2020-10-23 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-24308:
-

 Summary: FIX conditions used for DPHJ conversion  
 Key: HIVE-24308
 URL: https://issues.apache.org/jira/browse/HIVE-24308
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Found a weird scenario when looking at the ConvertJoinMapJoin logic: 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1198]
 When the distinct keys cannot fit in memory AND the DPHJ shuffle size is lower 
than expected, the code returns an MJ because of the condition above!

In general, I believe the shuffle-size check: 
[https://github.com/apache/hive/blob/052c9da958f5cf3998091a7eb4b24192a5bb61e9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L1624]
 should be part of the shuffleJoin DPHJ conversion.

The preferred conversion order would be: MJ > DPHJ > SMB





[jira] [Created] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-22 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-24306:
--

 Summary: Launch single copy task for single batch of partitions in 
repl load for managed table
 Key: HIVE-24306
 URL: https://issues.apache.org/jira/browse/HIVE-24306
 Project: Hive
  Issue Type: Task
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-24305) avro decimal schema is not properly populating scale/precision if enclosed in quote

2020-10-22 Thread Naresh P R (Jira)
Naresh P R created HIVE-24305:
-

 Summary: avro decimal schema is not properly populating 
scale/precision if enclosed in quote
 Key: HIVE-24305
 URL: https://issues.apache.org/jira/browse/HIVE-24305
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R


{code:java}
CREATE TABLE test_quoted_scale_precision STORED AS AVRO TBLPROPERTIES 
('avro.schema.literal'='{"type":"record","name":"DecimalTest","namespace":"com.example.test","fields":[{"name":"Decimal24_6","type":["null",{"type":"bytes","logicalType":"decimal","precision":24,"scale":"6"}]}]}');
 
desc test_quoted_scale_precision;
// decimal24_6 decimal(24,0)
{code}
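The table above ends up as decimal(24,0) because the quoted scale ("6") is not treated as a number. A fix presumably needs to accept the attribute whether the JSON value arrives as a number or a quoted string; a minimal, self-contained sketch of that tolerant parsing (the helper is hypothetical, not Hive's actual Avro code):

```java
// Hypothetical sketch: read a logical-type attribute such as "scale" or
// "precision" whether the schema author wrote 6 or "6".
class AvroDecimalProps {
    static int asInt(Object prop, int defaultValue) {
        if (prop == null) {
            return defaultValue;
        }
        if (prop instanceof Number) {
            return ((Number) prop).intValue();
        }
        // Quoted values arrive as strings; parse them instead of silently
        // falling back to the default (which is how scale becomes 0).
        return Integer.parseInt(prop.toString().trim());
    }
}
```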





[jira] [Created] (HIVE-24304) Query containing UNION fails with OOM

2020-10-22 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24304:
--

 Summary: Query containing UNION fails with OOM
 Key: HIVE-24304
 URL: https://issues.apache.org/jira/browse/HIVE-24304
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-24303) Upgrade spring framework to 4.3.29.RELEASE+ due to CVE-2020-5421

2020-10-22 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-24303:


 Summary: Upgrade spring framework to 4.3.29.RELEASE+ due to 
CVE-2020-5421
 Key: HIVE-24303
 URL: https://issues.apache.org/jira/browse/HIVE-24303
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Hive is pulling in 4.3.18.RELEASE, which is vulnerable to CVE-2020-5421. Please 
upgrade to 4.3.29.RELEASE or later.





[jira] [Created] (HIVE-24302) Cleaner should not mark compaction queue entry as cleaned if it doesn't remove obsolete files

2020-10-22 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-24302:


 Summary: Cleaner should not mark compaction queue entry as cleaned 
if it doesn't remove obsolete files
 Key: HIVE-24302
 URL: https://issues.apache.org/jira/browse/HIVE-24302
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


Example:
 # open txn 5, leave it open (maybe it's a long-running compaction)
 # insert into table t in txns 6, 7 with writeids 1, 2
 # compactor.Worker runs on table t and compacts writeids 1, 2
 # compactor.Cleaner picks up the compaction queue entry, but doesn't delete 
any files because the min global open txnid is 5, which cannot see writeIds 1, 
2.
 # Cleaner marks the compactor queue entry as cleaned and removes the entry 
from the queue.

delta_1 and delta_2 will remain in the file system until another compaction is 
run on table t.

Step 5 should not happen; we should skip calling markCleaned() and leave the 
entry in the queue in the "ready to clean" state. markCleaned() should be 
called only after txn 5 is closed and, following that, the Cleaner runs 
successfully.

This will potentially slow down the Cleaner, but on the other hand it won't 
silently "fail", i.e. not do its job.
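The proposed guard can be sketched as a predicate; the names and the txn comparison are illustrative stand-ins, not the Cleaner's actual fields:

```java
// Hypothetical sketch of the proposed rule: mark a compaction queue entry
// cleaned only when obsolete files were actually removed, which in turn
// requires every open txn to be past the compacted writes (in the example
// above, txn 5 must close first).
class CleanerGuard {
    static boolean shouldMarkCleaned(long minGlobalOpenTxnId,
                                     long highestCompactedTxnId,
                                     int obsoleteFilesRemoved) {
        boolean compactionVisibleToAll = minGlobalOpenTxnId > highestCompactedTxnId;
        return compactionVisibleToAll && obsoleteFilesRemoved > 0;
    }
}
```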





[jira] [Created] (HIVE-24301) thrift versions and vulnerabilities

2020-10-22 Thread openlookeng (Jira)
openlookeng created HIVE-24301:
--

 Summary: thrift versions and vulnerabilities
 Key: HIVE-24301
 URL: https://issues.apache.org/jira/browse/HIVE-24301
 Project: Hive
  Issue Type: Improvement
Reporter: openlookeng


The vulnerabilities are 
CVE-2018-1320, CVE-2016-5397, CVE-2019-3565, CVE-2018-11798, CVE-2019-3564, 
CVE-2019-3559, CVE-2019-3558 and CVE-2019-3552.





[jira] [Created] (HIVE-24300) jackson versions and vulnerabilities

2020-10-22 Thread openlookeng (Jira)
openlookeng created HIVE-24300:
--

 Summary: jackson versions and vulnerabilities
 Key: HIVE-24300
 URL: https://issues.apache.org/jira/browse/HIVE-24300
 Project: Hive
  Issue Type: Improvement
Reporter: openlookeng


The vulnerability is CVE-2019-12814.





[jira] [Created] (HIVE-24299) guava versions and vulnerabilities

2020-10-22 Thread openlookeng (Jira)
openlookeng created HIVE-24299:
--

 Summary:  guava versions and vulnerabilities
 Key: HIVE-24299
 URL: https://issues.apache.org/jira/browse/HIVE-24299
 Project: Hive
  Issue Type: Improvement
Reporter: openlookeng


The vulnerability is CVE-2018-10237.





[jira] [Created] (HIVE-24298) Hive versions and vulnerabilities

2020-10-22 Thread openlookeng (Jira)
openlookeng created HIVE-24298:
--

 Summary:  Hive versions and vulnerabilities
 Key: HIVE-24298
 URL: https://issues.apache.org/jira/browse/HIVE-24298
 Project: Hive
  Issue Type: Improvement
Reporter: openlookeng


The vulnerabilities are 
CVE-2018-10237, CVE-2018-1314, CVE-2018-11777, CVE-2019-12814, CVE-2016-5397, 
CVE-2018-11798, CVE-2018-1320, CVE-2019-3565, CVE-2019-3564, CVE-2019-3552, 
CVE-2019-3558 and CVE-2019-3559.





[jira] [Created] (HIVE-24297) LLAP buffer collision causes NPE

2020-10-22 Thread Ádám Szita (Jira)
Ádám Szita created HIVE-24297:
-

 Summary: LLAP buffer collision causes NPE
 Key: HIVE-24297
 URL: https://issues.apache.org/jira/browse/HIVE-24297
 Project: Hive
  Issue Type: Bug
Reporter: Ádám Szita
Assignee: Ádám Szita


HIVE-23741 introduced an optimization so that CacheTags are not stored at the 
buffer level, but rather at the file level, as one cache tag can only relate to 
one file. With this change a buffer->filecache reference was introduced so that 
the buffer's tag can be calculated with an extra indirection, i.e. 
buffer.filecache.tag.

However, during a buffer collision in the putFileData method, we don't set the 
filecache reference of the collided (new) buffer: 
[https://github.com/apache/hive/commit/2e18a7408a8dd49beecad8d66bfe054b7dc474da#diff-d2ccd7cf3042845a0812a5e118f82db49253d82fc86449ffa408903bf434fb6dR309-R311]

Later this causes an NPE when the new (instantly decRef'ed) buffer is evicted:
{code:java}
Caused by: java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
at 
java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
at 
org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:129)
at 
org.apache.hadoop.hive.llap.cache.CacheContentsTracker.getTagState(CacheContentsTracker.java:125)
at 
org.apache.hadoop.hive.llap.cache.CacheContentsTracker.reportRemoved(CacheContentsTracker.java:109)
at 
org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyEvicted(CacheContentsTracker.java:238)
at 
org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(LowLevelLrfuCachePolicy.java:276)
at 
org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(CacheContentsTracker.java:177)
at 
org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:98)
at 
org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(LowLevelCacheMemoryManager.java:65)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:323)
at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(EncodedReaderImpl.java:1302)
at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:930)
at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:506)
... 16 more {code}





[jira] [Created] (HIVE-24296) NDV adjusted twice causing reducer task underestimation

2020-10-22 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24296:
---

 Summary: NDV adjusted twice causing reducer task underestimation
 Key: HIVE-24296
 URL: https://issues.apache.org/jira/browse/HIVE-24296
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2550]

 

{{StatsRulesProcFactory::updateColStats}}::
{code:java}
if (ratio <= 1.0) {
  newDV = (long) Math.ceil(ratio * oldDV);
}

cs.setCountDistint(newDV);
{code}
Though RelHive* has the latest statistics, it is adjusted again in 
{{StatsRulesProcFactory::updateColStats}}, and the adjustment is done at linear scale.

 

Because of this, the downstream vertex gets a smaller number of tasks, causing 
latency issues.

E.g., TPC-DS Q10 at 10 TB scale. Attaching a snippet of "explain analyze" which 
shows the stats underestimation.

"Reducer 13" is underestimated 10x, when compared to runtime details. Projected 
NDV from RelHive* was around 65989699.

However, due to the ratio calculation in StatsRulesProcFactory, it gets 
readjusted to ((948122598/14291978461) * 65989699) ~= 4377723.

It would be good to remove the static readjustment in StatsRulesProcFactory.
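The readjustment can be reproduced with a toy, self-contained sketch. Only the quoted ratio logic is from StatsRulesProcFactory; the class and method names here are illustrative, not Hive's:

```java
// Toy sketch of the linear NDV readjustment described above. The ratio logic
// mirrors the snippet quoted from StatsRulesProcFactory::updateColStats.
public class NdvAdjustmentSketch {
    static long adjustNdv(long newNumRows, long oldNumRows, long oldDV) {
        double ratio = (double) newNumRows / (double) oldNumRows;
        // newDV is only adjusted downward, at linear scale
        return ratio <= 1.0 ? (long) Math.ceil(ratio * oldDV) : oldDV;
    }

    public static void main(String[] args) {
        // Numbers from the Q10 example: 948122598 join-output rows over
        // 14291978461 filtered input rows scale the projected NDV of
        // 65989699 down to roughly 4.38M, the ~10x underestimation above.
        System.out.println(adjustNdv(948122598L, 14291978461L, 65989699L));
    }
}
```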
{noformat}
Edges:
Map 10 <- Map 9 (BROADCAST_EDGE)
Map 12 <- Map 9 (BROADCAST_EDGE)
Map 2 <- Map 7 (BROADCAST_EDGE)
Map 8 <- Map 9 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
Reducer 11 <- Map 10 (SIMPLE_EDGE)
Reducer 13 <- Map 12 (SIMPLE_EDGE)
Reducer 3 <- Map 1 (BROADCAST_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE), Map 8 
(CUSTOM_SIMPLE_EDGE), Reducer 11 (BROADCAST_EDGE), Reducer 13 (BROADCAST_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Map 2 (CUSTOM_SIMPLE_EDGE)


Map 12
Map Operator Tree:
TableScan
  alias: catalog_sales
  filterExpr: cs_ship_customer_sk is not null (type: boolean)
  Statistics: Num rows: 14327953968/552509183 Data size: 
228959459440 Basic stats: COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: cs_ship_customer_sk is not null (type: boolean)
Statistics: Num rows: 14291978461/551122492 Data size: 
228384573968 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
  expressions: cs_ship_customer_sk (type: bigint), 
cs_sold_date_sk (type: bigint)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 14291978461/551122492 Data size: 
228384573968 Basic stats: COMPLETE Column stats: COMPLETE
  Map Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0 _col1 (type: bigint)
  1 _col0 (type: bigint)
outputColumnNames: _col0
input vertices:
  1 Map 9
Statistics: Num rows: 948122598/551122492 Data size: 
7297899376 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
  keys: _col0 (type: bigint)
  minReductionHashAggr: 0.99
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 126954025/61576194 Data size: 
977191880 Basic stats: COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: bigint)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 126954025/61576194 Data size: 
977191880 Basic stats: COMPLETE Column stats: COMPLETE

...
...
Reducer 13
Execution mode: vectorized, llap
Reduce Operator Tree:
  Group By Operator
keys: KEY._col0 (type: bigint)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 4377725/40166690 Data size: 33696280 
Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
  expressions: true (type: boolean), _col0 (type: bigint)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 4377725/40166690 Data size: 51207180 
Basic stats: COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col1 (type: bigint)
null sort order: a
{noformat}

[jira] [Created] (HIVE-24295) Apply schema merge to all shared work optimizations

2020-10-22 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24295:
---

 Summary: Apply schema merge to all shared work optimizations
 Key: HIVE-24295
 URL: https://issues.apache.org/jira/browse/HIVE-24295
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24294) TezSessionPool sessions can throw AssertionError

2020-10-21 Thread Naresh P R (Jira)
Naresh P R created HIVE-24294:
-

 Summary: TezSessionPool sessions can throw AssertionError
 Key: HIVE-24294
 URL: https://issues.apache.org/jira/browse/HIVE-24294
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R


Whenever default TezSessionPool sessions are reopened for some reason, we are 
setting dagResources to null before close & setting it back in open:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
If there is an exception in sessionState.close(), we are not restoring the 
dagResources but are still moving the session back to TezSessionPool. E.g., the 
exception trace when sessionState.close() failed:
{code:java}
2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: 
client.TezClient (:()) - Failed to shutdown Tez Session via proxy
org.apache.tez.dag.api.SessionNotRunning: Application not running, 
applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, 
finalApplicationStatus=SUCCEEDED, 
trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, 
diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, 
sessionTimeoutInterval=60 ms
Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0 
   at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531)
 
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
at 
org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code}

Because of this, all new queries using such a corrupted session fail with the 
below exception:
{code:java}
Caused by: java.lang.AssertionError: Ensure called on an unitialized (or 
closed) session 41774265-b7da-4d58-84a8-1bedfd597aecCaused by: 
java.lang.AssertionError: Ensure called on an unitialized (or closed) session 
41774265-b7da-4d58-84a8-1bedfd597aec at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code}
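A minimal, self-contained sketch of the fix direction implied above: restore dagResources in a finally block so the pooled session stays valid even when close() throws. All names here are illustrative stand-ins, not the actual Hive classes or patch:

```java
import java.util.List;

// SessionSketch is a hypothetical stand-in for TezSessionState; the point is
// the try/finally shape, not the real API.
class SessionSketch {
    List<String> dagResources;
    void close() { throw new RuntimeException("close failed"); }
}

public class ReopenSketch {
    static void reopen(SessionSketch s) {
        List<String> saved = s.dagResources;
        s.dagResources = null;          // cleared before close, as today
        try {
            s.close();
        } finally {
            s.dagResources = saved;     // restored even when close() throws
        }
    }

    public static void main(String[] args) {
        SessionSketch s = new SessionSketch();
        s.dagResources = List.of("hive-exec.jar");
        try {
            reopen(s);
        } catch (RuntimeException expected) {
            // close() failed, but the session's resources were not lost
        }
        System.out.println(s.dagResources); // [hive-exec.jar]
    }
}
```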



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24293) Integer overflow in llap collision mask

2020-10-21 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-24293:
--

 Summary: Integer overflow in llap collision mask
 Key: HIVE-24293
 URL: https://issues.apache.org/jira/browse/HIVE-24293
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24292:
---

 Summary: hive webUI should support keystoretype by config
 Key: HIVE-24292
 URL: https://issues.apache.org/jira/browse/HIVE-24292
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


We need a property to pass in the keystore type in the web UI too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2020-10-20 Thread Peter Varga (Jira)
Peter Varga created HIVE-24291:
--

 Summary: Compaction Cleaner prematurely cleans up deltas
 Key: HIVE-24291
 URL: https://issues.apache.org/jira/browse/HIVE-24291
 Project: Hive
  Issue Type: Bug
Reporter: Peter Varga
Assignee: Peter Varga


Since HIVE-23107 the cleaner can clean up deltas that are still used by running 
queries.

Example:
 * TxnId 1-5 writes to a partition, all commits
 * Compactor starts with txnId=6
 * Long running query starts with txnId=7, it sees txnId=6 as open in its 
snapshot
 * Compaction commits
 * Cleaner runs

Previously the min_history_level table would have prevented the Cleaner from 
deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
long running query may fail if it tries to access the files.



A solution could be to not run the cleaner while any txn is still open that was 
opened before the compaction was committed (CQ_NEXT_TXN_ID).
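The proposed guard can be sketched as follows (hypothetical names; the actual patch may differ):

```java
// Sketch of the proposed cleaner guard: postpone cleaning while any
// transaction opened before the compaction committed is still open.
public class CleanerGuard {
    static boolean safeToClean(long cqNextTxnId, long minOpenTxnId) {
        // minOpenTxnId = lowest currently-open txn id,
        // Long.MAX_VALUE when no transaction is open
        return minOpenTxnId >= cqNextTxnId;
    }

    public static void main(String[] args) {
        // Timeline from the ticket: compaction committed when the next txn id
        // was 8 (CQ_NEXT_TXN_ID), while the long-running query txn 7 is open.
        System.out.println(safeToClean(8L, 7L));             // false: wait
        System.out.println(safeToClean(8L, Long.MAX_VALUE)); // true once txn 7 closes
    }
}
```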



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24290) Explain analyze can be slow in cloud storage

2020-10-20 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24290:
---

 Summary: Explain analyze can be slow in cloud storage
 Key: HIVE-24290
 URL: https://issues.apache.org/jira/browse/HIVE-24290
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


"explain analyze" takes a lot longer to exit with the following path, 
specifically in cloud environments (where it could be an EFS volume).  HIVE-24270 
is a related ticket as well.

 
{noformat}
at java.io.UnixFileSystem.delete0(Native Method)
at java.io.UnixFileSystem.delete(UnixFileSystem.java:265)
at java.io.File.delete(File.java:1043)
at org.apache.hadoop.fs.FileUtil.deleteImpl(FileUtil.java:229)
at org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:270)
at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:182)
at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:153)
at 
org.apache.hadoop.fs.RawLocalFileSystem.delete(RawLocalFileSystem.java:453)
at 
org.apache.hadoop.fs.ChecksumFileSystem.delete(ChecksumFileSystem.java:685)
at 
org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregator.closeConnection(FSStatsAggregator.java:115)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.aggregateStats(ExplainSemanticAnalyzer.java:261)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:156)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:600)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:546)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:540)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)

 {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24289) RetryingMetaStoreClient should not retry connecting to HMS on genuine errors

2020-10-19 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24289:
---

 Summary: RetryingMetaStoreClient should not retry connecting to 
HMS on genuine errors
 Key: HIVE-24289
 URL: https://issues.apache.org/jira/browse/HIVE-24289
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


When there is a genuine error from HMS, it should not be retried in 
RetryingMetaStoreClient. 

E.g., the following query would be retried multiple times (20+ times) in HMS, 
causing a huge delay in processing, even though the constraint already exists in 
HMS. 

It should just throw the exception to the client and stop retrying in such cases.

{noformat}
alter table web_sales add constraint tpcds_bin_partitioned_orc_1_ws_s_hd 
foreign key  (ws_ship_hdemo_sk) references household_demographics (hd_demo_sk) 
disable novalidate rely;

org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.TApplicationException: Internal error processing 
add_foreign_key
at org.apache.hadoop.hive.ql.metadata.Hive.addForeignKey(Hive.java:5914)
..
...
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
add_foreign_key
   at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
   at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_foreign_key(ThriftHiveMetastore.java:1872)
{noformat}

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L256

E.g., if the exception message contains "Internal error processing", it could 
stop retrying.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-19 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-24288:


 Summary: Files created by CompileProcessor have incorrect 
permissions
 Key: HIVE-24288
 URL: https://issues.apache.org/jira/browse/HIVE-24288
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


Compile processor generates some temporary files as part of processing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24287) Cookie Signer class should use SHA-512 instead of SHA-256 for cookie signature

2020-10-19 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-24287:


 Summary: Cookie Signer class should use SHA-512 instead of SHA-256 
for cookie signature
 Key: HIVE-24287
 URL: https://issues.apache.org/jira/browse/HIVE-24287
 Project: Hive
  Issue Type: Bug
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


private static final String SHA_STRING = "SHA-256"; should use SHA-512 instead
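Since both "SHA-256" and "SHA-512" are standard JCA algorithm names, the change amounts to swapping the constant. A small sketch showing the resulting digest sizes (not the actual CookieSigner code):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: switching the algorithm constant grows the digest used for the
// cookie signature from 32 bytes (SHA-256) to 64 bytes (SHA-512).
public class CookieSignerDigest {
    static byte[] digest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(digest("SHA-256", "cookie".getBytes()).length); // 32
        System.out.println(digest("SHA-512", "cookie".getBytes()).length); // 64
    }
}
```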



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-19 Thread okumin (Jira)
okumin created HIVE-24286:
-

 Summary: Render date and time with progress of Hive on Tez
 Key: HIVE-24286
 URL: https://issues.apache.org/jira/browse/HIVE-24286
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: okumin
Assignee: okumin


Add date/time to each line written by RenderStrategy like MapReduce and Spark.

 
 * 
[https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
 * 
[https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]

 

This ticket would add the current time to the head of each line.

 
{code:java}
2020-10-19 13:32:41,162 Map 1: 0/1  Reducer 2: 0/1  
2020-10-19 13:32:44,231 Map 1: 0/1  Reducer 2: 0/1  
2020-10-19 13:32:46,813 Map 1: 0(+1)/1  Reducer 2: 0/1  
2020-10-19 13:32:49,878 Map 1: 0(+1)/1  Reducer 2: 0/1  
2020-10-19 13:32:51,416 Map 1: 1/1  Reducer 2: 0/1  
2020-10-19 13:32:51,936 Map 1: 1/1  Reducer 2: 0(+1)/1  
2020-10-19 13:32:52,877 Map 1: 1/1  Reducer 2: 1/1  
{code}
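The line format above could be produced roughly as follows (the timestamp pattern is assumed from the sample output; the real change would live in RenderStrategy):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch: prefix each progress line with "yyyy-MM-dd HH:mm:ss,SSS",
// as the MapReduce and Spark render paths linked above already do.
public class TimedProgressLine {
    static String render(Date now, String progress) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS").format(now)
            + " " + progress;
    }

    public static void main(String[] args) {
        System.out.println(render(new Date(), "Map 1: 0/1  Reducer 2: 0/1"));
    }
}
```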
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24285) GlobalLimitOptimizer not working

2020-10-19 Thread lucusguo (Jira)
lucusguo created HIVE-24285:
---

 Summary: GlobalLimitOptimizer not working
 Key: HIVE-24285
 URL: https://issues.apache.org/jira/browse/HIVE-24285
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 3.1.2, 2.0.1
Reporter: lucusguo
Assignee: Jesus Camacho Rodriguez
 Attachments: image-2020-10-19-18-59-05-302.png, 
image-2020-10-19-19-00-05-123.png, image-2020-10-19-19-00-23-487.png

In our environment, we have set hive.limit.optimize.enable=true, 
hive.limit.row.max.size=1000, hive.limit.optimize.fetch.max=1000, 
hive.fetch.task.conversion.threshold=256, hive.fetch.task.conversion=more, 
hive.auto.convert.join=true.

When we execute SQL like the following, GlobalLimitOptimizer does not work:

select db_name,concat('22',cnt) from (select * from lb1 limit 5) t1;

because in this SQL there is one LimitOperator, but the logs count 2.

The code and logs are below:

 

!image-2020-10-19-19-00-23-487.png!

 

!image-2020-10-19-19-00-05-123.png!

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24284) NPE when parsing druid logs using Hive

2020-10-18 Thread mahesh kumar behera (Jira)
mahesh kumar behera created HIVE-24284:
--

 Summary: NPE when parsing druid logs using Hive
 Key: HIVE-24284
 URL: https://issues.apache.org/jira/browse/HIVE-24284
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera


As per the current syslog parser, a valid proc id is always expected. But as per 
RFC 3164 and RFC 5424, the proc id can be skipped, so Hive should handle this by 
using NILVALUE/an empty string in case the proc id is null.

 
{code:java}
Caused by: java.lang.NullPointerException: null
at java.lang.String.<init>(String.java:566)
at 
org.apache.hadoop.hive.ql.log.syslog.SyslogParser.createEvent(SyslogParser.java:361)
at 
org.apache.hadoop.hive.ql.log.syslog.SyslogParser.readEvent(SyslogParser.java:326)
at 
org.apache.hadoop.hive.ql.log.syslog.SyslogSerDe.deserialize(SyslogSerDe.java:95)
 {code}
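A minimal sketch of the null-safe handling suggested above (hypothetical names; the real parser works on a byte stream inside SyslogParser):

```java
import java.nio.charset.StandardCharsets;

// Sketch: fall back to an empty string (or the RFC 5424 NILVALUE "-")
// instead of calling new String(null), which throws the NPE above.
public class ProcIdSketch {
    static String procId(byte[] parsedProcId) {
        return parsedProcId == null
            ? ""  // proc id was skipped, per RFC 3164 / RFC 5424
            : new String(parsedProcId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(procId(null).isEmpty());                           // true, no NPE
        System.out.println(procId("123".getBytes(StandardCharsets.UTF_8)));   // 123
    }
}
```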



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24283) Memory leak problem of hiveserver2 when compiling in parallel

2020-10-18 Thread zhaolun7 (Jira)
zhaolun7 created HIVE-24283:
---

 Summary: Memory leak problem of hiveserver2 when compiling in 
parallel
 Key: HIVE-24283
 URL: https://issues.apache.org/jira/browse/HIVE-24283
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 2.3.7
 Environment: CentOS 7.2

openjdk 8

Hadoop 2.9.2

Hive 2.3.7
Reporter: zhaolun7
 Fix For: 2.3.7
 Attachments: image-2020-10-18-22-25-44-271.png, 
image-2020-10-18-22-26-20-436.png

I used JDBC to connect to HiveServer2 and got about 25,000 SQL statements as 
test data from the production environment to test parallel compilation. Then I 
saved a memory snapshot of HiveServer2, ran the test again, and saved another 
memory snapshot. I found that the occupied memory became larger and the second 
run was slower.

 

This is the first time I have submitted an issue. If any part of the 
description is incomplete, please point it out. If the format of the issue is 
incorrect, please help me modify it. Thank you.

 

!image-2020-10-18-22-26-20-436.png!

!image-2020-10-18-22-25-44-271.png!

 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24282) Show columns shouldn't sort table output columns unless explicitly mentioned.

2020-10-17 Thread Naresh P R (Jira)
Naresh P R created HIVE-24282:
-

 Summary: Show columns shouldn't sort table output columns unless 
explicitly mentioned.
 Key: HIVE-24282
 URL: https://issues.apache.org/jira/browse/HIVE-24282
 Project: Hive
  Issue Type: Improvement
Reporter: Naresh P R
Assignee: Naresh P R


CREATE TABLE foo_n7(c INT, b INT, a INT);

show columns in foo_n7;

 
{code:java}
// current output
a
b 
c
// expected
c
b 
a
{code}
 

HIVE-18373 changed the original behaviour to sorted output.

Suggesting to provide an optional {{sorted}} keyword to sort the show columns 
output, e.g.:

{code:java}
show sorted columns in foo_n7;
a
b 
c
show columns in foo_n7
c
b 
a
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24281) Unable to reopen a closed bug report

2020-10-16 Thread Ankur Tagra (Jira)
Ankur Tagra created HIVE-24281:
--

 Summary: Unable to reopen a closed bug report
 Key: HIVE-24281
 URL: https://issues.apache.org/jira/browse/HIVE-24281
 Project: Hive
  Issue Type: Bug
Reporter: Ankur Tagra
Assignee: Ankur Tagra


Unable to reopen a closed bug report



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24280) Fix a potential NPE

2020-10-15 Thread Xuefu Zhang (Jira)
Xuefu Zhang created HIVE-24280:
--

 Summary: Fix a potential NPE
 Key: HIVE-24280
 URL: https://issues.apache.org/jira/browse/HIVE-24280
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Affects Versions: 3.1.2
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


{code:java}
case STRING:
case CHAR:
case VARCHAR: {
  BytesColumnVector bcv = (BytesColumnVector) cols[colIndex];
  String sVal = value.toString();
  if (sVal == null) {
bcv.noNulls = false;
bcv.isNull[0] = true;
bcv.isRepeating = true;
  } else {
bcv.fill(sVal.getBytes());
  }
}
break;
{code}
The above code snippet seems to assume that sVal can be null, but it doesn't 
handle the case where value itself is null. However, if value is not null, it's 
unlikely that value.toString() returns null.

We treat the partition column value for the default partition of string types as 
null, rather than as "__HIVE_DEFAULT_PARTITION__" as Hive assumes. Thus, we 
actually hit the case where sVal is null.

I propose a harmless fix, as shown in the attached patch.
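A self-contained sketch of one possible harmless fix (the attached patch is not shown here, so ColVec is a hypothetical stand-in for BytesColumnVector):

```java
import java.nio.charset.StandardCharsets;

// Sketch: compute sVal null-safely so a null value (e.g. the default
// partition treated as null) takes the null branch instead of throwing an NPE.
public class NullSafeFill {
    static class ColVec {                 // minimal stand-in, not the Hive class
        boolean noNulls = true;
        boolean[] isNull = new boolean[1];
        boolean isRepeating = false;
        byte[] filled;
        void fill(byte[] b) { filled = b; isRepeating = true; }
    }

    static void setValue(ColVec bcv, Object value) {
        String sVal = (value == null) ? null : value.toString();
        if (sVal == null) {
            bcv.noNulls = false;
            bcv.isNull[0] = true;
            bcv.isRepeating = true;
        } else {
            bcv.fill(sVal.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        ColVec v = new ColVec();
        setValue(v, null);                                 // no NPE
        System.out.println(!v.noNulls && v.isNull[0]);     // true
    }
}
```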



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24279) Hive CompactorThread fails to connect to metastore if the connection URL was only configured in metastore-site

2020-10-15 Thread Xianyin Xin (Jira)
Xianyin Xin created HIVE-24279:
--

 Summary: Hive CompactorThread fails to connect to metastore if the 
connection URL was only configured in metastore-site
 Key: HIVE-24279
 URL: https://issues.apache.org/jira/browse/HIVE-24279
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 3.1.2
Reporter: Xianyin Xin


I got an exception when I configured transactions:
{code}
2020-10-15T11:05:41,356 ERROR [Thread-7] compactor.Initiator: Caught an 
exception in the main loop of compactor initiator, exiting 
MetaException(message:Unable to connect to transaction database 
java.sql.SQLSyntaxErrorException: Table/View 'COMPACTION_QUEUE' does not exist.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeLargeUpdate(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown 
Source)
at 
com.zaxxer.hikari.pool.ProxyStatement.executeUpdate(ProxyStatement.java:117)
at 
com.zaxxer.hikari.pool.HikariProxyStatement.executeUpdate(HikariProxyStatement.java)
at 
org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.revokeFromLocalWorkers(CompactionTxnHandler.java:646)
at 
org.apache.hadoop.hive.ql.txn.compactor.Initiator.recoverFailedCompactions(Initiator.java:208)
at 
org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:74)
Caused by: ERROR 42X05: Table/View 'COMPACTION_QUEUE' does not exist.
{code}

After some debugging, I found the conf that was passed to the {{CompactorThread}} 
was:
{code}
  public void setConf(Configuration configuration) {
// TODO MS-SPLIT for now, keep a copy of HiveConf around as we need to call 
other methods with
// it. This should be changed to Configuration once everything that this 
calls that requires
// HiveConf is moved to the standalone metastore.
conf = (configuration instanceof HiveConf) ? (HiveConf)configuration :
new HiveConf(configuration, HiveConf.class);
  }
{code}

However, {{new HiveConf(configuration, HiveConf.class)}} would not inherit all 
the configs coming from {{configuration}}; actually, the values from 
{{configuration}} will be overwritten by the default values that are not null:

{code}
  private void initialize(Class cls) {
hiveJar = (new JobConf(cls)).getJar();

// preserve the original configuration
origProp = getAllProperties();

// Overlay the ConfVars. Note that this ignores ConfVars with null values
addResource(getConfVarInputStream());

// Overlay hive-site.xml if it exists
if (hiveSiteURL != null) {
  addResource(hiveSiteURL);
}

// if embedded metastore is to be used as per config so far
// then this is considered like the metastore server case
String msUri = this.getVar(HiveConf.ConfVars.METASTOREURIS);
// This is hackery, but having hive-common depend on standalone-metastore 
is really bad
// because it will pull all of the metastore code into every module.  We 
need to check that
// we aren't using the standalone metastore.  If we are, we should treat it 
the same as a
// remote metastore situation.
if (msUri == null || msUri.isEmpty()) {
  msUri = this.get("metastore.thrift.uris");
}
LOG.debug("Found metastore URI of " + msUri);
if(HiveConfUtil.isEmbeddedMetaStore(msUri)){
  setLoadMetastoreConfig(true);
}

// load hivemetastore-site.xml if this is metastore and file exists
if (isLoadMetastoreConfig() && hivemetastoreSiteUrl != null) {
  addResource(hivemetastoreSiteUrl);
}
{code}

That is, {{new HiveConf(configuration, HiveConf.class)}} is merely a new 
{{HiveConf}} in which only the configs with null default values are kept from 
{{configuration}}. A {{HiveConf}} would not load hivemetastore-site.xml unless 
it is an embedded metastore. If hive-site.xml doesn't have the db connection 
info, {{CompactorThread}} would connect to the default Derby database, which 
causes the failure.
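The overlay problem can be modeled with plain java.util.Properties (a toy model, not HiveConf itself): copying the caller's configuration first and then applying the non-null defaults masks the caller's connection URL:

```java
import java.util.Properties;

// Toy model of the HiveConf overlay described above: the caller's value is
// copied in first, then the later defaults overlay overwrites it, which is
// why the metastore connection URL from metastore-site is lost.
public class OverlaySketch {
    static String effectiveUrl(String callerUrl) {
        Properties merged = new Properties();
        merged.setProperty("javax.jdo.option.ConnectionURL", callerUrl);
        // equivalent of addResource(getConfVarInputStream()): every ConfVar
        // with a non-null default gets (re)set, masking the caller's value
        merged.setProperty("javax.jdo.option.ConnectionURL",
            "jdbc:derby:;databaseName=metastore_db");
        return merged.getProperty("javax.jdo.option.ConnectionURL");
    }

    public static void main(String[] args) {
        // prints the Derby default, not the caller's MySQL URL
        System.out.println(effectiveUrl("jdbc:mysql://ms-db/hive"));
    }
}
```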



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24278) Implement a UDF for throwing an exception in an arbitrary vertex

2020-10-15 Thread Jira
László Bodor created HIVE-24278:
---

 Summary: Implement a UDF for throwing an exception in an arbitrary 
vertex
 Key: HIVE-24278
 URL: https://issues.apache.org/jira/browse/HIVE-24278
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24277) Temporary table with constraints is persisted in HMS

2020-10-14 Thread Adesh Kumar Rao (Jira)
Adesh Kumar Rao created HIVE-24277:
--

 Summary: Temporary table with constraints is persisted in HMS
 Key: HIVE-24277
 URL: https://issues.apache.org/jira/browse/HIVE-24277
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Adesh Kumar Rao
Assignee: Adesh Kumar Rao
 Fix For: 4.0.0


Run below in a session
{noformat}
0: jdbc:hive2://zk1-nikhil.q5dzd3jj30bupgln50> create temporary table ttemp (id 
int default 0);
INFO  : Compiling 
command(queryId=hive_20201015050509_99267861-56f7-4940-ae3f-5a895dc3d2cb): 
create temporary table ttemp (id int default 0)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=hive_20201015050509_99267861-56f7-4940-ae3f-5a895dc3d2cb); Time 
taken: 0.625 seconds
INFO  : Executing 
command(queryId=hive_20201015050509_99267861-56f7-4940-ae3f-5a895dc3d2cb): 
create temporary table ttemp (id int default 0)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20201015050509_99267861-56f7-4940-ae3f-5a895dc3d2cb); Time 
taken: 4.02 seconds
INFO  : OK
No rows affected (5.32 seconds)
{noformat}
Running "show tables" in another session will return that temporary table in 
output
{noformat}
0: jdbc:hive2://zk1-nikhil.q5dzd3jj30bupgln50> show tables
. . . . . . . . . . . . . . . . . . . . . . .> ;
INFO  : Compiling 
command(queryId=hive_20201015050554_7882c055-f084-4919-9a18-800d3fe4dcf7): show 
tables
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, 
type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20201015050554_7882c055-f084-4919-9a18-800d3fe4dcf7); Time 
taken: 0.065 seconds
INFO  : Executing 
command(queryId=hive_20201015050554_7882c055-f084-4919-9a18-800d3fe4dcf7): show 
tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20201015050554_7882c055-f084-4919-9a18-800d3fe4dcf7); Time 
taken: 0.057 seconds
INFO  : OK
+--+
| tab_name |
+--+
| ttemp|
+--+

{noformat}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24276) HiveServer2 loggerconf jsp Cross-Site Scripting (XSS) Vulnerability

2020-10-14 Thread Rajkumar Singh (Jira)
Rajkumar Singh created HIVE-24276:
-

 Summary: HiveServer2 loggerconf jsp Cross-Site Scripting (XSS) 
Vulnerability 
 Key: HIVE-24276
 URL: https://issues.apache.org/jira/browse/HIVE-24276
 Project: Hive
  Issue Type: Bug
Reporter: Rajkumar Singh
Assignee: Rajkumar Singh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24275) Introduce a configuration to delay the deletion of obsolete files by the Cleaner

2020-10-14 Thread Kishen Das (Jira)
Kishen Das created HIVE-24275:
-

 Summary: Introduce a configuration to delay the deletion of 
obsolete files by the Cleaner
 Key: HIVE-24275
 URL: https://issues.apache.org/jira/browse/HIVE-24275
 Project: Hive
  Issue Type: New Feature
Reporter: Kishen Das


Whenever compaction happens, the cleaner immediately deletes the older obsolete 
files. In certain cases it would be beneficial to retain these for a certain 
period. For example: if you are serving file metadata from a cache and don't 
want to invalidate the cache during compaction for performance reasons. 

For this purpose we should introduce a configuration 
hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
cleanup of obsolete files. There should be a separate configuration 
CLEANER_RETENTION_TIME to specify the duration for which we should retain 
these older obsolete files. 

It might be beneficial to have one more configuration, 
hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to retain 
files involved in aborted transactions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24274) Implement Query Text based MaterializedView rewrite

2020-10-14 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24274:
-

 Summary: Implement Query Text based MaterializedView rewrite
 Key: HIVE-24274
 URL: https://issues.apache.org/jira/browse/HIVE-24274
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Besides the way queries are currently rewritten to use materialized views in 
Hive, this project provides an alternative:
Compare the query text with the stored query text of the materialized views. If 
we find a match, the original query's logical plan can be replaced by a scan on 
the materialized view.
- Only materialized views which are enabled to rewrite can participate
- Use existing *HiveMaterializedViewsRegistry* through *Hive* object by adding 
a lookup method by query text.
- There might be more than one materialized view with the same query text. In 
this case choose the first valid one.
- Validation can be done by calling 
*Hive.validateMaterializedViewsFromRegistry()*
- The scope of this first patch is limited to rewriting queries whose entire 
text can be matched.
- Use the expanded query text (fully qualified column and table names) for 
comparing
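The lookup step can be sketched as a plain map from expanded query text to a materialized view name (all names here are illustrative; the real lookup would go through HiveMaterializedViewsRegistry and validate the hit):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a query-text based materialized view lookup: key on the expanded
// query text (fully qualified names) and return the first registered match.
public class MvTextLookup {
    // insertion-ordered so "the first valid one" is well defined
    static final Map<String, String> registry = new LinkedHashMap<>();

    static String rewrite(String expandedQueryText) {
        return registry.getOrDefault(expandedQueryText, null);
    }

    public static void main(String[] args) {
        registry.put("SELECT `t`.`a` FROM `db`.`t`", "db.mv_t");
        System.out.println(rewrite("SELECT `t`.`a` FROM `db`.`t`")); // db.mv_t
        System.out.println(rewrite("SELECT 1"));                     // null: no rewrite
    }
}
```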




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24273) grouping key is case sensitive

2020-10-14 Thread zhaolong (Jira)
zhaolong created HIVE-24273:
---

 Summary: grouping key is case sensitive
 Key: HIVE-24273
 URL: https://issues.apache.org/jira/browse/HIVE-24273
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0, 4.0.0
Reporter: zhaolong
Assignee: zhaolong


The grouping key is case sensitive; the following steps reproduce the issue:

1.create table testaa(name string, age int);

2.select GROUPING(name) from testaa group by name;
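The case-insensitive identifier matching that Hive's contract implies can be sketched as below; the helper is purely illustrative and not Hive code:

```python
def resolve_column(name, group_by_keys):
    """Match a GROUPING() argument against the GROUP BY keys
    case-insensitively, as Hive's identifier contract implies.
    Purely illustrative helper, not Hive code."""
    for key in group_by_keys:
        if key.lower() == name.lower():
            return key
    raise ValueError(f"Expression in GROUPING() is not in GROUP BY: {name}")

# 'Name' and 'name' should resolve to the same grouping key.
print(resolve_column("Name", ["name"]))  # name
```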





[jira] [Created] (HIVE-24272) LLAP: local directories should be cleaned up on ContainerRunnerImpl.queryFailed

2020-10-14 Thread Jira
László Bodor created HIVE-24272:
---

 Summary: LLAP: local directories should be cleaned up on 
ContainerRunnerImpl.queryFailed
 Key: HIVE-24272
 URL: https://issues.apache.org/jira/browse/HIVE-24272
 Project: Hive
  Issue Type: Bug
Reporter: László Bodor








[jira] [Created] (HIVE-24271) Create managed table relies on hive.create.as.acid settings.

2020-10-13 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-24271:


 Summary: Create managed table relies on hive.create.as.acid 
settings.
 Key: HIVE-24271
 URL: https://issues.apache.org/jira/browse/HIVE-24271
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.acid;
++
|set |
++
| hive.create.as.acid=false  |
++
1 row selected (0.018 seconds)
0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.insert.only;
+---+
|set|
+---+
| hive.create.as.insert.only=false  |
+---+
1 row selected (0.013 seconds)
0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> create managed table mgd_table(a 
int);
INFO  : Compiling 
command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
create managed table mgd_table(a int)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); Time 
taken: 0.021 seconds
INFO  : Executing 
command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
create managed table mgd_table(a int)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); Time 
taken: 0.048 seconds
INFO  : OK
No rows affected (0.107 seconds)
0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> describe formatted mgd_table;
INFO  : Compiling 
command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
describe formatted mgd_table
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
type:string, comment:from deserializer), FieldSchema(name:data_type, 
type:string, comment:from deserializer), FieldSchema(name:comment, type:string, 
comment:from deserializer)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); Time 
taken: 0.037 seconds
INFO  : Executing 
command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
describe formatted mgd_table
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); Time 
taken: 0.03 seconds
INFO  : OK
+---+++
|   col_name| data_type 
 |  comment   |
+---+++
| a | int   
 ||
|   | NULL  
 | NULL   |
| # Detailed Table Information  | NULL  
 | NULL   |
| Database: | bothfalseonhs2
 | NULL   |
| OwnerType:| USER  
 | NULL   |
| Owner:| hive  
 | NULL   |
| CreateTime:   | Wed Oct 14 05:35:26 UTC 2020  
 | NULL   |
| LastAccessTime:   | UNKNOWN   
 | NULL   |
| Retention:| 0 
 | NULL   |
| Location: | 
hdfs://ngangam-3.ngangam.root.hwx.site:8020/warehouse/tablespace/external/hive/bothfalseonhs2.db/mgd_table
 | NULL   |
| Table Type:   | EXTERNAL_TABLE
 | NULL   |
| Table Parameters: | NULL  

[jira] [Created] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-24270:
---

 Summary: Move scratchdir cleanup to background
 Key: HIVE-24270
 URL: https://issues.apache.org/jira/browse/HIVE-24270
 Project: Hive
  Issue Type: Improvement
Reporter: Mustafa Iman
Assignee: Mustafa Iman


In a cloud environment, scratch directory cleanup at the end of a query may 
take a long time. This can cause the client to hang for up to a minute even 
after the results have been streamed back; during this time the client just 
waits for the cleanup to finish. Cleanup can take place in the background in 
HiveServer.
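The idea can be sketched with a background executor. This is a hedged illustration in Python, not HiveServer2's actual threading model:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

# One shared background pool in the server: the query-handling thread
# returns to the client immediately, while deletion of the scratch
# directory proceeds asynchronously.
cleanup_pool = ThreadPoolExecutor(max_workers=2)

def finish_query(scratch_dir):
    """Return results to the client first; clean up in the background."""
    future = cleanup_pool.submit(shutil.rmtree, scratch_dir, True)
    return future  # the caller does not block on this

scratch = tempfile.mkdtemp(prefix="hive_scratch_")
f = finish_query(scratch)
f.result()  # only this demo waits; a real client would not
print(os.path.exists(scratch))  # False
```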





[jira] [Created] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-10-13 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24269:
---

 Summary: In SharedWorkOptimizer run simplification after merging 
TS filter expressions
 Key: HIVE-24269
 URL: https://issues.apache.org/jira/browse/HIVE-24269
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich








[jira] [Created] (HIVE-24268) Investigate srcpart scans in dynamic_partition_pruning test

2020-10-13 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24268:
---

 Summary: Investigate srcpart scans in dynamic_partition_pruning 
test
 Key: HIVE-24268
 URL: https://issues.apache.org/jira/browse/HIVE-24268
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


There seem to be some opportunities missed by the shared work optimizer.

see srcpart scans around 
[here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]







[jira] [Created] (HIVE-24267) RetryingClientTimeBased should perform first invocation immediately

2020-10-12 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-24267:
---

 Summary: RetryingClientTimeBased should perform first invocation 
immediately
 Key: HIVE-24267
 URL: https://issues.apache.org/jira/browse/HIVE-24267
 Project: Hive
  Issue Type: Bug
Reporter: Pravin Sinha
Assignee: Pravin Sinha








[jira] [Created] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-12 Thread Jira
Ádám Szita created HIVE-24266:
-

 Summary: Committed rows in hflush'd ACID files may be missing from 
query result
 Key: HIVE-24266
 URL: https://issues.apache.org/jira/browse/HIVE-24266
 Project: Hive
  Issue Type: Bug
Reporter: Ádám Szita
Assignee: Ádám Szita


In an HDFS environment, if a writer uses hflush to write ORC ACID files during 
a transaction commit, committed rows might appear to be missing when the table 
is read before the file is completely persisted to disk (and thus synced).

This is because hflush does not persist the new buffers to disk; it only 
ensures that new readers can see the new content. This leaves the block 
information incomplete, which BISplitStrategy relies on. Although the side 
file (_flush_length) tracks the proper end of the file being written, this 
information is neglected in favour of the block information, and we may end up 
generating a very short split instead of the larger, available length.
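The intended behaviour can be sketched as: prefer the length recorded in the _flush_length side file over the possibly stale block metadata when sizing the split. This is a pure illustration, not the BISplitStrategy code:

```python
def effective_split_length(block_length, side_file_length):
    """Pick the split length for an ACID file still being written.

    block_length: length visible from HDFS block metadata, which may lag
                  behind after hflush (hflush does not persist blocks).
    side_file_length: last committed length recorded in _flush_length,
                  or None if no side file exists.
    Illustrative only; BISplitStrategy operates on richer structures.
    """
    if side_file_length is not None:
        return max(block_length, side_file_length)
    return block_length

# After an hflush, block metadata may report 0 bytes while the side file
# already records 4096 committed bytes.
print(effective_split_length(0, 4096))  # 4096
```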





[jira] [Created] (HIVE-24264) Fix failed-to-read errors in precommit runs

2020-10-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24264:
---

 Summary: Fix failed-to-read errors in precommit runs
 Key: HIVE-24264
 URL: https://issues.apache.org/jira/browse/HIVE-24264
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


The following happens:
* this seems to be caused by tests outputting a lot of messages
* some error happens in Surefire, and the system-err output is discarded
* the JUnit XML becomes corrupted
* Jenkins does report the failure, but doesn't take it into account when 
setting the build result, so the result remains green







[jira] [Created] (HIVE-24262) Optimise NullScanTaskDispatcher for cloud storage

2020-10-12 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24262:
---

 Summary: Optimise NullScanTaskDispatcher for cloud storage
 Key: HIVE-24262
 URL: https://issues.apache.org/jira/browse/HIVE-24262
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


{noformat}
select count(DISTINCT ss_sold_date_sk) from store_sales;

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2 .. container SUCCEEDED  1  100
   0   0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 5.55 s
--
INFO  : Status: DAG finished successfully in 5.44 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATIONDURATION
INFO  : 
--
INFO  : Compile Query 102.02s
INFO  : Prepare Plan0.51s
INFO  : Get Query Coordinator (AM)  0.01s
INFO  : Submit Plan 0.33s
INFO  : Start DAG   0.56s
INFO  : Run DAG 5.44s
INFO  : 
--

{noformat}

The reason for this is that it ends up doing an "isEmptyPath" check for every 
partition path, which takes a lot of time in the compilation phase.


If all the paths share the same parent directory, we could do a single 
recursive listing (instead of listing each directory sequentially, one at a 
time) on cloud storage systems.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L158

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L121

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java#L101
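The proposed optimisation can be sketched as follows: instead of one isEmptyPath listing call per partition, list the common parent once recursively and answer all emptiness checks from that snapshot. The filesystem calls below are a local-disk stand-in for the cloud storage listing, purely for illustration:

```python
import os
import tempfile

def empty_paths(parent, partition_paths):
    """One recursive walk of `parent` instead of one list call per
    partition (the per-partition isEmptyPath pattern). Returns the
    partitions that contain no files."""
    non_empty = set()
    for root, _dirs, files in os.walk(parent):  # single recursive listing
        if files:
            non_empty.add(root)
    return [p for p in partition_paths if p not in non_empty]

# Two partitions under one parent; only the second contains data.
base = tempfile.mkdtemp()
p1 = os.path.join(base, "ds=2020-10-01"); os.mkdir(p1)
p2 = os.path.join(base, "ds=2020-10-02"); os.mkdir(p2)
open(os.path.join(p2, "000000_0"), "w").close()
print(empty_paths(base, [p1, p2]))  # only the empty partition is returned
```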


With a temp hacky fix, it comes down to 2 seconds from 100+ seconds.

{noformat}
INFO  : Dag name: select count(DISTINCT ss_sold_...store_sales (Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1602500203747_0003)

--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
Reducer 2 .. container SUCCEEDED  1  100
   0   0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 1.23 s
--
INFO  : Status: DAG finished successfully in 1.20 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATIONDURATION
INFO  : 
--
INFO  : Compile Query   0.85s
INFO  : Prepare Plan0.17s
INFO  : Get Query Coordinator (AM)  0.00s
INFO  : Submit Plan 0.03s
INFO  : Start DAG   0.03s
INFO  : Run DAG 1.20s
INFO  : 
--
{noformat}







[jira] [Created] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-10-12 Thread Szehon Ho (Jira)
Szehon Ho created HIVE-24263:


 Summary: Create an HMS endpoint to list partition locations
 Key: HIVE-24263
 URL: https://issues.apache.org/jira/browse/HIVE-24263
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Szehon Ho


In our company, we have a use case to quickly obtain a list of partition 
locations. Currently this is done via listPartitions, which is a very heavy 
operation in terms of memory and performance.

For example, we have an integration from the output of a Hive pipeline to 
Spark jobs that consume directly from HDFS. It needs to know which partition 
paths are available for consumption, and does repeated listPartitions() calls 
for this.

As there is already an internal method in the ObjectStore for this, used by 
dropPartitions, it is only a matter of exposing this API in 
HiveMetaStoreClient.







[jira] [Created] (HIVE-24265) Fix acid_stats2 test

2020-10-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24265:
---

 Summary: Fix acid_stats2 test
 Key: HIVE-24265
 URL: https://issues.apache.org/jira/browse/HIVE-24265
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


This test's failure started to create incorrect JUnit XMLs, which were not 
counted correctly by Jenkins.

I'll disable the test for now and provide details on when it first failed.





[jira] [Created] (HIVE-24261) Create Drop_all_table_constraint api in standalone hive metastore

2020-10-12 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24261:


 Summary: Create Drop_all_table_constraint api in standalone hive 
metastore  
 Key: HIVE-24261
 URL: https://issues.apache.org/jira/browse/HIVE-24261
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Currently, in order to drop all constraints, multiple calls are needed. 
Instead, we could have a single API which drops all the constraints of a given 
table.





[jira] [Created] (HIVE-24260) create api create_table_with_constraints with SQLAllTableConstraint

2020-10-12 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24260:


 Summary: create api create_table_with_constraints with 
SQLAllTableConstraint
 Key: HIVE-24260
 URL: https://issues.apache.org/jira/browse/HIVE-24260
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Create an API create_table_with_constraints taking SQLAllTableConstraint in 
the standalone Hive metastore.





[jira] [Created] (HIVE-24259) [CachedStore] Optimise getAlltableConstraint from 6 cache call to 1 cache call

2020-10-12 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24259:


 Summary: [CachedStore] Optimise getAlltableConstraint from 6 cache 
call to 1 cache call
 Key: HIVE-24259
 URL: https://issues.apache.org/jira/browse/HIVE-24259
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Currently, in order to get all constraints from the CachedStore, 6 different 
calls are made to the store. Instead, those 6 calls can be combined into 1.





[jira] [Created] (HIVE-24258) Make standalone metastore cachedStore case-insenstive

2020-10-12 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24258:


 Summary: Make standalone metastore cachedStore case-insenstive
 Key: HIVE-24258
 URL: https://issues.apache.org/jira/browse/HIVE-24258
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Description

Objects like table names, db names, and column names are case insensitive per 
the Hive contract, but the standalone metastore CachedStore is case sensitive.
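A common way to make such a cache case-insensitive is to normalise keys on both write and read. The class below is a toy illustration, not CachedStore code:

```python
class CaseInsensitiveCache:
    """Toy cache that lower-cases db/table names on both put and get,
    matching Hive's case-insensitive identifier contract.
    Illustrative only, not the CachedStore implementation."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(db, table):
        return (db.lower(), table.lower())

    def put(self, db, table, value):
        self._store[self._key(db, table)] = value

    def get(self, db, table):
        return self._store.get(self._key(db, table))

cache = CaseInsensitiveCache()
cache.put("Default", "TestAA", {"cols": ["name", "age"]})
print(cache.get("default", "testaa"))  # hit despite different casing
```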





[jira] [Created] (HIVE-24257) Wrong check constraint naming in Hive metastore

2020-10-10 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24257:


 Summary: Wrong check constraint naming in Hive metastore
 Key: HIVE-24257
 URL: https://issues.apache.org/jira/browse/HIVE-24257
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Current 

struct SQLCheckConstraint {
  1: string catName, // catalog name
  2: string table_db,// table schema
  3: string table_name,  // table name
  4: string column_name, // column name
  5: string check_expression,// check expression
  6: string dc_name, // default name
  7: bool enable_cstr,   // Enable/Disable
  8: bool validate_cstr, // Validate/No validate
  9: bool rely_cstr  // Rely/No Rely
}


The field naming in SQLCheckConstraint is wrong: it should be cc_name instead of dc_name.






[jira] [Created] (HIVE-24256) REPL LOAD fails because of unquoted column name

2020-10-10 Thread Viacheslav Avramenko (Jira)
Viacheslav Avramenko created HIVE-24256:
---

 Summary: REPL LOAD fails because of unquoted column name
 Key: HIVE-24256
 URL: https://issues.apache.org/jira/browse/HIVE-24256
 Project: Hive
  Issue Type: Bug
Reporter: Viacheslav Avramenko
Assignee: Naresh P R


 
{code:java}
CREATE EXTERNAL TABLE dbs(db_id bigint, db_location_uri string, name string, 
owner_name string, owner_type string) STORED BY 
'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES 
('hive.sql.database.type'='METASTORE', 'hive.sql.query'='SELECT `DB_ID`, 
`DB_LOCATION_URI`, `NAME`, `OWNER_NAME`, `OWNER_TYPE` FROM `DBS`'); 
{code}
==> Wrong Result <==
{code:java}
set hive.limit.optimize.enable=true;
select * from dbs limit 1;
--
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--
Map 1 .. container SUCCEEDED 0 0 0 0 0 0
--
VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 0.91 s
--
++--+---+-+-+
| dbs.db_id | dbs.db_location_uri | dbs.name | dbs.owner_name | dbs.owner_type |
++--+---+-+-+
++--+---+-+-+
{code}
==> Correct Result <==
{code:java}
set hive.limit.optimize.enable=false;
select * from dbs limit 1;
--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container SUCCEEDED  1  100
   0   0
--
VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 4.11 s
--
+------------+----------------------------------------------------+-----------+-----------------+-----------------+
| dbs.db_id  | dbs.db_location_uri                                | dbs.name  | dbs.owner_name  | dbs.owner_type  |
+------------+----------------------------------------------------+-----------+-----------------+-----------------+
| 1          | hdfs://abcd:8020/warehouse/tablespace/managed/hive | default   | public          | ROLE            |
+------------+----------------------------------------------------+-----------+-----------------+-----------------+
{code}





[jira] [Created] (HIVE-24255) StorageHandler with select-limit query is returning 0 rows

2020-10-09 Thread Naresh P R (Jira)
Naresh P R created HIVE-24255:
-

 Summary: StorageHandler with select-limit query is returning 0 rows
 Key: HIVE-24255
 URL: https://issues.apache.org/jira/browse/HIVE-24255
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R


 
{code:java}
CREATE EXTERNAL TABLE test_table(db_id bigint, db_location_uri string, name 
string, owner_name string, owner_type string)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES ('hive.sql.database.type'='METASTORE', 'hive.sql.query'='SELECT 
`DB_ID`, `DB_LOCATION_URI`, `NAME`, `OWNER_NAME`, `OWNER_TYPE` FROM `DBS`');
==> Wrong Result <==
set hive.limit.optimize.enable=true;
select * from test_table limit 1;
--
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--
Map 1 .. container SUCCEEDED 0 0 0 0 0 0
--
VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 0.91 s
--
++--+---+-+-+
| dbs.db_id | dbs.db_location_uri | dbs.name | dbs.owner_name | dbs.owner_type |
++--+---+-+-+
++--+---+-+-+
==> Correct Result <==
set hive.limit.optimize.enable=false;
select * from test_table limit 1;
--
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--
Map 1 .. container SUCCEEDED 1 1 0 0 0 0
--
VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 4.11 s
--
+------------+----------------------------------------------------+----------+----------------+----------------+
| dbs.db_id  | dbs.db_location_uri                                | dbs.name | dbs.owner_name | dbs.owner_type |
+------------+----------------------------------------------------+----------+----------------+----------------+
| 1          | hdfs://abcd:8020/warehouse/tablespace/managed/hive | default  | public         | ROLE           |
+------------+----------------------------------------------------+----------+----------------+----------------+
{code}





[jira] [Created] (HIVE-24254) Remove setOwner call in ReplChangeManager

2020-10-09 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-24254:
--

 Summary: Remove setOwner call in ReplChangeManager
 Key: HIVE-24254
 URL: https://issues.apache.org/jira/browse/HIVE-24254
 Project: Hive
  Issue Type: Task
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-24253) HMS needs to support keystore/truststores types besides JKS

2020-10-09 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24253:
---

 Summary: HMS needs to support keystore/truststores types besides 
JKS
 Key: HIVE-24253
 URL: https://issues.apache.org/jira/browse/HIVE-24253
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should support 
the default keystore type specified for the JDK and not always use JKS. As 
HIVE-23958 did for Hive, HMS should support setting additional 
keystore/truststore types used for different applications, e.g. for FIPS 
crypto algorithms.





[jira] [Created] (HIVE-24252) Improve decision model for using semijoin reducers

2020-10-09 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24252:
--

 Summary: Improve decision model for using semijoin reducers
 Key: HIVE-24252
 URL: https://issues.apache.org/jira/browse/HIVE-24252
 Project: Hive
  Issue Type: Improvement
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


After a few experiments with the TPC-DS 10TB dataset, we observed that in some 
cases semijoin reducers were not effective: they didn't reduce the number of 
records, or they reduced the relation only a tiny bit.

In some cases we can make the semijoin reducer more effective by adding more 
columns, but this also requires a bigger bloom filter, so the decision on how 
many columns to include in the bloom filter becomes more delicate.

The current decision model always chooses multi-column semijoin reducers if 
they are available, but this may not always be beneficial if a single column 
can significantly reduce the target relation.





[jira] [Created] (HIVE-24251) Improve bloom filter size estimation for multi column semijoin reducers

2020-10-09 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24251:
--

 Summary: Improve bloom filter size estimation for multi column 
semijoin reducers
 Key: HIVE-24251
 URL: https://issues.apache.org/jira/browse/HIVE-24251
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


There are various cases where the expected size of the bloom filter is largely 
underestimated, making the semijoin reducer completely ineffective. This is 
more relevant for multi-column semijoin reducers, since the current 
[code|https://github.com/apache/hive/blob/d61c9160ffa5afbd729887c3db690eccd7ef8238/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBloomFilter.java#L273]
 does not take them into account.
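For reference, the standard bloom filter sizing formula is m = -n * ln(p) / (ln 2)^2 bits, where n is the expected number of distinct entries and p the target false-positive rate. For a multi-column semijoin key, n should reflect the NDV of the column combination rather than a single column. The multi-column estimator below is a hedged assumption for illustration, not Hive's actual estimator:

```python
import math

def bloom_filter_bits(expected_entries, fpp=0.05):
    """Standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    return int(math.ceil(
        -expected_entries * math.log(fpp) / (math.log(2) ** 2)))

def multi_column_entries(ndvs, row_count):
    """Hypothetical estimate for a multi-column key: the product of the
    per-column NDVs, capped by the row count (a key combination cannot
    have more distinct values than there are rows)."""
    product = 1
    for ndv in ndvs:
        product *= ndv
    return min(product, row_count)

# A single-column estimate (ndv=1000) understates the bits needed for a
# two-column key over the same rows.
print(bloom_filter_bits(1000))
print(bloom_filter_bits(multi_column_entries([1000, 50], row_count=20000)))
```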





[jira] [Created] (HIVE-24250) CREATE DATABASE with MANAGEDLOCATION set requires super user priv on location

2020-10-09 Thread Viacheslav Avramenko (Jira)
Viacheslav Avramenko created HIVE-24250:
---

 Summary: CREATE DATABASE with MANAGEDLOCATION set requires super 
user priv on location
 Key: HIVE-24250
 URL: https://issues.apache.org/jira/browse/HIVE-24250
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Viacheslav Avramenko


At  
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1485]

The folder for the database is created as the superuser instead of the 
impersonated user.

This leads to issues when impersonation is on (doAs=true) and the superuser 
has no access to the location specified in MANAGEDLOCATION of CREATE DATABASE.

If I, as a user, specify a location in which to create my database, I should 
have write access there.

 





[jira] [Created] (HIVE-24249) Create View fails if a materialized view exists with the same query

2020-10-09 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24249:
-

 Summary: Create View fails if a materialized view exists with the 
same query
 Key: HIVE-24249
 URL: https://issues.apache.org/jira/browse/HIVE-24249
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code:java}
create table t1(col0 int) STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

create materialized view mv1 as
select * from t1 where col0 > 2;

create view mv1 as
select sub.* from (select * from t1 where col0 > 2) sub
where sub.col0 = 10;
{code}
The planner realizes that the view definition has a subquery which matches the 
materialized view query, and replaces it with the materialized view scan.
{code:java}
HiveProject($f0=[CAST(10):INTEGER])
  HiveFilter(condition=[=(10, $0)])
HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1])
{code}
Then an exception is thrown:
{code:java}
 org.apache.hadoop.hive.ql.parse.SemanticException: View definition references 
materialized view default.mv1
at 
org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
at 
org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunne

[jira] [Created] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-09 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-24248:
--

 Summary: TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
 Key: HIVE-24248
 URL: https://issues.apache.org/jira/browse/HIVE-24248
 Project: Hive
  Issue Type: Bug
Reporter: Zhihua Deng


[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
{code:java}
java.lang.AssertionError:
Client Execution succeeded but contained differences (error code = 1) after 
executing subquery_join_rewrite.q
241,244d240
< 1 1
< 1 2
< 2 1
< 2 2
245a242,243
> 2 2
{code}
 
 





[jira] [Created] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while check for access

2020-10-09 Thread Adesh Kumar Rao (Jira)
Adesh Kumar Rao created HIVE-24247:
--

 Summary: StorageBasedAuthorizationProvider does not look into 
Hadoop ACL while check for access
 Key: HIVE-24247
 URL: https://issues.apache.org/jira/browse/HIVE-24247
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Adesh Kumar Rao
Assignee: Adesh Kumar Rao
 Fix For: 4.0.0


StorageBasedAuthorizationProvider uses
{noformat}
FileSystem.access(Path, Action)
{noformat}
method to check the access.

This method gets the FileStatus object and checks access based on that. ACLs 
are not present in FileStatus.

 

Instead, Hive should use
{noformat}
FileSystem.get(path.toUri(), conf).access(path, action)
{noformat}
where the implemented file system can do the access checks.





[jira] [Created] (HIVE-24246) Fix for Ranger Deny policy overriding policy with same resource name

2020-10-09 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-24246:
--

 Summary: Fix for Ranger Deny policy overriding policy with same 
resource name 
 Key: HIVE-24246
 URL: https://issues.apache.org/jira/browse/HIVE-24246
 Project: Hive
  Issue Type: Task
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.

2020-10-08 Thread Chiran Ravani (Jira)
Chiran Ravani created HIVE-24245:


 Summary: Vectorized PTF with count and distinct over partition 
producing incorrect results.
 Key: HIVE-24245
 URL: https://issues.apache.org/jira/browse/HIVE-24245
 Project: Hive
  Issue Type: Bug
  Components: Hive, PTF-Windowing, Vectorization
Affects Versions: 3.1.2, 3.1.0
Reporter: Chiran Ravani


Vectorized PTF with count(distinct) over a partition is broken: it produces 
incorrect results.
Below is the test case.

{code}
CREATE TABLE bigd781b_new (
  id int,
  txt1 string,
  txt2 string,
  cda_date int,
  cda_job_name varchar(12));

INSERT INTO bigd781b_new VALUES 
  (1,'2010005759','7164335675012038',20200528,'load1'),
  (2,'2010005759','7164335675012038',20200528,'load2');
{code}

Running the query below produces incorrect results

{code}
SELECT
txt1,
txt2,
count(distinct txt1) over(partition by txt1) as n,
count(distinct txt2) over(partition by txt2) as m
FROM bigd781b_new
WHERE cda_date = 20200528 and ( txt2 = '7164335675012038');
{code}

as shown below.

{code}
+-+---+++
|txt1 |   txt2| n  | m  |
+-+---+++
| 2010005759  | 7164335675012038  | 2  | 2  |
| 2010005759  | 7164335675012038  | 2  | 2  |
+-+---+++
{code}

While the correct output would be

{code}
+-------------+-------------------+----+----+
|    txt1     |       txt2        | n  | m  |
+-------------+-------------------+----+----+
| 2010005759  | 7164335675012038  | 1  | 1  |
| 2010005759  | 7164335675012038  | 1  | 1  |
+-------------+-------------------+----+----+


The problem does not appear after setting the below property:
set hive.vectorized.execution.ptf.enabled=false;
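For reference, the expected values follow directly from window-function semantics: count(distinct c) over (partition by c) counts the distinct values of c within each partition, which is necessarily 1 when the partition key is the counted column itself. A stdlib-only sketch of those semantics (hypothetical names, plain Java rather than Hive code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctOverPartition {

    // count(distinct c) over (partition by c): group the rows by the column,
    // then count the distinct values of that same column inside each group.
    static Map<String, Integer> countDistinctPerPartition(List<String> column) {
        Map<String, Set<String>> partitions = new HashMap<>();
        for (String v : column) {
            partitions.computeIfAbsent(v, k -> new HashSet<>()).add(v);
        }
        Map<String, Integer> result = new HashMap<>();
        partitions.forEach((key, values) -> result.put(key, values.size()));
        return result;
    }

    public static void main(String[] args) {
        // the two rows from the reproducer carry identical txt1 values
        List<String> txt1 = Arrays.asList("2010005759", "2010005759");
        System.out.println(countDistinctPerPartition(txt1)); // {2010005759=1}
    }
}
```

Since every partition here contains only copies of its own key, any answer other than 1 is incorrect.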






[jira] [Created] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-24244:
---

 Summary: NPE during Atlas metadata replication
 Key: HIVE-24244
 URL: https://issues.apache.org/jira/browse/HIVE-24244
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Pravin Sinha
Assignee: Pravin Sinha








[jira] [Created] (HIVE-24243) Missing table alias in LEFT JOIN causing inconsistent results

2020-10-08 Thread Henrique dos Santos Goulart (Jira)
Henrique dos Santos Goulart created HIVE-24243:
--

 Summary: Missing table alias in LEFT JOIN causing inconsistent 
results
 Key: HIVE-24243
 URL: https://issues.apache.org/jira/browse/HIVE-24243
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.1
Reporter: Henrique dos Santos Goulart


Missing table alias in LEFT JOIN causing inconsistent results:
!alias.png|id=cp-img!


VS

!no_alias.png|id=cp-img!





[jira] [Created] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24242:
---

 Summary: Relax safety checks in SharedWorkOptimizer
 Key: HIVE-24242
 URL: https://issues.apache.org/jira/browse/HIVE-24242
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


There are some checks to lock out problematic cases.

For UnionOperator, see
[here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571].

This check could prevent the optimization even if the Union is visible from
only one of the TS ops.







[jira] [Created] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-08 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24241:
---

 Summary: Enable SharedWorkOptimizer to merge downstream operators 
after an optimization step
 Key: HIVE-24241
 URL: https://issues.apache.org/jira/browse/HIVE-24241
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich








[jira] [Created] (HIVE-24240) Implement missing features in UDTFStatsRule

2020-10-07 Thread okumin (Jira)
okumin created HIVE-24240:
-

 Summary: Implement missing features in UDTFStatsRule
 Key: HIVE-24240
 URL: https://issues.apache.org/jira/browse/HIVE-24240
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: okumin
Assignee: okumin


Add the following steps:
 * Handle the case in which the number of rows will be zero
 * Compute runtime stats in case of a re-execution





[jira] [Created] (HIVE-24239) Reduce side chain of Map join produce wrong result

2020-10-07 Thread Manoj Kumar (Jira)
Manoj Kumar created HIVE-24239:
--

 Summary: Reduce side chain of Map join produce wrong result 
 Key: HIVE-24239
 URL: https://issues.apache.org/jira/browse/HIVE-24239
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.0
Reporter: Manoj Kumar
 Attachments: 1602055956.svg

Getting the wrong result on multiple queries.

On further debugging, it seems the queries follow a common pattern on the
reduce-side chain of map joins.

[Observation]
If we have a chain of map joins on the reduce side, each map join gets its side
tables as bucketed.
Ideally, in a chain of map joins we should get the side table as broadcast
except for the first side table.

Attaching the plan of TPCH Q5 for reference.






[jira] [Created] (HIVE-24238) ClassCastException in order-by query over avro table with uniontype column

2020-10-06 Thread Gabriel C Balan (Jira)
Gabriel C Balan created HIVE-24238:
--

 Summary: ClassCastException in order-by query over avro table with 
uniontype column
 Key: HIVE-24238
 URL: https://issues.apache.org/jira/browse/HIVE-24238
 Project: Hive
  Issue Type: Bug
  Components: Avro
Affects Versions: 3.1.2, 3.1.0
Reporter: Gabriel C Balan


{noformat:title=Reproducer}
create table avro_reproducer (key int, union_col uniontype<int,string>) 
stored as avro location '/tmp/avro_reproducer';
INSERT INTO TABLE avro_reproducer values (0, create_union(0, 123, 'not me')),  
(1, create_union(1, -1, 'me, me, me!'));

--these queries are ok:
select count(*) from avro_reproducer;  
select * from avro_reproducer;  
--these queries are not ok
select * from avro_reproducer order by union_col; 
select * from avro_reproducer sort by key; 
select * from avro_reproducer order by 'does not have to be a column, really'; 
{noformat}

I have verified this reproducer on CDH703, HDP301.
It seems the issue is restricted to AVRO; this reproducer does not trigger 
failures against textfile tables, orc tables, and parquet tables.

{noformat:title=Error message in CLI}
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: java.lang.RuntimeException: Error processing row: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:155)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1315)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:970)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:142)
... 14 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable 
cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion
at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:619)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:351

[jira] [Created] (HIVE-24237) Multi level/dimensional bucketing in Hive

2020-10-06 Thread Pushpender Garg (Jira)
Pushpender Garg created HIVE-24237:
--

 Summary: Multi level/dimensional bucketing in Hive
 Key: HIVE-24237
 URL: https://issues.apache.org/jira/browse/HIVE-24237
 Project: Hive
  Issue Type: New Feature
  Components: Database/Schema
Affects Versions: 3.1.2, 3.1.1
Reporter: Pushpender Garg


Hive can considerably optimize the execution of certain queries, like filters, 
aggregations, and joins, if the bucketed columns are used in the query for 
these operations. Buckets can be created on multiple columns as well, in which 
case the hash function is computed after merging all bucket columns.

The problem is that if buckets are created on multiple columns but the query 
uses only a subset of those columns, then Hive does not optimize that query. 
Unless all bucket columns are used as predicates, bucketing will not be 
utilized. The solution proposed here is to make Hive able to optimize a query 
even if only a subset of the bucket columns is used.

Instead of storing data in single-dimensional buckets, it can be stored in 
multi-dimensional buckets when multiple columns are given. If a subset of the 
bucketed columns is used as predicates in a query, then based on the hash 
values of the individual columns, the appropriate buckets can be identified and 
only those buckets will be scanned. This enables the optimization even when a 
single column or only a few columns are used in querying.
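The pruning idea can be sketched with hypothetical helper names (an illustration of the proposal, not Hive's bucketing code): each bucket column gets its own hash dimension, so a predicate on any subset of the columns selects a slice of the bucket grid instead of forcing a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class MultiDimBuckets {

    // One hash dimension per bucket column: a bucket is addressed by
    // (h(col1) % d1, h(col2) % d2) instead of h(col1, col2) % n.
    static int bucketOf(Object value, int dimSize) {
        return Math.floorMod(Objects.hashCode(value), dimSize);
    }

    // Predicate on col1 only: scan the single matching row of the d1 x d2
    // bucket grid (d2 buckets) instead of all d1 * d2 buckets.
    static List<int[]> bucketsToScan(Object col1Value, int d1, int d2) {
        int i = bucketOf(col1Value, d1);
        List<int[]> out = new ArrayList<>();
        for (int j = 0; j < d2; j++) {
            out.add(new int[] {i, j});
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 x 4 grid: a single-column predicate prunes 16 buckets down to 4
        System.out.println(bucketsToScan("someKey", 4, 4).size()); // 4
    }
}
```

With the current single-dimensional scheme, the same predicate would have to scan all d1 * d2 buckets, since the combined hash cannot be reconstructed from one column alone.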





[jira] [Created] (HIVE-24236) Connection leak in TxnHandler

2020-10-06 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24236:
---

 Summary: Connection leak in TxnHandler
 Key: HIVE-24236
 URL: https://issues.apache.org/jira/browse/HIVE-24236
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


We see failures in QE tests with "cannot allocate connections" errors. The 
exception stack is like the following:
{noformat}
2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler 
(TxnHandler.java:checkRetryable(3733)) - Non-retryable error in 
heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, 
general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to 
select from transaction database org.apache.commons.dbcp.SQLNestedException: 
Cannot get a connection, general error
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1112)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 29 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2747)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
{noformat}

and
{noformat}
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1134)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 53 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:3375)
at 
org.apache.hadoop.hive.metastore.AcidEventListener.onDropTable(AcidEventListener.java:65)
at 
org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$19.notify(MetaStoreListenerNotifier.java:103)
at 
org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent

[jira] [Created] (HIVE-24235) Drop and recreate table during MR compaction leaves behind base/delta directory

2020-10-06 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-24235:


 Summary: Drop and recreate table during MR compaction leaves 
behind base/delta directory
 Key: HIVE-24235
 URL: https://issues.apache.org/jira/browse/HIVE-24235
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


If a table is dropped and recreated during MR compaction, the table directory 
and a base (or delta, if minor compaction) directory could be created, with or 
without data, while the table "does not exist".

E.g.
{code:java}
create table c (i int) stored as orc tblproperties 
("NO_AUTO_COMPACTION"="true", "transactional"="true");
insert into c values (9);
insert into c values (9);
alter table c compact 'major';

While compaction job is running: {
drop table c;
create table c (i int) stored as orc tblproperties 
("NO_AUTO_COMPACTION"="true", "transactional"="true");
}
{code}
The table directory should be empty, but it could look like this 
after the job is finished:
{code:java}
Oct  6 14:23 c/base_002_v101/._orc_acid_version.crc
Oct  6 14:23 c/base_002_v101/.bucket_0.crc
Oct  6 14:23 c/base_002_v101/_orc_acid_version
Oct  6 14:23 c/base_002_v101/bucket_0
{code}
or perhaps just: 
{code:java}
Oct  6 14:23 c/base_002_v101/._orc_acid_version.crc
Oct  6 14:23 c/base_002_v101/_orc_acid_version
{code}
Insert another row and you have:
{code:java}
Oct  6 14:33 base_002_v101/
Oct  6 14:33 base_002_v101/._orc_acid_version.crc
Oct  6 14:33 base_002_v101/.bucket_0.crc
Oct  6 14:33 base_002_v101/_orc_acid_version
Oct  6 14:33 base_002_v101/bucket_0
Oct  6 14:35 delta_001_001_/._orc_acid_version.crc
Oct  6 14:35 delta_001_001_/.bucket_0_0.crc
Oct  6 14:35 delta_001_001_/_orc_acid_version
Oct  6 14:35 delta_001_001_/bucket_0_0
{code}
Selecting from the table will result in this error because the highest valid 
writeId for this table is 1:
{code:java}
thrift.ThriftCLIService: Error fetching results: 
org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
        at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
...
Caused by: java.io.IOException: java.lang.RuntimeException: ORC split 
generation failed with exception: java.io.IOException: Not enough history 
available for (1,x).  Oldest available base: 
.../warehouse/b/base_004_v092
{code}
Solution: Resolve the table again after compaction is finished; compare the id 
with the table id from when compaction began. If the ids do not match, abort 
the compaction's transaction.
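The proposed check can be sketched as follows (hypothetical names; the real logic would live in the compaction code path and abort the compaction's transaction on mismatch):

```java
public class CompactionGuard {

    // Compare the table id captured when compaction was scheduled with the id
    // resolved after the job finishes. A mismatch means the table was dropped
    // and recreated mid-compaction, so the produced base/delta directory
    // belongs to a dead table and the compaction should not be committed.
    static boolean shouldCommitCompaction(long tableIdAtStart, long tableIdAfterJob) {
        return tableIdAtStart == tableIdAfterJob;
    }

    public static void main(String[] args) {
        System.out.println(shouldCommitCompaction(42L, 42L)); // true
        System.out.println(shouldCommitCompaction(42L, 57L)); // false
    }
}
```

Comparing ids rather than names is what makes this work: the recreated table keeps the name but gets a fresh id.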





[jira] [Created] (HIVE-24234) Improve checkHashModeEfficiency in VectorGroupByOperator

2020-10-06 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24234:
---

 Summary: Improve checkHashModeEfficiency in VectorGroupByOperator
 Key: HIVE-24234
 URL: https://issues.apache.org/jira/browse/HIVE-24234
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Currently, {{VectorGroupByOperator::checkHashModeEfficiency}} compares the 
number of hash table entries with the number of input records that have been 
processed. For grouping sets, it accounts for the grouping set length as well.

The issue is that the condition becomes invalid after processing a large number 
of input records. This prevents the system from switching over to streaming mode.

E.g. assume 500,000 input records processed, with 9 grouping sets and 100,000 
entries in the hash table. The hash table would never cross 4,500,000 entries, 
as the max size itself is 1M by default.

It would be good to compare the input records (adjusted for grouping sets) with 
the number of output records (along with the size of the hash table) to 
determine hashing or streaming mode.

E.g. Q67.
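A rough sketch of the proposed comparison (hypothetical names and threshold, not the actual Hive code): hashing stays on while the aggregation is actually reducing cardinality.

```java
public class HashModeCheck {

    // Keep hash mode while the number of output records is a small fraction of
    // the grouping-set adjusted input, i.e. the aggregation is actually
    // reducing cardinality; otherwise switch to streaming mode.
    static boolean keepHashMode(long inputRows, int groupingSetCount,
                                long outputRows, float maxOutputRatio) {
        long adjustedInput = inputRows * groupingSetCount;
        return adjustedInput > 0
                && (float) outputRows / adjustedInput <= maxOutputRatio;
    }

    public static void main(String[] args) {
        // 500,000 rows * 9 grouping sets with 100,000 outputs: ratio ~ 0.022,
        // so hashing is effective
        System.out.println(keepHashMode(500_000L, 9, 100_000L, 0.5f)); // true
        // nearly one output per adjusted-input row: fall back to streaming
        System.out.println(keepHashMode(500_000L, 9, 4_400_000L, 0.5f)); // false
    }
}
```

Unlike comparing hash table entries with the raw input count, this ratio remains meaningful however many records have been processed, because the capped hash table size no longer dominates the check.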







[jira] [Created] (HIVE-24233) except subquery throws nullpointer with cbo disabled

2020-10-06 Thread Peter Varga (Jira)
Peter Varga created HIVE-24233:
--

 Summary: except subquery throws nullpointer with cbo disabled
 Key: HIVE-24233
 URL: https://issues.apache.org/jira/browse/HIVE-24233
 Project: Hive
  Issue Type: Bug
Reporter: Peter Varga
Assignee: Peter Varga


Except and intersect were only implemented with Calcite in HIVE-12764. If CBO 
is disabled, they just throw a NullPointerException. We should at least throw 
a SemanticException stating this is not supported.

Repro:
set hive.cbo.enable=false;
create table test(id int);
insert into table test values(1);
select id from test except select id from test;
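A guard along these lines would turn the NPE into a clear error. The sketch below is hypothetical (it uses a generic UnsupportedOperationException; the actual fix would throw SemanticException from the semantic analyzer):

```java
public class SetOpGuard {

    // Fail fast when EXCEPT/INTERSECT is used while CBO is off, instead of
    // hitting a NullPointerException later during plan generation.
    static void checkSetOperationSupported(boolean cboEnabled, String operator) {
        boolean needsCbo = "EXCEPT".equalsIgnoreCase(operator)
                || "INTERSECT".equalsIgnoreCase(operator);
        if (needsCbo && !cboEnabled) {
            throw new UnsupportedOperationException(
                    operator + " is only supported with hive.cbo.enable=true");
        }
    }

    public static void main(String[] args) {
        checkSetOperationSupported(true, "EXCEPT"); // fine with CBO enabled
        try {
            checkSetOperationSupported(false, "EXCEPT");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```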





[jira] [Created] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-24232:
--

 Summary: Incorrect translation of rollup expression from Calcite
 Key: HIVE-24232
 URL: https://issues.apache.org/jira/browse/HIVE-24232
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


In Calcite, it is not necessary that the columns in the group set are in the 
same order as the rollup. For instance, this is the Calcite representation of a 
rollup for a given query:
{code}
HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
agg#4=[sum($15)], agg#5=[count($15)])
{code}
When we generate the Hive plan from the Calcite operator, we incorrectly make 
that assumption.





[jira] [Created] (HIVE-24231) Enhance shared work optimizer to merge scans with semijoin filters on both sides

2020-10-05 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24231:
---

 Summary: Enhance shared work optimizer to merge scans with 
semijoin filters on both sides
 Key: HIVE-24231
 URL: https://issues.apache.org/jira/browse/HIVE-24231
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich








[jira] [Created] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24230:


 Summary: Integrate HPL/SQL into HiveServer2
 Key: HIVE-24230
 URL: https://issues.apache.org/jira/browse/HIVE-24230
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar


HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from the Hive Metastore (since HIVE-24217). Currently 
HPL/SQL depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example, one 
might want to use a third-party SQL tool to run selects on stored procedure (or 
rather function, in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer’s internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.





[jira] [Created] (HIVE-24229) DirectSql fails in case of OracleDB

2020-10-05 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-24229:
---

 Summary: DirectSql fails in case of OracleDB
 Key: HIVE-24229
 URL: https://issues.apache.org/jira/browse/HIVE-24229
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Direct SQL fails due to different data type mapping in case of Oracle DB.






[jira] [Created] (HIVE-24228) Support complex types in LLAP

2020-10-05 Thread Yuriy Baltovskyy (Jira)
Yuriy Baltovskyy created HIVE-24228:
---

 Summary: Support complex types in LLAP
 Key: HIVE-24228
 URL: https://issues.apache.org/jira/browse/HIVE-24228
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Yuriy Baltovskyy
Assignee: Yuriy Baltovskyy


The idea of this improvement is to support complex types (arrays, maps, 
structs) returned from the LLAP data reader. This is useful when consuming LLAP 
data later in Spark.





[jira] [Created] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-04 Thread Arko Sharma (Jira)
Arko Sharma created HIVE-24227:
--

 Summary: sys.replication_metrics table shows incorrect status for 
failed policies
 Key: HIVE-24227
 URL: https://issues.apache.org/jira/browse/HIVE-24227
 Project: Hive
  Issue Type: Bug
Reporter: Arko Sharma
Assignee: Arko Sharma








[jira] [Created] (HIVE-24226) Avoid Copy of Bytes in Protobuf BinaryWriter

2020-10-02 Thread David Mollitor (Jira)
David Mollitor created HIVE-24226:
-

 Summary: Avoid Copy of Bytes in Protobuf BinaryWriter
 Key: HIVE-24226
 URL: https://issues.apache.org/jira/browse/HIVE-24226
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


{code:java|title=ProtoWriteSupport.java}
  class BinaryWriter extends FieldWriter {
@Override
final void writeRawValue(Object value) {
  ByteString byteString = (ByteString) value;
  Binary binary = Binary.fromConstantByteArray(byteString.toByteArray());
  recordConsumer.addBinary(binary);
}
  }
{code}

{{toByteArray()}} creates a copy of the buffer.  There is already support in 
Parquet and Protobuf for passing a ByteBuffer instead, which avoids the copy.
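The actual change would presumably use something like {{ByteString#asReadOnlyByteBuffer()}} together with {{Binary.fromConstantByteBuffer(...)}} (names to be confirmed against the Protobuf/Parquet APIs). A stdlib-only illustration of the copy-vs-wrap distinction:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyVsWrap {
    public static void main(String[] args) {
        byte[] backing = "payload".getBytes();

        // toByteArray()-style: allocates a fresh array and copies every byte
        byte[] copy = Arrays.copyOf(backing, backing.length);

        // ByteBuffer-style: wraps the existing array, no copy of the payload
        ByteBuffer wrapped = ByteBuffer.wrap(backing).asReadOnlyBuffer();

        backing[0] = (byte) 'X';
        System.out.println((char) copy[0]);        // p  (the copy is detached)
        System.out.println((char) wrapped.get(0)); // X  (the wrapper shares storage)
    }
}
```

For a constant value like the one in {{writeRawValue}}, the shared-storage behavior is harmless and saves one allocation plus one memcpy per binary field written.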





[jira] [Created] (HIVE-24225) FIX S3A recordReader policy selection

2020-10-02 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-24225:
-

 Summary: FIX S3A recordReader policy selection
 Key: HIVE-24225
 URL: https://issues.apache.org/jira/browse/HIVE-24225
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Dynamic S3A recordReader policy selection can cause issues on lazily 
initialized FS objects.




