[jira] [Commented] (HIVE-16005) miscellaneous small fixes to help with llap debuggability

2017-02-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877707#comment-15877707
 ] 

Prasanth Jayachandran commented on HIVE-16005:
--

For constructUniqueQueryId, can we use the same format as the file names 
generated by the query-routing logger (queryId-dagId)? That way it is easier to 
locate the corresponding log file.

As for appending a suffix to the thread name, is that primarily to get some 
context from jstack output? Stack traces that get logged will already carry 
this info via NDC.
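
(For illustration, a hypothetical sketch of the suggested naming; the method 
name constructUniqueQueryId comes from the patch, but its parameters here are 
assumptions:)

{code:java}
// Hypothetical sketch: build the identifier in the same queryId-dagId form
// that the query-routing logger uses for its log file names.
public static String constructUniqueQueryId(String queryId, int dagId) {
  return queryId + "-" + dagId;  // e.g. "hive_20170221..._1234-1"
}
{code}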



> miscellaneous small fixes to help with llap debuggability
> -
>
> Key: HIVE-16005
> URL: https://issues.apache.org/jira/browse/HIVE-16005
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-16005.01.patch
>
>
> - Include proc_ in cli, beeline, metastore, hs2 process args
> - LLAP history logger - log QueryId instead of dagName (dag name is 
> free-flowing text)
> - LLAP JMX ExecutorStatus - Log QueryId instead of dagName. Sort by running / 
> queued
> - Include thread name in TaskRunnerCallable so that it shows up in stack 
> traces (will cause extra output in logs)





[jira] [Updated] (HIVE-1626) stop using java.util.Stack

2017-02-21 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-1626:
-
Attachment: HIVE-1626.2.patch

Re-uploading the patch.

> stop using java.util.Stack
> --
>
> Key: HIVE-1626
> URL: https://issues.apache.org/jira/browse/HIVE-1626
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Teddy Choi
> Attachments: HIVE-1626.2.patch, HIVE-1626.2.patch
>
>
> We currently use Stack as part of the generic node walking library.  Stack 
> should not be used for this since its inheritance from Vector incurs 
> superfluous synchronization overhead.
> Most projects end up adding an ArrayStack implementation and using that 
> instead.
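
(For reference, a minimal sketch of such an unsynchronized stack; this is 
illustrative only, not the attached patch, and java.util.ArrayDeque is the 
stock JDK alternative.)

{code:java}
import java.util.ArrayList;
import java.util.EmptyStackException;

// Minimal unsynchronized LIFO stack backed by an ArrayList, avoiding the
// per-call synchronization that Stack inherits from Vector.
public class ArrayStack<E> {
  private final ArrayList<E> elements = new ArrayList<>();

  public void push(E e) { elements.add(e); }

  public E pop() {
    if (elements.isEmpty()) { throw new EmptyStackException(); }
    return elements.remove(elements.size() - 1);
  }

  public E peek() {
    if (elements.isEmpty()) { throw new EmptyStackException(); }
    return elements.get(elements.size() - 1);
  }

  public boolean isEmpty() { return elements.isEmpty(); }
}
{code}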





[jira] [Commented] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877678#comment-15877678
 ] 

Hive QA commented on HIVE-16002:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853851/HIVE-16002.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=151)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=122)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853851 - PreCommit-HIVE-Build

> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16002.1.patch
>
>
> Reproducer
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}





[jira] [Assigned] (HIVE-16006) Incremental REPL LOAD doesn't operate on the target database if name differs from source database.

2017-02-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-16006:
---


> Incremental REPL LOAD doesn't operate on the target database if name differs 
> from source database.
> --
>
> Key: HIVE-16006
> URL: https://issues.apache.org/jira/browse/HIVE-16006
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>
> During "Incremental Load", it is not considering the database name input in 
> the command line. Hence load doesn't happen. At the same time, database with 
> original name is getting modified.
> Steps:
> 1. REPL DUMP default FROM 52;
> 2. REPL LOAD replDb FROM '/tmp/dump/1487588522621';
> – This step modifies the default Db instead of replDb.





[jira] [Commented] (HIVE-1626) stop using java.util.Stack

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877679#comment-15877679
 ] 

Hive QA commented on HIVE-1626:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853867/HIVE-1626.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3686/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3686/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3686/

Messages:
{noformat}
 This message was trimmed, see log for full details 
Apply anyway? [n] 
Skipping patch.
3 out of 3 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
12 out of 12 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
3 out of 3 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinHintOptimizer.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinHintOptimizer.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
6 out of 6 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java.rej
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinResolver.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinResolver.java.rej
patching file 

[jira] [Updated] (HIVE-15993) Hive REPL STATUS is not returning last event ID

2017-02-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15993:

Affects Version/s: (was: 2.1.0)

> Hive REPL STATUS is not returning last event ID
> ---
>
> Key: HIVE-15993
> URL: https://issues.apache.org/jira/browse/HIVE-15993
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Attachments: HIVE-15993.01.patch
>
>
> While running "REPL STATUS" on target to get last event ID for DB, it returns 
> zero rows.
> 0: jdbc:hive2://localhost:10001/repl> REPL status repl;
> No rows affected (932.167 seconds)





[jira] [Commented] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877676#comment-15877676
 ] 

Lefty Leverenz commented on HIVE-15570:
---

Doc note:  The new description and behavior of 
*hive.llap.client.consistent.splits* need to be documented in the wiki for 
release 2.2.0:

* [Configuration Properties -- hive.llap.client.consistent.splits | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.llap.client.consistent.splits]

Added a TODOC2.2 label.

(By the way, the parameter description should have included newlines (\n) as 
shown for *hive.llap.validate.acls* right after it, to avoid overlong lines in 
the generated template file hive-default.xml.template.)
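
(For illustration, a hypothetical fragment of a ConfVars entry in 
HiveConf.java; the default value and description wording here are assumptions:)

{code:java}
// Fragment of the HiveConf.ConfVars enum (hypothetical wording). The "\n"
// breaks keep each description line in the generated
// hive-default.xml.template short.
LLAP_CLIENT_CONSISTENT_SPLITS("hive.llap.client.consistent.splits", false,
    "Whether to setup split locations to match nodes on which LLAP daemons are running,\n" +
    "instead of using the locations provided by the split itself."),
{code}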

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch
>
>
> Sometimes a user might prefer to run in "hive.execution.mode=container" mode 
> when LLAP is stopped. If the Hive config for LLAP has 
> "hive.llap.client.consistent.splits=true" on the client side, it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}





[jira] [Updated] (HIVE-15993) Hive REPL STATUS is not returning last event ID

2017-02-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15993:

Component/s: (was: Parser)

> Hive REPL STATUS is not returning last event ID
> ---
>
> Key: HIVE-15993
> URL: https://issues.apache.org/jira/browse/HIVE-15993
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Attachments: HIVE-15993.01.patch
>
>
> While running "REPL STATUS" on target to get last event ID for DB, it returns 
> zero rows.
> 0: jdbc:hive2://localhost:10001/repl> REPL status repl;
> No rows affected (932.167 seconds)





[jira] [Updated] (HIVE-15993) Hive REPL STATUS is not returning last event ID

2017-02-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15993:

Component/s: repl
 Parser

> Hive REPL STATUS is not returning last event ID
> ---
>
> Key: HIVE-15993
> URL: https://issues.apache.org/jira/browse/HIVE-15993
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Attachments: HIVE-15993.01.patch
>
>
> While running "REPL STATUS" on target to get last event ID for DB, it returns 
> zero rows.
> 0: jdbc:hive2://localhost:10001/repl> REPL status repl;
> No rows affected (932.167 seconds)





[jira] [Updated] (HIVE-15993) Hive REPL STATUS is not returning last event ID

2017-02-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15993:

Affects Version/s: 2.1.0

> Hive REPL STATUS is not returning last event ID
> ---
>
> Key: HIVE-15993
> URL: https://issues.apache.org/jira/browse/HIVE-15993
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Attachments: HIVE-15993.01.patch
>
>
> While running "REPL STATUS" on target to get last event ID for DB, it returns 
> zero rows.
> 0: jdbc:hive2://localhost:10001/repl> REPL status repl;
> No rows affected (932.167 seconds)





[jira] [Commented] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877652#comment-15877652
 ] 

Ferdinand Xu commented on HIVE-16004:
-

LGTM, [~xuefuz], do you have any further comments?

> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-16004.001.patch, HIVE-16004.002.patch
>
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Updated] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15570:
--
Labels: TODOC2.2  (was: )

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch
>
>
> Sometimes a user might prefer to run in "hive.execution.mode=container" mode 
> when LLAP is stopped. If the Hive config for LLAP has 
> "hive.llap.client.consistent.splits=true" on the client side, it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}





[jira] [Commented] (HIVE-15989) Incorrect rounding in decimal data types

2017-02-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877629#comment-15877629
 ] 

Prasanth Jayachandran commented on HIVE-15989:
--

[~sershe], the orcfiledump -d option will dump the records.
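
(For reference, the invocation would be along the lines of 
{{hive --orcfiledump -d <path_to_orc_file>}}, with the path a placeholder.)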

> Incorrect rounding in decimal data types
> 
>
> Key: HIVE-15989
> URL: https://issues.apache.org/jira/browse/HIVE-15989
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Reporter: Nikesh
>Priority: Critical
> Attachments: ANA_AUTO_E.csv
>
>
> I have a numeric field in a file in my data lake and created a Hive external 
> table pointing to this field. The field value is 
> 0. but when I fetch this record 
> using the query it displays only 0.. I tried using DECIMAL and 
> DOUBLE data types but nothing worked. Is this a bug, or am I not using the 
> right data type for this?
> Thanks,
> Nikesh





[jira] [Updated] (HIVE-16005) miscellaneous small fixes to help with llap debuggability

2017-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-16005:
--
Status: Patch Available  (was: Open)

> miscellaneous small fixes to help with llap debuggability
> -
>
> Key: HIVE-16005
> URL: https://issues.apache.org/jira/browse/HIVE-16005
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-16005.01.patch
>
>
> - Include proc_ in cli, beeline, metastore, hs2 process args
> - LLAP history logger - log QueryId instead of dagName (dag name is 
> free-flowing text)
> - LLAP JMX ExecutorStatus - Log QueryId instead of dagName. Sort by running / 
> queued
> - Include thread name in TaskRunnerCallable so that it shows up in stack 
> traces (will cause extra output in logs)





[jira] [Assigned] (HIVE-16005) miscellaneous small fixes to help with llap debuggability

2017-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-16005:
-

Assignee: Siddharth Seth

> miscellaneous small fixes to help with llap debuggability
> -
>
> Key: HIVE-16005
> URL: https://issues.apache.org/jira/browse/HIVE-16005
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-16005.01.patch
>
>
> - Include proc_ in cli, beeline, metastore, hs2 process args
> - LLAP history logger - log QueryId instead of dagName (dag name is 
> free-flowing text)
> - LLAP JMX ExecutorStatus - Log QueryId instead of dagName. Sort by running / 
> queued
> - Include thread name in TaskRunnerCallable so that it shows up in stack 
> traces (will cause extra output in logs)





[jira] [Updated] (HIVE-16005) miscellaneous small fixes to help with llap debuggability

2017-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-16005:
--
Summary: miscellaneous small fixes to help with llap debuggability  (was: 
miscellaneous small fixes to help with debuggability)

> miscellaneous small fixes to help with llap debuggability
> -
>
> Key: HIVE-16005
> URL: https://issues.apache.org/jira/browse/HIVE-16005
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
> Attachments: HIVE-16005.01.patch
>
>
> - Include proc_ in cli, beeline, metastore, hs2 process args
> - LLAP history logger - log QueryId instead of dagName (dag name is 
> free-flowing text)
> - LLAP JMX ExecutorStatus - Log QueryId instead of dagName. Sort by running / 
> queued
> - Include thread name in TaskRunnerCallable so that it shows up in stack 
> traces (will cause extra output in logs)





[jira] [Updated] (HIVE-16005) miscellaneous small fixes to help with debuggability

2017-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-16005:
--
Attachment: HIVE-16005.01.patch

cc [~prasanth_j], [~sershe] for review.

> miscellaneous small fixes to help with debuggability
> 
>
> Key: HIVE-16005
> URL: https://issues.apache.org/jira/browse/HIVE-16005
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
> Attachments: HIVE-16005.01.patch
>
>
> - Include proc_ in cli, beeline, metastore, hs2 process args
> - LLAP history logger - log QueryId instead of dagName (dag name is 
> free-flowing text)
> - LLAP JMX ExecutorStatus - Log QueryId instead of dagName. Sort by running / 
> queued
> - Include thread name in TaskRunnerCallable so that it shows up in stack 
> traces (will cause extra output in logs)





[jira] [Commented] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877617#comment-15877617
 ] 

Wei Zheng commented on HIVE-15999:
--

I cannot find a good answer. As we discussed, this might be a Derby bug.

One thing I noticed that is different for TestDbTxnManager2 compared to the 
other tests is that all the other tests call "TxnDbUtil.setConfValues(conf);" 
before calling "TxnDbUtil.prepDb();", but TestDbTxnManager2 only calls 
"TxnDbUtil.setConfValues(conf);" once, in a @BeforeClass method. By changing 
the @BeforeClass method into a constructor, it is guaranteed to run for every 
unit test, which is consistent with all the other tests.

I ran the ptest several times and didn't see such failures anymore with the fix.
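
For illustration, a minimal sketch of that change (the surrounding test class 
shape is assumed):

{code:java}
// Before (assumed shape): conf values were set once for the whole class.
// @BeforeClass
// public static void setUpDB() throws Exception {
//   TxnDbUtil.setConfValues(conf);
// }

// After: a constructor runs for every test, so the conf is re-initialized
// before each test's setUp() calls TxnDbUtil.prepDb().
public TestDbTxnManager2() throws Exception {
  TxnDbUtil.setConfValues(conf);
}
{code}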

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to the HiveConf used in the test being polluted by another 
> test; e.g. in testDummyTxnManagerOnAcidTable(), the conf entry HIVE_TXN_MANAGER 
> is set to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched 
> back.





[jira] [Updated] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16004:

Attachment: HIVE-16004.002.patch

[~Ferd], thanks for your review; the patch is updated.

> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-16004.001.patch, HIVE-16004.002.patch
>
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Updated] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15570:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch
>
>
> Sometimes a user might prefer to run in "hive.execution.mode=container" mode 
> when LLAP is stopped. If the Hive config for LLAP has 
> "hive.llap.client.consistent.splits=true" on the client side, it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}





[jira] [Commented] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877601#comment-15877601
 ] 

Siddharth Seth commented on HIVE-15570:
---

Test failures are unrelated. Committing. Thanks [~aplusplus]

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch
>
>
> Sometimes a user might prefer to run in "hive.execution.mode=container" mode 
> when LLAP is stopped. If the Hive config for LLAP has 
> "hive.llap.client.consistent.splits=true" on the client side, it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}





[jira] [Commented] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877594#comment-15877594
 ] 

Ferdinand Xu commented on HIVE-16004:
-

Thanks [~colin_mjj] for the patch. Can we reset this buffer instead of 
allocating a new one?
{noformat}
buffer = new DataOutputBuffer();
{noformat}
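
(For illustration, the kind of reuse being suggested; DataOutputBuffer is 
org.apache.hadoop.io.DataOutputBuffer, and the surrounding field is assumed:)

{code:java}
import org.apache.hadoop.io.DataOutputBuffer;

public class BufferReuseSketch {
  private final DataOutputBuffer buffer = new DataOutputBuffer();

  void beforeNextBatch() {
    // Reuse the existing buffer instead of allocating a new one each time:
    // reset() sets the length back to 0 and keeps the backing byte[].
    buffer.reset();
  }
}
{code}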

> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-16004.001.patch
>
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Updated] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16004:

Status: Patch Available  (was: Open)

Initial patch updated.

> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-16004.001.patch
>
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Commented] (HIVE-1555) JDBC Storage Handler

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877579#comment-15877579
 ] 

Hive QA commented on HIVE-1555:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853848/HIVE-1555.6.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10278 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[jdbc_handler] 
(batchId=52)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[1]
 (batchId=173)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[2]
 (batchId=173)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3684/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3684/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3684/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853848 - PreCommit-HIVE-Build

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.3.patch, HIVE-1555.4.patch, HIVE-1555.5.patch, 
> HIVE-1555.6.patch, JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.





[jira] [Updated] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16004:

Attachment: HIVE-16004.001.patch

> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-16004.001.patch
>
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Assigned] (HIVE-16004) OutOfMemory in SparkReduceRecordHandler with vectorization mode

2017-02-21 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma reassigned HIVE-16004:
---


> OutOfMemory in SparkReduceRecordHandler with vectorization mode
> ---
>
> Key: HIVE-16004
> URL: https://issues.apache.org/jira/browse/HIVE-16004
> Project: Hive
>  Issue Type: Bug
>Reporter: Colin Ma
>Assignee: Colin Ma
>
> For query 28 of TPCx-BB with 1 TB of data, the executor memory is set to 30 GB. 
> The following exception occurs:
> java.lang.OutOfMemoryError
>   at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:467)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:238)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:367)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:286)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 
> I think the DataOutputBuffer not being cleared in time causes this problem.





[jira] [Commented] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-21 Thread KaiXu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877560#comment-15877560
 ] 

KaiXu commented on HIVE-15859:
--

Thanks all for the efforts, I will try the patch.

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>Assignee: Rui Li
> Attachments: HIVE-15859.1.patch, HIVE-15859.2.patch
>
>
> Hive on Spark failed with the following error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> The application log shows the driver commanded a shutdown for some unknown 
> reason, but Hive's log shows the driver could not get the RPC header (Expected 
> RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in 
> stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO 

[jira] [Updated] (HIVE-16003) Blobstores should use fs.listFiles(path, recursive=true) rather than FileUtils.listStatusRecursively

2017-02-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16003:

Description: 
{{FileUtils.listStatusRecursively}} can be slow on blobstores because 
{{listStatus}} calls are applied recursively to a given directory. This can be 
especially bad on tables with multiple levels of partitioning.

The {{FileSystem}} API provides an optimized API called {{listFiles(path, 
recursive)}} that can be used to invoke an optimized recursive directory 
listing.

The problem is that the {{listFiles(path, recursive)}} API doesn't provide an 
option to pass in a {{PathFilter}}, while {{FileUtils.listStatusRecursively}} 
uses a custom HIDDEN_FILES_PATH_FILTER.

To fix this, we could do one of the following:

1: Modify the FileSystem API to provide a {{listFiles(path, recursive, 
PathFilter)}} method (probably the cleanest solution)
2: Add conditional logic so that blobstores invoke {{listFiles(path, 
recursive)}} and the rest of the code uses the current implementation of 
{{FileUtils.listStatusRecursively}}
3: Replace the implementation of {{FileUtils.listStatusRecursively}} with 
{{listFiles(path, recursive)}} and apply the {{PathFilter}} on the results (not 
sure what optimizations can be made if {{PathFilter}} objects are passed into 
{{FileSystem}} methods - maybe {{PathFilter}} objects are pushed to the 
NameNode?)

  was:
{{FileUtils.listStatusRecursively}} can be slow on blobstores because 
{{listStatus}} calls are applied recursively to a given directory. This can be 
especially bad on tables with multiple levels of partitioning.

The {{FileSystem}} API provides an optimized API called {{listFiles(path, 
recursive)}} that can be used to invoke an optimized recursive directory 
listing.

The problem is that the {{listFiles(path, recursive)}} API doesn't provide an 
option to pass in a {{PathFilter}}, while {{FileUtils.listStatusRecursively}} 
uses a custom HIDDEN_FILES_PATH_FILTER.

To fix this, we could do one of the following:

1: Modify the FileSystem API to provide a {{listFiles(path, recursive, 
PathFilter)}} method
2: Add conditional logic so that blobstores invoke {{listFiles(path, 
recursive)}} and the rest of the code uses the current implementation of 
{{FileUtils.listStatusRecursively}}
3: Replace the implementation of {{FileUtils.listStatusRecursively}} with 
{{listFiles(path, recursive)}} and apply the {{PathFilter}} on the results


> Blobstores should use fs.listFiles(path, recursive=true) rather than 
> FileUtils.listStatusRecursively
> 
>
> Key: HIVE-16003
> URL: https://issues.apache.org/jira/browse/HIVE-16003
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> {{FileUtils.listStatusRecursively}} can be slow on blobstores because 
> {{listStatus}} calls are applied recursively to a given directory. This can 
> be especially bad on tables with multiple levels of partitioning.
> The {{FileSystem}} API provides an optimized API called {{listFiles(path, 
> recursive)}} that can be used to invoke an optimized recursive directory 
> listing.
> The problem is that the {{listFiles(path, recursive)}} API doesn't provide an 
> option to pass in a {{PathFilter}}, while {{FileUtils.listStatusRecursively}} 
> uses a custom HIDDEN_FILES_PATH_FILTER.
> To fix this, we could do one of the following:
> 1: Modify the FileSystem API to provide a {{listFiles(path, recursive, 
> PathFilter)}} method (probably the cleanest solution)
> 2: Add conditional logic so that blobstores invoke {{listFiles(path, 
> recursive)}} and the rest of the code uses the current implementation of 
> {{FileUtils.listStatusRecursively}}
> 3: Replace the implementation of {{FileUtils.listStatusRecursively}} with 
> {{listFiles(path, recursive)}} and apply the {{PathFilter}} on the results 
> (not sure what optimizations can be made if {{PathFilter}} objects are passed 
> into {{FileSystem}} methods - maybe {{PathFilter}} objects are pushed to the 
> NameNode?)
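
As a concrete illustration of option 3, here is a minimal sketch (the helper 
name listFilesFiltered is hypothetical; it assumes the filter is applied 
client-side to the flat listing):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

public class ListingSketch {
  // One recursive listFiles() call (a single optimized listing on
  // blobstores), with the PathFilter applied to the returned entries.
  public static List<LocatedFileStatus> listFilesFiltered(
      FileSystem fs, Path root, PathFilter filter) throws IOException {
    List<LocatedFileStatus> result = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      if (filter.accept(status.getPath())) {
        result.add(status);
      }
    }
    return result;
  }
}
{code}

Note that applying the filter to the leaf path alone won't skip files living 
under hidden directories; a faithful port would need to check each path 
component as well.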



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16003) Blobstores should use fs.listFiles(path, recursive=true) rather than FileUtils.listStatusRecursively

2017-02-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16003:
---


> Blobstores should use fs.listFiles(path, recursive=true) rather than 
> FileUtils.listStatusRecursively
> 
>
> Key: HIVE-16003
> URL: https://issues.apache.org/jira/browse/HIVE-16003
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> {{FileUtils.listStatusRecursively}} can be slow on blobstores because 
> {{listStatus}} calls are applied recursively to a given directory. This can 
> be especially bad on tables with multiple levels of partitioning.
> The {{FileSystem}} API provides an optimized API called {{listFiles(path, 
> recursive)}} that can be used to invoke an optimized recursive directory 
> listing.
> The problem is that the {{listFiles(path, recursive)}} API doesn't provide an 
> option to pass in a {{PathFilter}}, while {{FileUtils.listStatusRecursively}} 
> uses a custom HIDDEN_FILES_PATH_FILTER.
> To fix this, we could do one of the following:
> 1: Modify the FileSystem API to provide a {{listFiles(path, recursive, 
> PathFilter)}} method
> 2: Add conditional logic so that blobstores invoke {{listFiles(path, 
> recursive)}} and the rest of the code uses the current implementation of 
> {{FileUtils.listStatusRecursively}}
> 3: Replace the implementation of {{FileUtils.listStatusRecursively}} with 
> {{listFiles(path, recursive)}} and apply the {{PathFilter}} on the results



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15954) LLAP: some Tez INFO logs are too noisy

2017-02-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877531#comment-15877531
 ] 

Siddharth Seth commented on HIVE-15954:
---

Did this disable all logging from the classes mentioned?
That's a little too much. The annoying lines are under a separate logger, and 
just those can be disabled.

> LLAP: some Tez INFO logs are too noisy
> --
>
> Key: HIVE-15954
> URL: https://issues.apache.org/jira/browse/HIVE-15954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15954.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15880) Allow insert overwrite query to use auto.purge table property

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877528#comment-15877528
 ] 

Hive QA commented on HIVE-15880:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853833/HIVE-15880.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3683/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3683/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3683/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853833 - PreCommit-HIVE-Build

> Allow insert overwrite query to use auto.purge table property
> -
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-15880.01.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:03 
> /user/hive/warehouse/temp/00_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is 
> set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:07 
> /user/hive/warehouse/temp/00_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 26 2017-02-09 13:08 
> /user/hive/warehouse/temp/00_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx--   - hive hive  0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be significant 
> overhead on slow FileSystems like S3. Honoring auto.purge could improve the 
> performance of {{INSERT OVERWRITE TABLE}} queries, especially when there are 
> a large number of partitions on tables located on S3, should the user wish 
> to set the auto.purge property to true.
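
A minimal sketch of the behavior being proposed (an assumed shape, not the 
attached patch): skip the Trash move when auto.purge is set.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class PurgeSketch {
  // When auto.purge=true, delete the old data directly; otherwise keep the
  // current recoverable behavior of moving it to the user's Trash. On S3
  // this turns a copy-heavy rename into a cheap delete.
  static void removeOldData(FileSystem fs, Path oldPath, Configuration conf,
      boolean autoPurge) throws Exception {
    if (autoPurge) {
      fs.delete(oldPath, true);                        // no Trash involved
    } else {
      Trash.moveToAppropriateTrash(fs, oldPath, conf); // default behavior
    }
  }
}
{code}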



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15989) Incorrect rounding in decimal data types

2017-02-21 Thread Nikesh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877519#comment-15877519
 ] 

Nikesh commented on HIVE-15989:
---

I do not have an ORC file right now; I have attached the CSV file, which 
contains the sample record.

> Incorrect rounding in decimal data types
> 
>
> Key: HIVE-15989
> URL: https://issues.apache.org/jira/browse/HIVE-15989
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Reporter: Nikesh
>Priority: Critical
> Attachments: ANA_AUTO_E.csv
>
>
> I have a numeric field in a file in my data lake and created a Hive external 
> table pointing to this field. The field value is 
> 0. but when I fetch this record 
> using a query it displays only 0.. I tried the DECIMAL and 
> DOUBLE data types but nothing worked. Is this a bug, or am I not using the 
> right data type for this?
> Thanks,
> NIkesh
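
A possible explanation, offered here as an assumption since the DDL isn't 
shown: in Hive, DECIMAL without an explicit precision and scale defaults to 
DECIMAL(10,0), so the fractional digits are rounded away. A sketch with a 
hypothetical table name and location:

{code:SQL}
-- decimal with no arguments defaults to decimal(10,0): scale 0, fraction lost.
-- declaring precision and scale explicitly keeps the fractional digits
-- (38 is the maximum precision; choose a scale that fits the data).
create external table ana_auto_e (val decimal(38,20))
row format delimited fields terminated by ','
location '/data/ana_auto_e';
{code}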



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15989) Incorrect rounding in decimal data types

2017-02-21 Thread Nikesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikesh updated HIVE-15989:
--
Attachment: ANA_AUTO_E.csv

> Incorrect rounding in decimal data types
> 
>
> Key: HIVE-15989
> URL: https://issues.apache.org/jira/browse/HIVE-15989
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Reporter: Nikesh
>Priority: Critical
> Attachments: ANA_AUTO_E.csv
>
>
> I have a numeric field in a file in my data lake and created a Hive external 
> table pointing to this field. The field value is 
> 0. but when I fetch this record 
> using a query it displays only 0.. I tried the DECIMAL and 
> DOUBLE data types but nothing worked. Is this a bug, or am I not using the 
> right data type for this?
> Thanks,
> NIkesh



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15947) Enhance Templeton service job operations reliability

2017-02-21 Thread Subramanyam Pattipaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877500#comment-15877500
 ] 

Subramanyam Pattipaka commented on HIVE-15947:
--

[~kiran.kolli], I have addressed the comments you provided.

[~thejas], can you please review and share any comments? I have added unit 
tests for threads getting killed and interrupted, simulating the kills with 
shutdownNow(). Is there a better way to simulate thread-kill behavior for a 
WebHCat request and verify it?

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Bug
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
> Attachments: HIVE-15947.2.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests; it simply accepts and tries to run all of them. If many concurrent 
> job submission requests arrive, the time to submit job operations can 
> increase significantly. Templeton uses HDFS to store the staging files for a 
> job. If HDFS can't keep up with the request volume and throttles, job 
> submission can take on the order of minutes.
> This behavior may not be suitable for all applications. Client applications 
> may expect a predictable, low-latency response for successful requests, or a 
> throttle response telling them to wait for some time before re-requesting 
> the job operation.
> In this JIRA, I am trying to address the following job operations: 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different costs due to the variance in their use 
> of cluster resources like YARN/HDFS.
> The idea is to introduce a new config templeton.job.submit.exec.max-procs 
> which controls the maximum number of concurrent active job submissions 
> within Templeton, and to use this config to achieve better response times. 
> If a new job submission request sees that templeton.job.submit.exec.max-procs 
> jobs are already being submitted concurrently, the request will fail with 
> HTTP error 503 and the reason 
> “Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value of templeton.job.submit.exec.max-procs is ‘0’, 
> which means job submission requests are always accepted by default; the 
> behavior needs to be enabled based on requirements.
> We can have similar behavior for the Status and List operations with the 
> configs templeton.job.status.exec.max-procs and 
> templeton.list.job.exec.max-procs respectively.
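
A minimal sketch of the proposed throttle, assuming a simple semaphore guard 
(the class and method names are illustrative, not Templeton API):

{code:java}
import java.util.concurrent.Semaphore;

public class JobRequestThrottle {
  private final boolean enabled;
  private final Semaphore slots;

  // maxProcs comes from e.g. templeton.job.submit.exec.max-procs.
  public JobRequestThrottle(int maxProcs) {
    this.enabled = maxProcs > 0;          // 0 (default) disables throttling
    this.slots = new Semaphore(Math.max(maxProcs, 1));
  }

  // Returns false when the caller should answer HTTP 503 immediately
  // instead of queueing the request.
  public boolean tryStart() {
    return !enabled || slots.tryAcquire();
  }

  // Call from a finally block once the job operation completes.
  public void finish() {
    if (enabled) {
      slots.release();
    }
  }
}
{code}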



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15947) Enhance Templeton service job operations reliability

2017-02-21 Thread Subramanyam Pattipaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramanyam Pattipaka updated HIVE-15947:
-
Attachment: HIVE-15947.2.patch

Incorporated review comments.

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Bug
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
> Attachments: HIVE-15947.2.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests; it simply accepts and tries to run all of them. If many concurrent 
> job submission requests arrive, the time to submit job operations can 
> increase significantly. Templeton uses HDFS to store the staging files for a 
> job. If HDFS can't keep up with the request volume and throttles, job 
> submission can take on the order of minutes.
> This behavior may not be suitable for all applications. Client applications 
> may expect a predictable, low-latency response for successful requests, or a 
> throttle response telling them to wait for some time before re-requesting 
> the job operation.
> In this JIRA, I am trying to address the following job operations: 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different costs due to the variance in their use 
> of cluster resources like YARN/HDFS.
> The idea is to introduce a new config templeton.job.submit.exec.max-procs 
> which controls the maximum number of concurrent active job submissions 
> within Templeton, and to use this config to achieve better response times. 
> If a new job submission request sees that templeton.job.submit.exec.max-procs 
> jobs are already being submitted concurrently, the request will fail with 
> HTTP error 503 and the reason 
> “Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value of templeton.job.submit.exec.max-procs is ‘0’, 
> which means job submission requests are always accepted by default; the 
> behavior needs to be enabled based on requirements.
> We can have similar behavior for the Status and List operations with the 
> configs templeton.job.status.exec.max-procs and 
> templeton.list.job.exec.max-procs respectively.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877475#comment-15877475
 ] 

Hive QA commented on HIVE-15999:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853815/HIVE-15999.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3682/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3682/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853815 - PreCommit-HIVE-Build

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to the HiveConf used in the test being polluted by another 
> test; e.g., in testDummyTxnManagerOnAcidTable(), the conf entry 
> HIVE_TXN_MANAGER is set to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" 
> but never switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15847) In Progress update refreshes seem slow

2017-02-21 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877464#comment-15877464
 ] 

anishek commented on HIVE-15847:


Thanks for the review [~thejas]

> In Progress update refreshes seem slow
> --
>
> Key: HIVE-15847
> URL: https://issues.apache.org/jira/browse/HIVE-15847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: current_response.mov, HIVE-15847.1.patch, 
> HIVE-15847.2.patch
>
>
> After HIVE-15473, the refresh rates for in place progress bar seems to be 
> slow on hive cli. 
> As pointed out by [~prasanth_j] 
> {quote}
> The refresh rate is slow. Following video will show it
> before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
> after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Status: Open  (was: Patch Available)

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch, 
> HIVE-15955.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Attachment: HIVE-15955.03.patch

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch, 
> HIVE-15955.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Status: Patch Available  (was: Open)

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch, 
> HIVE-15955.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15991) Flaky Test: TestEncryptedHDFSCliDriver encryption_join_with_different_encryption_keys

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877436#comment-15877436
 ] 

Hive QA commented on HIVE-15991:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853813/HIVE-15991.txt

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join16] 
(batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join21] 
(batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_8] 
(batchId=111)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3681/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3681/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3681/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853813 - PreCommit-HIVE-Build

> Flaky Test: TestEncryptedHDFSCliDriver 
> encryption_join_with_different_encryption_keys
> -
>
> Key: HIVE-15991
> URL: https://issues.apache.org/jira/browse/HIVE-15991
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15991.txt
>
>
> I ran a git-bisect and it seems HIVE-15703 started causing this failure. Not 
> entirely sure why, but I updated the .out file and the diff is pretty 
> straightforward, so I think it's safe to just update it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15958) LLAP: IPC connections are not being reused for umbilical protocol

2017-02-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15958:
-
Attachment: HIVE-15958.2.patch

[~sseth], I also added clearing of AMNodeInfo on query completion. Can you 
please take another look?

> LLAP: IPC connections are not being reused for umbilical protocol
> -
>
> Key: HIVE-15958
> URL: https://issues.apache.org/jira/browse/HIVE-15958
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15958.1.patch, HIVE-15958.2.patch
>
>
> During concurrency testing, we observed thousands of IPC thread creations. 
> Ideally, connections to the same host should be reused.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1626) stop using java.util.Stack

2017-02-21 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-1626:
-
Status: Patch Available  (was: Open)

> stop using java.util.Stack
> --
>
> Key: HIVE-1626
> URL: https://issues.apache.org/jira/browse/HIVE-1626
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Teddy Choi
> Attachments: HIVE-1626.2.patch
>
>
> We currently use Stack as part of the generic node walking library.  Stack 
> should not be used for this since its inheritance from Vector incurs 
> superfluous synchronization overhead.
> Most projects end up adding an ArrayStack implementation and using that 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1626) stop using java.util.Stack

2017-02-21 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-1626:
-
Attachment: HIVE-1626.2.patch

125 files are changed. Most of the files are subclasses of NodeProcessor and 
Dispatcher; they now use Deque instead of Stack. However, there were dozens of 
Stack.get(int) calls, and Deque has no get(int). I implemented 
Utils.get(Deque, int) for it with Deque.descendingIterator(), which adds some 
GC overhead.
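
A minimal sketch of such a helper, assuming the Deque is used as a stack via 
push()/pop(), so descendingIterator() walks from the bottom of the stack 
upward, matching Stack.get(int) indexing:

{code:java}
import java.util.Deque;
import java.util.Iterator;

public final class Utils {
  // Emulates java.util.Stack.get(i): index 0 is the bottom-most element.
  // For a Deque used via push()/pop(), the head is the top of the stack,
  // so descendingIterator() (tail to head) starts at the bottom.
  public static <T> T get(Deque<T> stack, int i) {
    Iterator<T> it = stack.descendingIterator();
    T item = null;
    for (int k = 0; k <= i; k++) {
      item = it.next();  // throws NoSuchElementException if i is out of range
    }
    return item;
  }
}
{code}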

> stop using java.util.Stack
> --
>
> Key: HIVE-1626
> URL: https://issues.apache.org/jira/browse/HIVE-1626
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Teddy Choi
> Attachments: HIVE-1626.2.patch
>
>
> We currently use Stack as part of the generic node walking library.  Stack 
> should not be used for this since its inheritance from Vector incurs 
> superfluous synchronization overhead.
> Most projects end up adding an ArrayStack implementation and using that 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877374#comment-15877374
 ] 

Hive QA commented on HIVE-14901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853818/HIVE-14901.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10253 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
 (batchId=218)
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
 (batchId=217)
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAllowedCommands
 (batchId=218)
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAuthorization1
 (batchId=218)
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testBlackListedUdfUsage
 (batchId=218)
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testConfigWhiteList
 (batchId=218)
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary.testAuthorization1 
(batchId=229)
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthHttp.testAuthorization1 
(batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3680/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3680/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3680/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853818 - PreCommit-HIVE-Build

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user-supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should, however, use 
> {{hive.server2.thrift.resultset.max.fetch.size}} as an upper bound on it, so 
> that we don't go OOM in tasks and HS2. 
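
A sketch of the intended capping, with illustrative variable names (the conf 
and request objects are assumptions about the surrounding code):

{code:java}
// Honor the client-supplied fetch size from the FetchResults request,
// bounded above by hive.server2.thrift.resultset.max.fetch.size.
int serverMax = conf.getIntVar(
    HiveConf.ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_MAX_FETCH_SIZE);
long requested = req.getMaxRows();   // TFetchResultsReq from the client
long effective = (requested > 0) ? Math.min(requested, serverMax) : serverMax;
{code}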



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877304#comment-15877304
 ] 

Hive QA commented on HIVE-15955:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853824/HIVE-15955.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 111 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input4] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join0] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parallel_join0] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[plan_json] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join6] 
(batchId=38)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_dpp]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_semijoin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_5] 
(batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_mat_1] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_mat_2] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_mat_3] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_mat_4] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_mat_5] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[deleteAnalyze]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[empty_join] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_4]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_cache] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_aggregate_without_gby]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing_gby]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct]
 (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query12] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query13] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query15] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query18] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query19] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query1] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query20] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query21] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query22] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query25] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query26] 

[jira] [Commented] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877258#comment-15877258
 ] 

Eugene Koifman commented on HIVE-15999:
---

Could you explain why this is causing a Derby error?

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to the HiveConf used in the test being polluted by another 
> test; e.g., in testDummyTxnManagerOnAcidTable(), the conf entry 
> HIVE_TXN_MANAGER is set to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" 
> but never switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-15946) Failing test : TestCliDriver cbo_rp_auto_join1

2017-02-21 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-15946.
---
   Resolution: Resolved
Fix Version/s: 2.2.0

Fixed in HIVE-15948

> Failing test : TestCliDriver cbo_rp_auto_join1
> --
>
> Key: HIVE-15946
> URL: https://issues.apache.org/jira/browse/HIVE-15946
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Thejas M Nair
> Fix For: 2.2.0
>
>
> Started failing in master around Feb 14 2017.
> {code}
> at org.junit.Assert.fail(Assert.java:88)
>   at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:2204)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:186)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>   at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:59)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16002:

Status: Patch Available  (was: Open)

> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16002.1.patch
>
>
> Reproducer
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16002:
---
Attachment: HIVE-16002.1.patch

> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16002.1.patch
>
>
> Reproducer
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-16002:
--


> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> ==Reproducer==
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16002:
---
Description: 
Reproducer

{code:SQL}
create table t(i int, j int);
insert into t values(0,1), (0,2);

create table tt(i int, j int);
insert into tt values(0,3);

select * from t where i IN (select count(i) from tt where tt.j = t.j);
{code}

  was:
==Reproducer==

{code:SQL}
create table t(i int, j int);
insert into t values(0,1), (0,2);

create table tt(i int, j int);
insert into tt values(0,3);

select * from t where i IN (select count(i) from tt where tt.j = t.j);
{code}


> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> Reproducer
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15938) position alias in order by fails for union queries

2017-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15938:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master after removing the test-time logging. Thanks for the 
reviews!
After discussing with [~ashutoshc], it doesn't seem like the other patch will 
make this significantly easier. Perhaps when that one is also committed, we 
can simplify this if an obvious solution emerges.

> position alias in order by fails for union queries
> --
>
> Key: HIVE-15938
> URL: https://issues.apache.org/jira/browse/HIVE-15938
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15938.01.patch, HIVE-15938.02.patch, 
> HIVE-15938.03.patch, HIVE-15938.04.patch, HIVE-15938.patch
>
>
> The problem is that the query introduces a spurious ALLCOLREF



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877225#comment-15877225
 ] 

Wei Zheng commented on HIVE-15999:
--

[~ekoifman] Can you take a look please?

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to the HiveConf used in the test being polluted by another 
> test; e.g., in testDummyTxnManagerOnAcidTable(), the conf entry 
> HIVE_TXN_MANAGER is set to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" 
> but never switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877220#comment-15877220
 ] 

Hive QA commented on HIVE-15999:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853815/HIVE-15999.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10251 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0]
 (batchId=173)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY 
(batchId=173)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3678/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3678/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853815 - PreCommit-HIVE-Build

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to the HiveConf used in the test being polluted by another 
> test; e.g., in testDummyTxnManagerOnAcidTable(), the conf entry 
> HIVE_TXN_MANAGER is set to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" 
> but never switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-21 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Attachment: HIVE-1555.6.patch

Thanks [~brocknoland]. Fixed cleanupResources to close all three resources, 
even in the case of an exception.

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.3.patch, HIVE-1555.4.patch, HIVE-1555.5.patch, 
> HIVE-1555.6.patch, JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc., against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-21 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Status: Patch Available  (was: Open)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.3.patch, HIVE-1555.4.patch, HIVE-1555.5.patch, 
> HIVE-1555.6.patch, JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12492) MapJoin: 4 million unique integers seems to be a probe plateau

2017-02-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877214#comment-15877214
 ] 

Ashutosh Chauhan commented on HIVE-12492:
-

I am not sure the patch is working as intended. For a 2-way join, a DHJ is 
selected with 2 CUSTOM_SIMPLE_EDGEs going into a Reducer whose join operator is 
of type MapJoin.
None of the test cases in the attached patch has plans of that shape.

See: ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_*.q.out 

> MapJoin: 4 million unique integers seems to be a probe plateau
> --
>
> Key: HIVE-12492
> URL: https://issues.apache.org/jira/browse/HIVE-12492
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12492.patch
>
>
> After 4 million keys, the map-join implementation seems to suffer from a 
> performance degradation. 
> The hashtable build & probe time makes this very inefficient, even if the 
> data is very compact (i.e 2 ints).
> Falling back onto the shuffle join or bucket map-join is useful after 2^22 
> items.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-21 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Status: Open  (was: Patch Available)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.3.patch, HIVE-1555.4.patch, HIVE-1555.5.patch, 
> JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15626) beeline should not exit after canceling the query on ctrl-c

2017-02-21 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877166#comment-15877166
 ] 

Vihang Karajgaonkar commented on HIVE-15626:


Hi [~leftylev], I updated the BeeLine wiki, which documents this behavior: 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Cancellingthequery

Please let me know if you need any changes in the text. Thanks!
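For context, a minimal sketch of how the documented behavior can be wired up 
(assuming Java 8 and sun.misc.Signal, which console CLIs commonly use; 
runningStatement is a hypothetical reference to the live statement):

{code}
import java.sql.SQLException;
import java.sql.Statement;
import sun.misc.Signal;

// First ctrl-c cancels the running statement and keeps the shell alive;
// ctrl-c with no statement running exits as before.
Signal.handle(new Signal("INT"), signal -> {
  Statement stmt = runningStatement;  // hypothetical: set while a query runs
  if (stmt != null) {
    try {
      stmt.cancel();                  // cancel the query, stay in the shell
    } catch (SQLException e) {
      // cancellation is best-effort; nothing else to do here
    }
  } else {
    System.exit(130);                 // conventional exit code for SIGINT
  }
});
{code}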

> beeline should not exit after canceling the query on ctrl-c
> ---
>
> Key: HIVE-15626
> URL: https://issues.apache.org/jira/browse/HIVE-15626
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-15626.01.patch
>
>
> I am seeing this in 1.2



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877127#comment-15877127
 ] 

Ashutosh Chauhan commented on HIVE-15955:
-

Latest patch looks good. Some minor comments on RB.

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877112#comment-15877112
 ] 

Hive QA commented on HIVE-15881:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853804/HIVE-15881.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10252 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_14]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_15]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby5_map_skew] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_reorder] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample1] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoinopt4] 
(batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union10] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union34] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=106)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=211)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3677/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3677/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3677/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853804 - PreCommit-HIVE-Build

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} 
> and using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.
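A sketch of what the replacement could look like as a HiveConf entry (the enum 
constant name and default below are illustrative, not the final definition):

{code}
// Hypothetical ConfVars entry; mapred.dfsclient.parallelism.max would then
// only be read as a deprecated fallback until it is removed.
HIVE_GET_INPUT_LISTING_NUM_THREADS("hive.get.input.listing.num.threads", 15,
    "Number of threads used by getInputSummary/getInputPaths to list input " +
    "locations in parallel."),
{code}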



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16001) add test for merge + runtime filtering

2017-02-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16001:
--
Description: 
make sure merge works with HIVE-15802 and HIVE-15269
add to sqlmerge.q
cc [~jdere]

  was:make sure merge works with HIVE-15802 and HIVE-15269


> add test for merge + runtime filtering
> --
>
> Key: HIVE-16001
> URL: https://issues.apache.org/jira/browse/HIVE-16001
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> make sure merge works with HIVE-15802 and HIVE-15269
> add to sqlmerge.q
> cc [~jdere]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16001) add test for merge + runtime filtering

2017-02-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16001:
-


> add test for merge + runtime filtering
> --
>
> Key: HIVE-16001
> URL: https://issues.apache.org/jira/browse/HIVE-16001
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> make sure merge works with HIVE-15802 and HIVE-15269



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15880) Allow insert overwrite query to use auto.purge table property

2017-02-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-15880:
---
Status: Patch Available  (was: Open)

> Allow insert overwrite query to use auto.purge table property
> -
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-15880.01.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:03 
> /user/hive/warehouse/temp/00_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is 
> set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:07 
> /user/hive/warehouse/temp/00_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 26 2017-02-09 13:08 
> /user/hive/warehouse/temp/00_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx--   - hive hive  0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be a significant 
> overhead on slow FileSystems like S3. This could improve the performance of 
> {{INSERT OVERWRITE TABLE}} queries, especially when there are a large number 
> of partitions on tables located on S3, should the user wish to set the 
> auto.purge property to true.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15880) Allow insert overwrite query to use auto.purge table property

2017-02-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-15880:
---
Attachment: HIVE-15880.01.patch

Adding the first version of the patch to trigger the pre-commit. Will be 
submitting a second version with additional test cases.

> Allow insert overwrite query to use auto.purge table property
> -
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-15880.01.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:03 
> /user/hive/warehouse/temp/00_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is 
> set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:07 
> /user/hive/warehouse/temp/00_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 26 2017-02-09 13:08 
> /user/hive/warehouse/temp/00_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx--   - hive hive  0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be a significant 
> overhead on slow FileSystems like S3. This could improve the performance of 
> {{INSERT OVERWRITE TABLE}} queries, especially when there are a large number 
> of partitions on tables located on S3, should the user wish to set the 
> auto.purge property to true.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1

2017-02-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15934:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Zoltan for the review!

> Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
> -
>
> Key: HIVE-15934
> URL: https://issues.apache.org/jira/browse/HIVE-15934
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15934.1.patch
>
>
> Surefire 2.19.1 has an issue 
> (https://issues.apache.org/jira/browse/SUREFIRE-1255) which causes debugging 
> sessions to abort after a short period of time. Many IntelliJ users have seen 
> this, although it looks fine for Eclipse users. Version 2.18.1 works fine.
> We'd better make the change so as not to impact development for IntelliJ 
> users. We can upgrade again once the root cause is figured out.
> cc [~kgyrtkirk] [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15931) JDBC: Improve logging when using ZooKeeper

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877034#comment-15877034
 ] 

Hive QA commented on HIVE-15931:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853800/HIVE-15931.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10251 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=211)
org.apache.hive.jdbc.TestJdbcDriver2.testBadURL (batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3676/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3676/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3676/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853800 - PreCommit-HIVE-Build

> JDBC: Improve logging when using ZooKeeper
> --
>
> Key: HIVE-15931
> URL: https://issues.apache.org/jira/browse/HIVE-15931
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15931.1.patch, HIVE-15931.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877012#comment-15877012
 ] 

ASF GitHub Bot commented on HIVE-15934:
---

GitHub user weiatwork opened a pull request:

https://github.com/apache/hive/pull/152

HIVE-15934 : Downgrade Maven surefire plugin from 2.19.1 to 2.18.1 (W…

…ei Zheng, reviewed by Zoltan Haindrich)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/hive HIVE-15934

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #152


commit cc9085617b8749b8eb0a69fb893133ac04915eb8
Author: Wei Zheng 
Date:   2017-02-21T23:31:51Z

HIVE-15934 : Downgrade Maven surefire plugin from 2.19.1 to 2.18.1 (Wei 
Zheng, reviewed by Zoltan Haindrich)




> Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
> -
>
> Key: HIVE-15934
> URL: https://issues.apache.org/jira/browse/HIVE-15934
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15934.1.patch
>
>
> Surefire 2.19.1 has an issue 
> (https://issues.apache.org/jira/browse/SUREFIRE-1255) which causes debugging 
> sessions to abort after a short period of time. Many IntelliJ users have seen 
> this, although it looks fine for Eclipse users. Version 2.18.1 works fine.
> We'd better make the change so as not to impact development for IntelliJ 
> users. We can upgrade again once the root cause is figured out.
> cc [~kgyrtkirk] [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Status: Open  (was: Patch Available)

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Status: Patch Available  (was: Open)

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Attachment: HIVE-15955.02.patch

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch, HIVE-15955.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15955) make explain formatted to include opId and etc

2017-02-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15955:
---
Summary: make explain formatted to include opId and etc  (was: make explain 
formatted to dump JSONObject when user level explain is on)

> make explain formatted to include opId and etc
> --
>
> Key: HIVE-15955
> URL: https://issues.apache.org/jira/browse/HIVE-15955
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15955.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15996) Implement multiargument GROUPING function

2017-02-21 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876956#comment-15876956
 ] 

Carter Shanklin commented on HIVE-15996:


[~julianhyde] I don't think so. GROUP_ID seems to depend on the input data (I'm 
no Oracle expert, to be fair) while this is converting the bitmasks to 
numbers.

So I could select all of grouping(c1, c2, c3), grouping(c1, c3), and 
grouping(c2, c3) with different numbers per row, whereas GROUP_ID doesn't seem 
to take any arguments.

> Implement multiargument GROUPING function
> -
>
> Key: HIVE-15996
> URL: https://issues.apache.org/jira/browse/HIVE-15996
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
>
> Per the SQL standard section 6.9:
> GROUPING ( CR1, ..., CRN-1, CRN )
> is equivalent to:
> CAST ( ( 2 * GROUPING ( CR1, ..., CRN-1 ) + GROUPING ( CRN ) ) AS IDT )
> So for example:
> select c1, c2, c3, grouping(c1, c2, c3) from e011_02 group by rollup(c1, c2, 
> c3);
> Should be allowed and equivalent to:
> select c1, c2, c3, 4*grouping(c1) + 2*grouping(c2) + grouping(c3) from 
> e011_02 group by rollup(c1, c2, c3);
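The bitmask composition the standard describes, as a small Java sketch (the 
helper is illustrative; each argument is the 0/1 result of the single-argument 
GROUPING):

{code}
// GROUPING(c1, ..., cn) = 2 * GROUPING(c1, ..., cn-1) + GROUPING(cn),
// i.e. each column contributes one bit, leftmost column most significant.
static int grouping(int... perColumn) {  // each element is 0 or 1
  int result = 0;
  for (int g : perColumn) {
    result = result * 2 + g;
  }
  return result;
}
// grouping(1, 0, 1) == 5, matching 4*grouping(c1) + 2*grouping(c2) + grouping(c3)
{code}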



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15991) Flaky Test: TestEncryptedHDFSCliDriver encryption_join_with_different_encryption_keys

2017-02-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15991:

Description: I ran a git-bisect and it seems HIVE-15703 started causing this 
failure. Not entirely sure why, but I updated the .out file and the diff is 
pretty straightforward, so I think it's safe to just update it.

> Flaky Test: TestEncryptedHDFSCliDriver 
> encryption_join_with_different_encryption_keys
> -
>
> Key: HIVE-15991
> URL: https://issues.apache.org/jira/browse/HIVE-15991
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15991.txt
>
>
> I ran a git-bisect and it seems HIVE-15703 started causing this failure. Not 
> entirely sure why, but I updated the .out file and the diff is pretty 
> straightforward, so I think it's safe to just update it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876933#comment-15876933
 ] 

Hive QA commented on HIVE-15570:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853219/HIVE-15570.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10251 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3675/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3675/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3675/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853219 - PreCommit-HIVE-Build

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch
>
>
> Sometimes a user might prefer to run with "hive.execution.mode=container" 
> when LLAP is stopped. If the Hive config for LLAP has 
> "hive.llap.client.consistent.splits=true" on the client side, it would end up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}
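A sketch of the kind of guard that avoids this (the fallback provider and 
fetchLlapInstanceHosts() are hypothetical; the conf vars and 
HostAffinitySplitLocationProvider come from the stack trace above):

{code}
boolean useHostAffinity =
    HiveConf.getBoolVar(conf, HiveConf.ConfVars.LLAP_CLIENT_CONSISTENT_SPLITS)
    && "llap".equalsIgnoreCase(
        HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_MODE));
// Only build the host-affinity provider when llap mode is on and
// live daemon instances are actually registered.
List<String> locations =
    useHostAffinity ? fetchLlapInstanceHosts() : Collections.emptyList();
SplitLocationProvider provider = locations.isEmpty()
    ? new FallbackSplitLocationProvider()   // hypothetical container-mode default
    : new HostAffinitySplitLocationProvider(locations);
{code}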



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876900#comment-15876900
 ] 

Thomas Poepping commented on HIVE-15881:


Hey [~spena], updated the RB. Just one question, otherwise non-binding +1

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} 
> and using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876900#comment-15876900
 ] 

Thomas Poepping edited comment on HIVE-15881 at 2/21/17 10:42 PM:
--

Hey [~spena], updated the RB. Just one question, otherwise non-binding +1 
pending QA 


was (Author: poeppt):
Hey [~spena], updated the RB. Just one question, otherwise non-binding +1

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} 
> and using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-02-21 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Status: In Progress  (was: Patch Available)

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 
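A sketch of the intended clamping (variable names are illustrative; the conf 
var is the one named in the description):

{code}
// Honor the client-requested fetch size, but cap it at the server limit
// so a large request cannot OOM tasks or HS2.
int serverMax = HiveConf.getIntVar(conf,
    HiveConf.ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_MAX_FETCH_SIZE);
long requested = fetchResultsReq.getMaxRows();  // user-supplied FetchResults value
long batchSize = requested > 0 ? Math.min(requested, serverMax) : serverMax;
{code}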



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-02-21 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Attachment: HIVE-14901.4.patch

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-02-21 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Status: Patch Available  (was: In Progress)

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15971) LLAP: logs urls should use daemon container id instead of fake container id

2017-02-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15971:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master

> LLAP: logs urls should use daemon container id instead of fake container id
> ---
>
> Key: HIVE-15971
> URL: https://issues.apache.org/jira/browse/HIVE-15971
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15971.1.patch, HIVE-15971.2.patch, 
> HIVE-15971.3.patch, HIVE-15971.4.patch
>
>
> The containerId used for log url generation is fake. It should be replaced by 
> the container id of the llap daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15971) LLAP: logs urls should use daemon container id instead of fake container id

2017-02-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876869#comment-15876869
 ] 

Prasanth Jayachandran commented on HIVE-15971:
--

test failures are not related to this patch and have been failing already. 

> LLAP: logs urls should use daemon container id instead of fake container id
> ---
>
> Key: HIVE-15971
> URL: https://issues.apache.org/jira/browse/HIVE-15971
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15971.1.patch, HIVE-15971.2.patch, 
> HIVE-15971.3.patch, HIVE-15971.4.patch
>
>
> The containerId used for log url generation is fake. It should be replaced by 
> the container id of the llap daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15971) LLAP: logs urls should use daemon container id instead of fake container id

2017-02-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876851#comment-15876851
 ] 

Prasanth Jayachandran commented on HIVE-15971:
--

[~sseth] Thanks for the review. Created HIVE-16000 for follow up. Will check if 
the test failures are related before commit. 

> LLAP: logs urls should use daemon container id instead of fake container id
> ---
>
> Key: HIVE-15971
> URL: https://issues.apache.org/jira/browse/HIVE-15971
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15971.1.patch, HIVE-15971.2.patch, 
> HIVE-15971.3.patch, HIVE-15971.4.patch
>
>
> The containerId used for log url generation is fake. It should be replaced by 
> the container id of the llap daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16000) LLAP: LLAP log urls improvements

2017-02-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16000:
-
Description: 
Follow up for HIVE-15971 (based on 
https://issues.apache.org/jira/browse/HIVE-15971?focusedCommentId=15876814=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15876814)
1) Make NodeManager web address port available via ServiceInstance or something 
better (other than reading from configuration)
2) When an llap node goes down, the log URL cannot be constructed since we rely 
on information from the service registry. Instead, YARN NodeId can be extended 
to provide the necessary information (container id) for constructing the log 
URL. 

  was:
Follow up for HIVE-15971
1) Make NodeManager web address port available via ServiceInstance or something 
better (other than reading from configuration)
2) When an llap node goes down, the log URL cannot be constructed since we rely 
on information from the service registry. Instead, YARN NodeId can be extended 
to provide the necessary information (container id) for constructing the log 
URL. 


> LLAP: LLAP log urls improvements
> 
>
> Key: HIVE-16000
> URL: https://issues.apache.org/jira/browse/HIVE-16000
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>
> Follow up for HIVE-15971 (based on 
> https://issues.apache.org/jira/browse/HIVE-15971?focusedCommentId=15876814=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15876814)
> 1) Make NodeManager web address port available via ServiceInstance or 
> something better (other than reading from configuration)
> 2) When an llap node goes down, the log URL cannot be constructed since we 
> rely on information from the service registry. Instead, YARN NodeId can be 
> extended to provide the necessary information (container id) for constructing 
> the log URL. 
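For reference, a sketch of the URL shape being constructed (this follows the 
standard YARN NodeManager container-logs path; the variable names are 
illustrative):

{code}
String logUrl = String.format("http://%s/node/containerlogs/%s/%s",
    nmWebAddress,   // NodeManager web address, ideally published by the daemon
    containerId,    // the llap daemon's real container id (see HIVE-15971)
    user);          // user that owns the container
{code}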



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15999:
-
Description: 
Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
this:
{code}
java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at 
org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
at 
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
{code}
The failure is due to HiveConf used in the test being polluted by some test, 
e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set to 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.

  was:
Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
this:
{code}
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
 Error Details
Table/View 'TXNS' already exists in Schema 'APP'.
{code}
The failure is due to HiveConf used in the test being polluted by some test, 
e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set to 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.


> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:75)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.setUp(TestDbTxnManager2.java:90)
> {code}
> The failure is due to HiveConf used in the test being polluted by some test, 
> e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set 
> to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15999:
-
Attachment: HIVE-15999.1.patch

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
>  Error Details
> Table/View 'TXNS' already exists in Schema 'APP'.
> {code}
> The failure is due to HiveConf used in the test being polluted by some test, 
> e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set 
> to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15999:
-
Status: Patch Available  (was: Open)

> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15999.1.patch
>
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
>  Error Details
> Table/View 'TXNS' already exists in Schema 'APP'.
> {code}
> The failure is due to HiveConf used in the test being polluted by some test, 
> e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set 
> to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15991) Flaky Test: TestEncryptedHDFSCliDriver encryption_join_with_different_encryption_keys

2017-02-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876837#comment-15876837
 ] 

Ashutosh Chauhan commented on HIVE-15991:
-

+1

> Flaky Test: TestEncryptedHDFSCliDriver 
> encryption_join_with_different_encryption_keys
> -
>
> Key: HIVE-15991
> URL: https://issues.apache.org/jira/browse/HIVE-15991
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15991.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15991) Flaky Test: TestEncryptedHDFSCliDriver encryption_join_with_different_encryption_keys

2017-02-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15991:

Status: Patch Available  (was: Open)

> Flaky Test: TestEncryptedHDFSCliDriver 
> encryption_join_with_different_encryption_keys
> -
>
> Key: HIVE-15991
> URL: https://issues.apache.org/jira/browse/HIVE-15991
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15991.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reassigned HIVE-15999:



> Fix flakiness in TestDbTxnManager2
> --
>
> Key: HIVE-15999
> URL: https://issues.apache.org/jira/browse/HIVE-15999
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>
> Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
> this:
> {code}
> org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
>  Error Details
> Table/View 'TXNS' already exists in Schema 'APP'.
> {code}
> The failure is due to HiveConf used in the test being polluted by some test, 
> e.g. in testDummyTxnManagerOnAcidTable(), conf entry HIVE_TXN_MANAGER is set 
> to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but not switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15844) Add WriteType to Explain Plan of ReduceSinkOperator and FileSinkOperator

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876829#comment-15876829
 ] 

Hive QA commented on HIVE-15844:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853791/HIVE-15844.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 10251 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_non_partitioned]
 (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_tmp_table] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_whole_partition] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynpart_sort_optimization_acid2]
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_update_delete] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts_special_characters]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_non_partitioned]
 (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_two_cols] 
(batchId=19)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_non_partitioned]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_tmp_table]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_whole_partition]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_update_delete]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_table_update]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part_update]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table_update]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_after_multiple_inserts]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_non_partitioned]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_two_cols]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge2 (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge3 (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge2 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge3 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge2
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge3
 (batchId=266)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.schemaEvolutionAddColDynamicPartitioningUpdate
 (batchId=205)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
{noformat}

[jira] [Updated] (HIVE-15959) LLAP: fix headroom calculation and move it to daemon

2017-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15959:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP: fix headroom calculation and move it to daemon
> 
>
> Key: HIVE-15959
> URL: https://issues.apache.org/jira/browse/HIVE-15959
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15959.01.patch, HIVE-15959.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15991) Flaky Test: TestEncryptedHDFSCliDriver encryption_join_with_different_encryption_keys

2017-02-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15991:

Attachment: HIVE-15991.txt

> Flaky Test: TestEncryptedHDFSCliDriver 
> encryption_join_with_different_encryption_keys
> -
>
> Key: HIVE-15991
> URL: https://issues.apache.org/jira/browse/HIVE-15991
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15991.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15971) LLAP: logs urls should use daemon container id instead of fake container id

2017-02-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876814#comment-15876814
 ] 

Siddharth Seth commented on HIVE-15971:
---

+1. Looks good. There are a couple of follow-up items.

Reading the port from YarnConfiguration is not great. A port value of 0 means a
dynamic port, in which case this breaks completely. We need a follow-up jira to
figure out a good way to make the port available - likely published from within
the LLAP daemon container itself.

Other than this, if the llap instance is not found (e.g. the task timed out
because llap went down) we won't be able to construct the log URL. We probably
need to handle this by retaining that information for some time.
A lot of this could be simplified if NodeId could be extended in YARN: a
LlapNodeId could include information about the container, NM web address, etc.
at allocation time.
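
To make the port concern concrete, here is a minimal sketch (not the actual
patch; class and method names are illustrative) of deriving an NM log URL from
client-side config, and where a dynamic webapp port breaks it:

{code:java}
// Sketch only: build http://<nmHost>:<port>/node/containerlogs/<cid>/<user>
// from YarnConfiguration. Names like NmLogUrlSketch/logUrl are made up.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmLogUrlSketch {
  static String logUrl(YarnConfiguration conf, String nmHost,
                       String containerId, String user) {
    // yarn.nodemanager.webapp.address, default "0.0.0.0:8042"
    String webAddr = conf.get(YarnConfiguration.NM_WEBAPP_ADDRESS,
        YarnConfiguration.DEFAULT_NM_WEBAPP_ADDRESS);
    int port = Integer.parseInt(webAddr.substring(webAddr.lastIndexOf(':') + 1));
    if (port == 0) {
      // Dynamic port: the real value is chosen when the NM starts, so it
      // cannot be recovered from config here - the daemon itself would
      // have to publish it (the follow-up item above).
      throw new IllegalStateException("NM webapp port is dynamic");
    }
    return String.format("http://%s:%d/node/containerlogs/%s/%s",
        nmHost, port, containerId, user);
  }
}
{code}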


> LLAP: logs urls should use daemon container id instead of fake container id
> ---
>
> Key: HIVE-15971
> URL: https://issues.apache.org/jira/browse/HIVE-15971
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15971.1.patch, HIVE-15971.2.patch, 
> HIVE-15971.3.patch, HIVE-15971.4.patch
>
>
> The containerId used for log url generation is fake. It should be replaced by 
> the container id of the llap daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15867) Add blobstore tests for import/export

2017-02-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876820#comment-15876820
 ] 

Sahil Takiar commented on HIVE-15867:
-

No worries, just checking in, thanks for the update!

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch covers ten separate tests of import and export operations
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)
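
For context, a rough sketch of one round trip from the matrix above
(blobstore -> HDFS), assuming a HiveServer2 JDBC endpoint; the bucket, table
names, and URL below are made up, and the actual qfile tests may differ:

{code:java}
// Illustrative only: export a table to a blobstore path, then import it
// back as a new table rooted on HDFS.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BlobstoreExportImportSketch {
  public static void main(String[] args) throws Exception {
    try (Connection con = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = con.createStatement()) {
      // Write table data + metadata to the blobstore...
      stmt.execute("EXPORT TABLE src_table TO 's3a://test-bucket/export/src_table'");
      // ...then import it back into a new table on HDFS.
      stmt.execute("IMPORT TABLE dst_table FROM 's3a://test-bucket/export/src_table' "
          + "LOCATION 'hdfs:///user/hive/warehouse/dst_table'");
    }
  }
}
{code}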



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1

2017-02-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876819#comment-15876819
 ] 

Zoltan Haindrich commented on HIVE-15934:
-

[~wzheng] I totally agree; we should downgrade surefire - currently there is
no better alternative.
+1

> Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
> -
>
> Key: HIVE-15934
> URL: https://issues.apache.org/jira/browse/HIVE-15934
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15934.1.patch
>
>
> Surefire 2.19.1 has an issue
> (https://issues.apache.org/jira/browse/SUREFIRE-1255) that causes debugging
> sessions to abort after a short period of time. Many IntelliJ users have seen
> this, although Eclipse users appear unaffected. Version 2.18.1 works fine.
> We should make the change so that development is not impacted for IntelliJ
> users. We can upgrade again once the root cause is figured out.
> cc [~kgyrtkirk] [~ashutoshc]
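
For reference, the fix amounts to pinning the plugin version in the pom; a
minimal sketch (Hive's actual pom may manage this through a version property
rather than an inline version):

{code:xml}
<!-- sketch only: pin surefire back to the known-good version -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.18.1</version> <!-- was 2.19.1 -->
</plugin>
{code}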



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1

2017-02-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876791#comment-15876791
 ] 

Wei Zheng commented on HIVE-15934:
--

Ping [~ashutoshc]..

> Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
> -
>
> Key: HIVE-15934
> URL: https://issues.apache.org/jira/browse/HIVE-15934
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15934.1.patch
>
>
> Surefire 2.19.1 has an issue
> (https://issues.apache.org/jira/browse/SUREFIRE-1255) that causes debugging
> sessions to abort after a short period of time. Many IntelliJ users have seen
> this, although Eclipse users appear unaffected. Version 2.18.1 works fine.
> We should make the change so that development is not impacted for IntelliJ
> users. We can upgrade again once the root cause is figured out.
> cc [~kgyrtkirk] [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15867) Add blobstore tests for import/export

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876786#comment-15876786
 ] 

Thomas Poepping commented on HIVE-15867:


Hi [~stakiar], yes, still on the radar. My colleague is working on a patch now;
once it passes our internal review he'll attach it here. Sorry for the wait.

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch covers ten separate tests of import and export operations
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15971) LLAP: logs urls should use daemon container id instead of fake container id

2017-02-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876733#comment-15876733
 ] 

Hive QA commented on HIVE-15971:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853789/HIVE-15971.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10251 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure (batchId=210)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0] (batchId=173)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3673/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3673/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3673/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853789 - PreCommit-HIVE-Build

> LLAP: logs urls should use daemon container id instead of fake container id
> ---
>
> Key: HIVE-15971
> URL: https://issues.apache.org/jira/browse/HIVE-15971
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15971.1.patch, HIVE-15971.2.patch, 
> HIVE-15971.3.patch, HIVE-15971.4.patch
>
>
> The containerId used for log url generation is fake. It should be replaced by 
> the container id of the llap daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


  1   2   >