date:20170120

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-01-20 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.2.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to use only the stats in the TS 
> rather than the populated stats in each of the join branches. This could be 
> pretty conservative but more reliable.
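
One way to picture the proposed option is as a switch on which size estimate feeds the MapJoin decision. The sketch below is only an illustration under stated assumptions: the class, method, and parameter names are hypothetical and are not taken from the attached patches or from Hive's optimizer.

```java
// Hypothetical illustration; none of these names come from Hive or the patch.
class MapJoinSizeSketch {
    // Decides whether the small side of a join is small enough for a MapJoin.
    // tableScanBytes: size reported by the table scan (TS) stats.
    // branchBytes:    size estimated from the stats populated along the join branch.
    static boolean fitsMapJoin(long tableScanBytes, long branchBytes,
                               boolean useTableScanStats, long thresholdBytes) {
        long size = useTableScanStats ? tableScanBytes : branchBytes;
        return size <= thresholdBytes;
    }
}
```

With the switch on, a table whose scan size exceeds the threshold is never converted, even if the branch estimate says it would shrink enough after filtering. That is the "conservative but more reliable" trade-off the description mentions.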



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15675) ql.hooks.TestQueryHooks failure

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832864#comment-15832864
 ] 

Hive QA commented on HIVE-15675:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848485/HIVE-15675.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3089/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3089/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3089/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848485 - PreCommit-HIVE-Build

> ql.hooks.TestQueryHooks failure
> ---
>
> Key: HIVE-15675
> URL: https://issues.apache.org/jira/browse/HIVE-15675
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jun He
>Assignee: Jun He
> Attachments: HIVE-15675.0.patch
>
>
> ql.parse.TestQBCompact creates table "foo" during initialization but doesn't 
> drop it after its test cases finish. This causes 
> ql.hooks.TestQueryHooks::testCompileFailure to fail, because testCompileFailure 
> expects that "foo" doesn't exist.
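
The failure mode described here is state leaking from one test class into another. As a minimal sketch, assuming a hypothetical in-memory `FakeCatalog` in place of the real metastore and plain Java in place of the test framework, pairing the creation with a cleanup step could look like this (it is not the content of HIVE-15675.0.patch):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for shared metastore state; not a Hive class.
class FakeCatalog {
    private static final Set<String> TABLES = new HashSet<>();

    static void createTable(String name) {
        if (!TABLES.add(name)) {
            throw new IllegalStateException("Table already exists: " + name);
        }
    }

    static void dropTableIfExists(String name) {
        TABLES.remove(name);
    }

    static boolean exists(String name) {
        return TABLES.contains(name);
    }
}

class CleanupSketch {
    // Plays the role of a test class that needs table "foo" while it runs.
    static void runTestThatNeedsFoo() {
        FakeCatalog.createTable("foo");
        try {
            // ... the test body would use "foo" here ...
        } finally {
            // Runs even if the test body throws, so later tests see a clean catalog.
            FakeCatalog.dropTableIfExists("foo");
        }
    }

    public static void main(String[] args) {
        runTestThatNeedsFoo();
        System.out.println("foo exists after test: " + FakeCatalog.exists("foo"));
    }
}
```

Because the drop sits in a `finally` block, a later test that expects "foo" to be absent is not affected by the order in which the test classes run.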




[jira] [Commented] (HIVE-15673) Allow multiple queries with disjunction

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832844#comment-15832844
 ] 

Hive QA commented on HIVE-15673:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848479/HIVE-15673.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3088/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3088/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3088/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-21 06:29:27.688
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3088/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 06:29:27.691
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at d9343f6 HIVE-15544 : Support scalar subqueries (Vineet Garg via 
Ashutosh Chauhan)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
Removing 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/VertorDeserializeOrcWriter.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at d9343f6 HIVE-15544 : Support scalar subqueries (Vineet Garg via 
Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 06:29:28.755
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidQueryBasedInputFormat.java:
 No such file or directory
error: a/pom.xml: No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
 No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java:
 No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No 
such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848479 - PreCommit-HIVE-Build

> Allow multiple queries with disjunction
> ---
>
> Key: HIVE-15673
> URL: https://issues.apache.org/jira/browse/HIVE-15673
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15673.1.patch
>
>
> Hive currently doesn't allow multiple subqueries with {{OR}} because Calcite 
> has a bug in determining the logic for OR expressions. See [CALCITE-1546 
> |https://issues.apache.org/jira/browse/CALCITE-1546].
> Once a Calcite release containing the fix is available, Hive will need to lift 
> the restriction and add test cases.




[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832842#comment-15832842
 ] 

Hive QA commented on HIVE-15664:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848464/HIVE-15664.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3087/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3087/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3087/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848464 - PreCommit-HIVE-Build

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.




[jira] [Updated] (HIVE-15690) Speed up WebHCat DDL Response Time by Using JDBC

2017-01-20 Thread Amin Abbaspour (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amin Abbaspour updated HIVE-15690:
--
Assignee: Amin Abbaspour
  Status: Patch Available  (was: Open)

Patch is available in github.com PR: https://github.com/apache/hive/pull/133

> Speed up WebHCat DDL Response Time by Using JDBC
> 
>
> Key: HIVE-15690
> URL: https://issues.apache.org/jira/browse/HIVE-15690
> Project: Hive
>  Issue Type: Improvement
>  Components: WebHCat
>Reporter: Amin Abbaspour
>Assignee: Amin Abbaspour
>Priority: Minor
>  Labels: easyfix, patch, performance, security
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> WebHCat launches a new hcat script for each DDL call, which makes it unsuitable 
> for interactive REST environments.
> This change speeds up /ddl query calls by running them over a JDBC connection 
> to the Hive Thrift server.
> Also, because it is a JDBC connection, it is secure and compatible with all access 
> policies defined in HiveServer2. The user does not have metadata visibility over 
> other databases (which is the case in hcat mode).
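
To make the idea concrete, here is a minimal sketch of running a DDL statement over an existing JDBC connection instead of starting a new process per call. It uses only the standard `java.sql` API; the class and method names are hypothetical, the host, port, and database are placeholders, and this is not the code from the pull request.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of WebHCat or the linked pull request.
class JdbcDdlSketch {
    // Builds a HiveServer2 JDBC URL from placeholder connection details.
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs one DDL statement over an already-open connection.
    static void runDdl(Connection conn, String ddl) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }
}
```

With the Hive JDBC driver on the classpath, a caller would obtain the connection once through `DriverManager.getConnection(JdbcDdlSketch.hiveUrl(host, port, db), user, password)` and reuse it across /ddl requests, so each request pays only for the statement itself.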




[jira] [Commented] (HIVE-15658) hive.ql.session.SessionState start() is not atomic, SessionState thread local variable can get into inconsistent state

2017-01-20 Thread Michal Klempa (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832835#comment-15832835
 ] 

Michal Klempa commented on HIVE-15658:
--

Hi Sergey, thank you for the response. Unfortunately, I am not the author of the 
original code and I have trouble figuring out all the places where this should 
be fixed. I analyzed all the calls of .setCurrentSession and modified those 
parts where I saw it is not paired with .detach().
If the session is already valid in HiveSessionImpl, why is there a call to 
setSessionState at all? Maybe it can be removed.

Maybe we need a more complex change: centralize the point where the 
Session is created and initialized atomically, and then call only this one 
procedure. The standard way of doing this is in constructors. Detaching/closing is 
then done in .close(). I do not know why .start was chosen here to 
decouple construction from starting the object, or why classes using 
SessionState call .detach() directly instead of only .close(). The outer 
API of the SessionState class is not clear to me.
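
A minimal sketch of the centralized approach described above, assuming a hypothetical `SessionSketch` class in place of SessionState (the names and the boolean flag are illustrative only, not Hive's API): initialization runs to completion first, and the thread-local is set only afterwards, so a failed start leaves nothing attached to the thread.

```java
// Hypothetical stand-in for SessionState; not Hive code.
class SessionSketch implements AutoCloseable {
    private static final ThreadLocal<SessionSketch> CURRENT = new ThreadLocal<>();

    private final boolean scratchDirWritable;

    private SessionSketch(boolean scratchDirWritable) {
        this.scratchDirWritable = scratchDirWritable;
    }

    // Single entry point: initialize fully first, publish to the thread-local last.
    static SessionSketch start(boolean scratchDirWritable) {
        SessionSketch session = new SessionSketch(scratchDirWritable);
        session.createScratchDirs(); // may throw; nothing has been published yet
        CURRENT.set(session);        // reached only after successful setup
        return session;
    }

    static SessionSketch get() {
        return CURRENT.get();
    }

    // Simulates the scratch directory setup that can fail on bad permissions.
    private void createScratchDirs() {
        if (!scratchDirWritable) {
            throw new RuntimeException("scratch dir is not writable");
        }
    }

    @Override
    public void close() {
        // Detach only if this session is the one attached to the current thread.
        if (CURRENT.get() == this) {
            CURRENT.remove();
        }
    }
}
```

In this shape, a thread that is re-used after a failed start sees `get()` return null rather than a half-initialized object, and detaching happens only through `close()`.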

> hive.ql.session.SessionState start() is not atomic, SessionState thread local 
> variable can get into inconsistent state
> --
>
> Key: HIVE-15658
> URL: https://issues.apache.org/jira/browse/HIVE-15658
> Project: Hive
>  Issue Type: Bug
>  Components: API, HCatalog
>Affects Versions: 1.1.0, 1.2.1, 2.0.0, 2.0.1
> Environment: CDH5.8.0, Flume 1.6.0, Hive 1.1.0
>Reporter: Michal Klempa
> Attachments: HIVE-15658_branch-1.2_1.patch, 
> HIVE-15658_branch-2.1_1.patch
>
>
> Method start() in hive.ql.session.SessionState is supposed to set up the needed 
> preconditions, like HDFS scratch directories for the session.
> This is not atomic with setting the thread-local 
> variable, which can later be obtained by calling SessionState.get().
> Therefore, even if the start() method itself fails, SessionState.get() 
> does not return null, and further re-use of the thread which previously 
> invoked start() may lead to obtaining a SessionState object in an inconsistent 
> state.
> I have observed this using the Flume Hive Sink, which uses the Hive Streaming 
> interface. When the directory /tmp/hive is not writable by the session user, the 
> start() method fails (throwing RuntimeException). If the thread is re-used 
> (as it is in Flume), further executions work with a wrongly initialized 
> SessionState object (HDFS dirs are non-existent). In Flume, this happens to 
> me when Flume should create a partition if it does not exist (but the code doing 
> this is in Hive Streaming).
> Steps to reproduce:
> 0. create test spooldir and allow flume to write to it, in my case 
> /home/ubuntu/flume_test, 775, ubuntu:flume
> 1. create Flume config (see attachment)
> 2. create Hive table
> {code}
> create table default.flume_test (column1 string, column2 string) partitioned 
> by (dt string) clustered by (column1) INTO 2 BUCKETS STORED AS ORC;
> {code}
> 3. start flume agent:
> {code}
> bin/flume-ng agent -n a1 -c conf -f conf/flume-config.txt
> {code}
> 4. hdfs dfs -chmod 600 /tmp/hive
> 5. put this file into spooldir:
> {code}
> echo value1,value2 > file1
> {code}
> Expected behavior:
> An exception regarding scratch dir permissions should be thrown repeatedly.
> Example (note that the line numbers are off because Cloudera maintains its own 
> copies of the source code here https://github.com/cloudera/flume-ng/ and here 
> https://github.com/cloudera/hive):
> {code}
> 2017-01-18 12:39:38,926 WARN org.apache.flume.sink.hive.HiveSink: sink_hive_1 
> : Failed connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to 
> EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', database='default', 
> table='flume_test', partitionVals=[20170118] } 
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
> at 
> org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
> at 
> org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:296)
> at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:254)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed 
> connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> at 
>

[jira] [Commented] (HIVE-15669) LLAP: Improve aging in shortest job first scheduler

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832822#comment-15832822
 ] 

Hive QA commented on HIVE-15669:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848458/HIVE-15669.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3086/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3086/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3086/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-21 05:33:23.425
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3086/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 05:33:23.427
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at d9343f6 HIVE-15544 : Support scalar subqueries (Vineet Garg via 
Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at d9343f6 HIVE-15544 : Support scalar subqueries (Vineet Garg via 
Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 05:33:24.446
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java
patching file 
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
patching file 
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/comparator/TestShortestJobFirstComparator.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.6) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRole
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRoleMap
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MGlobalPrivilege
ENHANCED (Persistable) :

[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832821#comment-15832821
 ] 

Hive QA commented on HIVE-15671:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848640/HIVE-15671.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus 
(batchId=213)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3085/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3085/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3085/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848640 - PreCommit-HIVE-Build

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>* Tells the RPC server to expect a connection from a new client.
>* ...
>*/
>   public Future registerClient(final String clientId, String secret,
>   RpcDispatcher serverDispatcher) {
> return registerClient(clientId, secret, serverDispatcher, 
> config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is the timeout for the 
> handshake between the Hive client and the remote Spark driver. Instead, the 
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout for 
> the remote Spark driver connecting back to the Hive client.
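
The difference between the two lookups can be sketched as follows. The class below is a hypothetical model that reads both properties from a plain map; it is not Hive's RPC configuration class, and the numeric values in the usage note are illustrative rather than Hive defaults.

```java
import java.util.Map;

// Hypothetical model of the two timeout lookups; not Hive's RPC code.
class RpcTimeoutSketch {
    static final String CONNECT_TIMEOUT = "hive.spark.client.connect.timeout";
    static final String SERVER_CONNECT_TIMEOUT = "hive.spark.client.server.connect.timeout";

    private final Map<String, Long> conf;

    RpcTimeoutSketch(Map<String, Long> conf) {
        this.conf = conf;
    }

    long getConnectTimeoutMs() {
        return conf.get(CONNECT_TIMEOUT);
    }

    long getServerConnectTimeoutMs() {
        return conf.get(SERVER_CONNECT_TIMEOUT);
    }

    // Behaviour described in the report: the handshake timeout is used.
    long registerClientTimeoutAsReported() {
        return getServerConnectTimeoutMs();
    }

    // Behaviour the report asks for: the connect-back timeout is used.
    long registerClientTimeoutAsProposed() {
        return getConnectTimeoutMs();
    }
}
```

With illustrative settings of 1000 ms for the connect timeout and 90000 ms for the server connect timeout, the reported behaviour waits 90000 ms for the driver to connect back, while the proposed behaviour waits 1000 ms.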




[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.1.patch

Uploaded patch. Also posted RB at https://reviews.apache.org/r/55816/.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.
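
The symptom can be reproduced in a toy model. The sketch below assumes, purely for illustration, that the pushed-down filter is stored in one slot per table name, so a second scan of the same table overwrites the first scan's filter; the report does not confirm that this is the actual mechanism in Hive, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; this is not Hive's pushdown code.
class PushdownSketch {
    static final int[] TEST_TABLE = {1, 2};

    // One filter slot per table name: the last registered predicate wins.
    static List<Integer> unionAllSharedSlot(int... wanted) {
        Map<String, Integer> pushedFilter = new HashMap<>();
        for (int w : wanted) {
            pushedFilter.put("test_table", w);
        }
        List<Integer> out = new ArrayList<>();
        for (int w : wanted) {
            int pushed = pushedFilter.get("test_table");
            for (int row : TEST_TABLE) {
                // The reader applies the pushed filter, then the branch applies its WHERE.
                if (row == pushed && row == w) {
                    out.add(row);
                }
            }
        }
        return out;
    }

    // Each scan keeps its own pushed-down filter.
    static List<Integer> unionAllPerScan(int... wanted) {
        List<Integer> out = new ArrayList<>();
        for (int w : wanted) {
            int pushed = w;
            for (int row : TEST_TABLE) {
                if (row == pushed && row == w) {
                    out.add(row);
                }
            }
        }
        return out;
    }
}
```

In this model `unionAllSharedSlot(1, 2)` yields only the row 2, matching the single record in the repro, while `unionAllPerScan(1, 2)` yields both rows.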




[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)





[jira] [Commented] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832810#comment-15832810
 ] 

Hive QA commented on HIVE-15439:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848660/HIVE-15439.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3084/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3084/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3084/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-21 04:33:46.755
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3084/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 04:33:46.758
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   0e78add..d9343f6  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 0e78add HIVE-15625 : escape1 test fails on Mac (Sergey 
Shelukhin, reviewed by Pengcheng Xiong)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
Removing ql/src/java/org/apache/hadoop/hive/ql/exec/DynamicValueRegistry.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeDynamicValueEvaluator.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java
Removing ql/src/java/org/apache/hadoop/hive/ql/parse/RuntimeValuesInfo.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDynamicValueDesc.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBloomFilter.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInBloomFilter.java
Removing ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q
Removing 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out
Removing 
storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/LiteralDelegate.java
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at d9343f6 HIVE-15544 : Support scalar subqueries (Vineet Garg via 
Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-21 04:33:48.383
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file accumulo-handler/src/test/results/positive/accumulo_queries.q.out
patching file 
accumulo-handler/src/test/results/positive/accumulo_single_sourced_multi_insert.q.out
patching file druid-handler/pom.xml
patching file 
druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java
patching file 
druid-handler/src/test/org/apache/hadoop/hive/druid/DruidStorageHandlerTest.java
patching file 
druid-handler/src/test/org/apache/hadoop/hive/druid/TestDerbyConnector.java
patching file 
druid-handler/src/test/org/apache/hadoop/hive/ql/io/DruidRecordWriterTest.java
patching file hbase-handler/src/test/results/positive/hbase_queries.q.out
patching file 
hbase-handler/src/test/results/positive/hbase_single_sourced_multi_insert.q.out
patching file hbase-handler/src/test/results/positive/hbasestats.q.out
patching file 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookV2.java
patching file 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
patching file

[jira] [Commented] (HIVE-15269) Dynamic Min-Max runtime-filtering for Tez

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832809#comment-15832809
 ] 

Hive QA commented on HIVE-15269:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848654/HIVE-15269.16.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10966 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3083/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3083/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3083/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848654 - PreCommit-HIVE-Build

> Dynamic Min-Max runtime-filtering for Tez
> -
>
> Key: HIVE-15269
> URL: https://issues.apache.org/jira/browse/HIVE-15269
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, 
> HIVE-15269.12.patch, HIVE-15269.13.patch, HIVE-15269.14.patch, 
> HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.1.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.
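
The optimization described above can be illustrated with a small, self-contained sketch. It is not code from the patch: the class `MinMaxFilterSketch` and its methods are hypothetical names chosen for illustration, and plain lists stand in for the store and store_sales scans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of min/max runtime filtering; not code from the patch.
final class MinMaxFilterSketch {

    // Range of join-key values seen on the already-filtered dimension side.
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        // Equivalent of "key BETWEEN min AND max".
        boolean contains(long key) {
            return key >= min && key <= max;
        }
    }

    // Computes min/max over the dimension-side keys; null means "no keys at all".
    static Range rangeOf(List<Long> dimensionKeys) {
        if (dimensionKeys.isEmpty()) {
            return null;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long key : dimensionKeys) {
            min = Math.min(min, key);
            max = Math.max(max, key);
        }
        return new Range(min, max);
    }

    // Keeps only fact-side keys inside the range; with a null range nothing can join.
    static List<Long> prune(List<Long> factKeys, Range range) {
        List<Long> kept = new ArrayList<>();
        if (range == null) {
            return kept;
        }
        for (long key : factKeys) {
            if (range.contains(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Store ids 7 and 9 survive the dimension-side filter in this made-up example.
        Range range = rangeOf(Arrays.asList(7L, 9L));
        System.out.println(prune(Arrays.asList(1L, 7L, 8L, 9L, 42L), range));
    }
}
```

Note that the range check is only a coarse pre-filter: in the example, key 8 passes it even though no store has that id, so the join itself still has to do the exact matching.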




[jira] [Updated] (HIVE-15578) Simplify IdentifiersParser

2017-01-20 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15578:

Status: Open  (was: Patch Available)

This patch needs a rebase, now that HIVE-15544 is in.

> Simplify IdentifiersParser
> --
>
> Key: HIVE-15578
> URL: https://issues.apache.org/jira/browse/HIVE-15578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15578.01.patch, HIVE-15578.02.patch
>
>
> before: 1.72M LOC in IdentifiersParser, after: 1.41M




[jira] [Updated] (HIVE-15579) Support HADOOP_PROXY_USER for secure impersonation in hive metastore client

2017-01-20 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15579:
-
Attachment: HIVE-15579.003.patch

Attaching patch file again to kick off the tests once more.

> Support HADOOP_PROXY_USER for secure impersonation in hive metastore client
> ---
>
> Key: HIVE-15579
> URL: https://issues.apache.org/jira/browse/HIVE-15579
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Nanda kumar
> Attachments: HIVE-15579.000.patch, HIVE-15579.001.patch, 
> HIVE-15579.002.patch, HIVE-15579.003.patch, HIVE-15579.003.patch
>
>
> Hadoop clients support HADOOP_PROXY_USER for secure impersonation. It would 
> be useful to have a similar feature for the hive metastore client.




[jira] [Updated] (HIVE-15544) Support scalar subqueries

2017-01-20 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15544:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~vgarg]

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Fix For: 2.2.0
>
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch, HIVE-15544.4.patch, HIVE-15544.5.patch
>
>
> Currently HIVE only supports IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filters, i.e. WHERE and 
> HAVING.
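
For readers unfamiliar with the term, the following sketch shows what "scalar" means for the two subqueries in the query above, using the usual SQL semantics: zero rows yield NULL, one row yields its value, and more than one row is an error. Whether the patch enforces the cardinality check in exactly this way is an assumption; `ScalarSubquerySketch` and `scalarValue` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of scalar-subquery semantics; not Hive code.
final class ScalarSubquerySketch {

    // Reduces the rows produced by a subquery to the single value a comparison can use.
    static <T> T scalarValue(List<T> subqueryRows) {
        if (subqueryRows.isEmpty()) {
            return null; // SQL NULL: the comparison is then never true
        }
        if (subqueryRows.size() > 1) {
            throw new IllegalStateException("scalar subquery produced more than one row");
        }
        return subqueryRows.get(0);
    }

    public static void main(String[] args) {
        // A made-up d_month_seq value standing in for the first subquery's result.
        Integer monthSeq = scalarValue(Arrays.asList(1200));
        Integer nothing = scalarValue(Collections.<Integer>emptyList());
        System.out.println(monthSeq + " " + nothing);
    }
}
```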




[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832786#comment-15832786
 ] 

Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

Here's the explain plan, which does show a single mapper processing two table 
scans:
{noformat}
hive (default)> explain
  > select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: x
            filterExpr: (number = 1) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: y
            filterExpr: (number = 2) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 2) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}



[jira] [Commented] (HIVE-15651) LLAP: llap status tool enhancements

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832785#comment-15832785
 ] 

Hive QA commented on HIVE-15651:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848428/HIVE-15651.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10950 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=103)

[vector_decimal_aggregate.q,ppd_join3.q,auto_join23.q,join10.q,union_remove_11.q,union_ppr.q,join32.q,groupby_multi_single_reducer2.q,input18.q,stats3.q,cbo_simple_select.q,parquet_join.q,join26.q,groupby1.q,join_reorder2.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=152)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3082/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3082/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3082/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848428 - PreCommit-HIVE-Build

> LLAP: llap status tool enhancements
> ---
>
> Key: HIVE-15651
> URL: https://issues.apache.org/jira/browse/HIVE-15651
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15651.1.patch, HIVE-15651.2.patch
>
>
> Per [~sseth], the following enhancements can be made to the llap status tool:
> 1) If state changes from an ACTIVE state to STOPPED - terminate the script 
> immediately (fail fast)
> 2) Add a threshold of what is acceptable in terms of the running state - 
> RUNNING_PARTIAL may be ok if, for example, 80% of nodes are up.
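
The two enhancements listed above could combine into a single decision step along the following lines. This is a sketch under stated assumptions, not the status tool's code: the `State` and `Decision` enums, the `decide` method, and the percentage-based threshold parameter are all hypothetical, and only STOPPED and RUNNING_PARTIAL are state names taken from the issue text.

```java
// Hypothetical sketch of the proposed status-check logic; not the llap status tool itself.
final class LlapStatusSketch {

    // Illustrative state set; only STOPPED and RUNNING_PARTIAL come from the issue text.
    enum State { LAUNCHING, RUNNING_PARTIAL, RUNNING_ALL, STOPPED }

    enum Decision { KEEP_WAITING, SUCCESS, FAIL_FAST }

    // Any state other than STOPPED is treated as "active" in this sketch.
    static boolean isActive(State state) {
        return state != State.STOPPED;
    }

    static Decision decide(State previous, State current,
                           int liveNodes, int desiredNodes, int thresholdPercent) {
        // 1) Active -> STOPPED transition: give up immediately.
        if (isActive(previous) && current == State.STOPPED) {
            return Decision.FAIL_FAST;
        }
        if (current == State.RUNNING_ALL) {
            return Decision.SUCCESS;
        }
        // 2) Partial startup is acceptable once enough nodes are up.
        //    Integer arithmetic avoids floating-point comparison at the boundary.
        if (current == State.RUNNING_PARTIAL && desiredNodes > 0
                && liveNodes * 100 >= thresholdPercent * desiredNodes) {
            return Decision.SUCCESS;
        }
        return Decision.KEEP_WAITING;
    }

    public static void main(String[] args) {
        System.out.println(decide(State.LAUNCHING, State.RUNNING_PARTIAL, 8, 10, 80));
    }
}
```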



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832786#comment-15832786
 ] 

Anthony Hsu commented on HIVE-15680:


Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is that only the last predicate is being pushed down.
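
One way such a symptom can arise is sketched below. The mechanism is an assumption made for illustration and is not confirmed in this thread: both table scans store their pushed-down predicate under the same key of one shared configuration object, so the second write overwrites the first. The class `PushdownSketch`, the key name, and the use of plain maps and lists are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "only the last predicate is pushed down"; not Hive code.
final class PushdownSketch {

    static final String FILTER_KEY = "pushed.filter.value"; // made-up key name

    // The reader drops rows using the pushed-down value found in conf,
    // then the scan's own Filter Operator applies its predicate on top.
    static List<Integer> scan(List<Integer> table, Map<String, Integer> conf, int ownPredicate) {
        List<Integer> out = new ArrayList<>();
        int pushed = conf.get(FILTER_KEY);
        for (int value : table) {
            if (value == pushed && value == ownPredicate) {
                out.add(value);
            }
        }
        return out;
    }

    // Runs "scan WHERE number = p" for each predicate and concatenates the results.
    static List<Integer> unionAll(List<Integer> table, int[] predicates, boolean sharedConf) {
        Map<String, Integer> shared = new HashMap<String, Integer>();
        List<Map<String, Integer>> confs = new ArrayList<>();
        for (int predicate : predicates) {
            // With sharedConf, every scan writes to the same map: the last write wins.
            Map<String, Integer> conf = sharedConf ? shared : new HashMap<String, Integer>();
            conf.put(FILTER_KEY, predicate);
            confs.add(conf);
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            result.addAll(scan(table, confs.get(i), predicates[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(1, 2);
        System.out.println(unionAll(table, new int[] {1, 2}, true));
        System.out.println(unionAll(table, new int[] {1, 2}, false));
    }
}
```

In this model the shared configuration reproduces the reported result (only the row with value 2 comes back), while giving each scan its own configuration returns both rows.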




[jira] [Commented] (HIVE-15688) LlapServiceDriver - an option to get rid of run.sh

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832779#comment-15832779
 ] 

Sergey Shelukhin commented on HIVE-15688:
-

[~gopalv] [~hagleitn] fyi

> LlapServiceDriver - an option to get rid of run.sh
> --
>
> Key: HIVE-15688
> URL: https://issues.apache.org/jira/browse/HIVE-15688
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> run.sh is very slow because it's 4 calls to slider, which means 4 JVMs, 4 
> connections to RM and other crap, for 2-5 sec. of overhead per call, 
> depending on the machine/cluster.
> What we need is a mode for llapservicedriver that would not generate run.sh, 
> but would rather run the cluster immediately by calling the corresponding 4 
> slider APIs. Should probably be the default, too. For compat with scripts we 
> might generate blank run.sh for now.




[jira] [Updated] (HIVE-15688) LlapServiceDriver - an option to get rid of run.sh

2017-01-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15688:

Description: 
run.sh is very slow because it's 4 calls to slider, which means 4 JVMs, 4 
connections to RM and other crap, for 2-5 sec. of overhead per call, depending 
on the machine/cluster.
What we need is a mode for llapservicedriver that would not generate run.sh, 
but would rather run the cluster immediately by calling the corresponding 4 
slider APIs. Should probably be the default, too. For compat with scripts we 
might generate blank run.sh for now.

  was:
run.sh is very slow because it's 4 calls to slider, which means 4 JVMs, 4 
connections to RM and other crap, for 2-5 sec. of overhead per call, depending 
on the machine/cluster.
What we need is a mode for llapservicedriver that would not generate run.sh, 
but would rather run the cluster immediately by calling the corresponding 4 
slider APIs.


> LlapServiceDriver - an option to get rid of run.sh
> --
>
> Key: HIVE-15688
> URL: https://issues.apache.org/jira/browse/HIVE-15688
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> run.sh is very slow because it's 4 calls to slider, which means 4 JVMs, 4 
> connections to RM and other crap, for 2-5 sec. of overhead per call, 
> depending on the machine/cluster.
> What we need is a mode for llapservicedriver that would not generate run.sh, 
> but would rather run the cluster immediately by calling the corresponding 4 
> slider APIs. Should probably be the default, too. For compat with scripts we 
> might generate blank run.sh for now.




[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832751#comment-15832751
 ] 

Hive QA commented on HIVE-15544:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848422/HIVE-15544.5.patch

{color:green}SUCCESS:{color} +1 due to 19 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] 
(batchId=38)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_subq_not_in]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query69] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_subq_not_in] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] 
(batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=122)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3081/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3081/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3081/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848422 - PreCommit-HIVE-Build

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch, HIVE-15544.4.patch, HIVE-15544.5.patch
>
>
> Currently HIVE only supports IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filters, i.e. WHERE and 
> HAVING.




[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832749#comment-15832749
 ] 

Xuefu Zhang commented on HIVE-15671:


[~vanzin], could you please review the patch? Thanks.
[~lirui], Could you also try and review the patch? Thanks.

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is meant as the timeout for 
> the handshake between the Hive client and the remote Spark driver. Instead, the 
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout for 
> the remote Spark driver connecting back to the Hive client.
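
The difference between the two settings can be made concrete with a toy configuration object. Everything below is a sketch: `RpcTimeoutSketch` is a hypothetical stand-in for the real configuration class, `getConnectTimeoutMs()` is assumed here as the counterpart getter, and the millisecond values used in `main` are illustrative rather than quoted defaults. Only the two property names and `getServerConnectTimeoutMs()` come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the RPC configuration; not the actual Hive class.
final class RpcTimeoutSketch {

    static final String SERVER_CONNECT_TIMEOUT = "hive.spark.client.server.connect.timeout";
    static final String CONNECT_TIMEOUT = "hive.spark.client.connect.timeout";

    private final Map<String, Long> conf = new HashMap<>();

    RpcTimeoutSketch(long serverConnectTimeoutMs, long connectTimeoutMs) {
        conf.put(SERVER_CONNECT_TIMEOUT, serverConnectTimeoutMs);
        conf.put(CONNECT_TIMEOUT, connectTimeoutMs);
    }

    long getServerConnectTimeoutMs() {
        return conf.get(SERVER_CONNECT_TIMEOUT);
    }

    // Assumed getter name for the second property.
    long getConnectTimeoutMs() {
        return conf.get(CONNECT_TIMEOUT);
    }

    // Behavior as reported in the issue: the handshake timeout is used.
    long registerClientTimeoutAsReported() {
        return getServerConnectTimeoutMs();
    }

    // Behavior as proposed in the issue: the connection timeout is used.
    long registerClientTimeoutAsProposed() {
        return getConnectTimeoutMs();
    }

    public static void main(String[] args) {
        RpcTimeoutSketch config = new RpcTimeoutSketch(90000L, 1000L); // illustrative values
        System.out.println(config.registerClientTimeoutAsReported());
        System.out.println(config.registerClientTimeoutAsProposed());
    }
}
```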




[jira] [Reopened] (HIVE-15625) escape1 test fails on Mac

2017-01-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reopened HIVE-15625:
-

The patch was not committed properly as part of [~pxiong]'s patch. I will 
actually commit it now.

> escape1 test fails on Mac
> -
>
> Key: HIVE-15625
> URL: https://issues.apache.org/jira/browse/HIVE-15625
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15625.01.patch, HIVE-15625.02.patch, 
> HIVE-15625.patch
>
>





[jira] [Resolved] (HIVE-15625) escape1 test fails on Mac

2017-01-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-15625.
-
Resolution: Fixed

> escape1 test fails on Mac
> -
>
> Key: HIVE-15625
> URL: https://issues.apache.org/jira/browse/HIVE-15625
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15625.01.patch, HIVE-15625.02.patch, 
> HIVE-15625.patch
>
>





[jira] [Commented] (HIVE-15649) LLAP IO may NPE on all-column read

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832723#comment-15832723
 ] 

Hive QA commented on HIVE-15649:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848421/HIVE-15649.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10967 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_stats] (batchId=3)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreWithEnvironmentContext.testEnvironmentContext
 (batchId=198)
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections 
(batchId=192)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3080/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3080/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3080/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848421 - PreCommit-HIVE-Build

> LLAP IO may NPE on all-column read
> --
>
> Key: HIVE-15649
> URL: https://issues.apache.org/jira/browse/HIVE-15649
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15649.01.patch, HIVE-15649.02.patch, 
> HIVE-15649.patch
>
>
> It seems like very few paths use READ_ALL_COLUMNS config, but some do. LLAP 
> IO doesn't account for that.




[jira] [Assigned] (HIVE-14634) TxnHandler.heartbeatTxnRange() can be done in 1 SQL statement

2017-01-20 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-14634:
-

Assignee: Eugene Koifman

> TxnHandler.heartbeatTxnRange() can be done in 1 SQL statement
> -
>
> Key: HIVE-14634
> URL: https://issues.apache.org/jira/browse/HIVE-14634
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.1
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> performance for Streaming Ingest cases can be improved
> see comments in the code




[jira] [Updated] (HIVE-15622) Remove HWI component from Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15622:
-
Attachment: HIVE-15622.4.patch

patch 4 is for removing Jetty23Shims

> Remove HWI component from Hive
> --
>
> Key: HIVE-15622
> URL: https://issues.apache.org/jira/browse/HIVE-15622
> Project: Hive
>  Issue Type: Task
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15622.1.patch, HIVE-15622.2.patch, 
> HIVE-15622.3.patch, HIVE-15622.4.patch
>
>
> This component seems to be obsolete, as it hasn't received any meaningful 
> update since 2012, and we don't see people discussing it or reporting issues 
> with it. Moreover, it has caused a number of ptest issues which can be avoided.
> We should remove this component as a cleanup effort.




[jira] [Reopened] (HIVE-15622) Remove HWI component from Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reopened HIVE-15622:
--

Reopening issue since I missed removing Jetty23Shims, which was only needed by 
HWI.

> Remove HWI component from Hive
> --
>
> Key: HIVE-15622
> URL: https://issues.apache.org/jira/browse/HIVE-15622
> Project: Hive
>  Issue Type: Task
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15622.1.patch, HIVE-15622.2.patch, 
> HIVE-15622.3.patch
>
>
> This component seems to be obsolete, as it hasn't received any meaningful 
> update since 2012, and we don't see people discussing it or reporting issues 
> with it. Moreover, it has caused a number of ptest issues which can be avoided.
> We should remove this component as a cleanup effort.




[jira] [Updated] (HIVE-15622) Remove HWI component from Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15622:
-
Status: Patch Available  (was: Reopened)

> Remove HWI component from Hive
> --
>
> Key: HIVE-15622
> URL: https://issues.apache.org/jira/browse/HIVE-15622
> Project: Hive
>  Issue Type: Task
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15622.1.patch, HIVE-15622.2.patch, 
> HIVE-15622.3.patch, HIVE-15622.4.patch
>
>
> This component seems to be obsolete, as it hasn't received any meaningful 
> update since 2012, and we don't see people discussing it or reporting issues 
> with it. Moreover, it has caused a number of ptest issues which can be avoided.
> We should remove this component as a cleanup effort.




[jira] [Updated] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15681:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Commented] (HIVE-15653) Some ALTER TABLE commands drop table stats

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832679#comment-15832679
 ] 

Hive QA commented on HIVE-15653:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848395/HIVE-15653.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 27 failed/errored test(s), 10966 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_file_format] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_skewed_table] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_not_sorted] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnStatsUpdateForStatsOptimizer_2]
 (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_alter_list_bucketing_table1]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_like] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[describe_comment_nonascii]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_tblproperties] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_invalidation] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[unset_table_view_property]
 (batchId=47)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_predicate_pushdown]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_table]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_table]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_table]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=152)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unset_table_property]
 (batchId=86)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3079/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3079/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3079/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 27 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848395 - PreCommit-HIVE-Build

> Some ALTER TABLE commands drop table stats
> --
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>Assignee: Chaoyu Tang
>Priority: Critical
> Attachments: HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK

[jira] [Commented] (HIVE-13014) RetryingMetaStoreClient is retrying too aggresievley

2017-01-20 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832678#comment-15832678
 ] 

Eugene Koifman commented on HIVE-13014:
---

1. I've not tried to test performance but since Annotations can't change at 
runtime, I would think Java/jit should handle it.
2. It probably does, but I wasn't aiming to do a comprehensive analysis of all 
metastore calls - just Acid related ones.  The title sounds too broad.

> RetryingMetaStoreClient is retrying too aggresievley
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetastoreClient will retry the operation thus causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks commit failed and will likely 
> attempt to redo the transactions - not what we want in most cases.




[jira] [Updated] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-01-20 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-15570:

Attachment: HIVE-15570.2.patch

The same exception may be thrown in the following cases and is hard to debug:
1. llap mode is used but there is no llap daemon
2. llap mode is used but the llap daemon is in recovery
3. container mode is used but hive.llap.client.consistent.splits is true

In the new patch, hive.llap.client.consistent.splits has no effect if 
container mode is used. If llap mode is used but there is no running daemon, we 
fall back to the locations provided by the splits. Then, if there is no llap 
daemon at all, LlapTaskSchedulerService will detect this and report "No LLAP 
Daemons are running"; or, if the llap daemon finishes recovery, the query can 
still succeed.
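
The decision described above can be sketched roughly as follows. This is a minimal illustration in Python for brevity, not the patch itself (Hive's code is Java). The function name choose_location_provider and its string return values are hypothetical; only the two execution modes and the setting hive.llap.client.consistent.splits come from the comment. The behaviour when the setting is false in llap mode is an assumption.

```python
def choose_location_provider(execution_mode, consistent_splits, running_daemons):
    """Pick a split-location strategy (hypothetical helper, not Hive's API).

    execution_mode    -- "llap" or "container"
    consistent_splits -- value of hive.llap.client.consistent.splits
    running_daemons   -- hosts of the LLAP daemons that are currently running
    """
    # In container mode the consistent-splits setting is ignored.
    # Assumption: with the setting off, llap mode also uses split locations.
    if execution_mode != "llap" or not consistent_splits:
        return "split-provided"
    # llap mode without a running daemon: fall back to the locations carried
    # by the splits instead of failing with IllegalStateException.
    if not running_daemons:
        return "split-provided"
    return "host-affinity"
```

With no running daemon the sketch never selects the host-affinity path, which is the path that raised "needs at least 1 location to function" in the trace below.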

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch
>
>
> Sometimes a user might prefer to run with "hive.execution.mode=container" 
> when LLAP is stopped. If the client-side hive config for LLAP has 
> "hive.llap.client.consistent.splits=true", it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}




[jira] [Commented] (HIVE-13014) RetryingMetaStoreClient is retrying too aggresievley

2017-01-20 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832666#comment-15832666
 ] 

Alan Gates commented on HIVE-13014:
---

In general patch looks fine.  I have a couple of questions:
# What's the performance impact of looking up the annotations on the method 
every time through the retry handler?  Is it enough that we should build a map 
of methods to retriability so that subsequent lookups become O(1)?
# Why does this not apply to other metastore operations, like create table?  
That would also seem to be a case where a first attempt that timed out but 
succeeded could be masked by a failed second attempt.
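
The map idea in the first question can be sketched as below. This is a minimal illustration in Python for brevity (the actual client is Java), and every name in it is hypothetical: no_retry stands in for whatever annotation the patch uses, and call_with_retry is not RetryingMetaStoreClient's API. The marker is read once per method and the answer is cached in a dict, so later lookups are average-case O(1).

```python
_retriable_cache = {}  # function object -> bool, filled lazily


def no_retry(func):
    """Hypothetical marker, standing in for a Java annotation."""
    func.no_retry = True
    return func


def is_retriable(func):
    """Return whether func may be retried, caching the answer per function."""
    if func not in _retriable_cache:
        _retriable_cache[func] = not getattr(func, "no_retry", False)
    return _retriable_cache[func]


def call_with_retry(func, *args, max_attempts=3):
    """Invoke func, retrying on ConnectionError only when it is retriable."""
    attempts = max_attempts if is_retriable(func) else 1
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
```

A method marked with no_retry, such as a commit, is attempted exactly once, so a lost ack cannot trigger a second commit of the same txn.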

> RetryingMetaStoreClient is retrying too aggresievley
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetastoreClient will retry the operation thus causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks commit failed and will likely 
> attempt to redo the transactions - not what we want in most cases.




[jira] [Assigned] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-01-20 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-15570:
---

Assignee: Zhiyuan Yang

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch
>
>
> Sometimes a user might prefer to run with "hive.execution.mode=container" 
> when LLAP is stopped. If the client-side hive config for LLAP has 
> "hive.llap.client.consistent.splits=true", it ends up 
> throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
> ... 25 more
> Caused by: java.lang.IllegalStateException: 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at 
> least 1 location to function
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
> ... 30 more
> {noformat}




[jira] [Commented] (HIVE-15662) check startTime in SparkTask to make sure startTime is not less than submitTime

2017-01-20 Thread zhihai xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832657#comment-15832657
 ] 

zhihai xu commented on HIVE-15662:
--

Thanks for the review [~csun]! I attached a new patch, HIVE-15662.001.patch, 
which adds a comment about this race condition. Please review, thanks.

> check startTime in SparkTask to make sure startTime is not less than 
> submitTime
> ---
>
> Key: HIVE-15662
> URL: https://issues.apache.org/jira/browse/HIVE-15662
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15662.000.patch, HIVE-15662.001.patch
>
>
> Check startTime in SparkTask to make sure startTime is not less than 
> submitTime. We saw a corner case: when the SparkTask finishes in less than 
> 1 second, the startTime may not be set, because RemoteSparkJobMonitor sleeps 
> for 1 second before checking the state, and by then the Spark job has 
> already completed. One example query with 3 Spark tasks, where the second 
> one finished in about 1 second:
> {code}
> SparkTask1:
> "finishTime":1484638391978
> "submitTime":1484638385933
> "startTime": 1484638386973
> SparkTask2:
> "finishTime":1484638393019
> "submitTime":1484638391979
> "startTime": 1484638386973
> SparkTask3:
> "finishTime":1484638432123
> "submitTime":1484638393020
> "startTime": 1484638394057
> {code}
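
One way to express the check is sketched below, in Python for brevity (SparkTask itself is Java). The helper name effective_start_time is hypothetical, and falling back to submitTime when the check fails is an assumption; the description only states that startTime must not be less than submitTime.

```python
def effective_start_time(submit_time, start_time):
    """Return a start time that is never earlier than the submit time.

    start_time may be None (never set) or stale (earlier than submit_time);
    in both cases submit_time is used as a conservative stand-in.
    """
    if start_time is None or start_time < submit_time:
        return submit_time
    return start_time
```

For SparkTask2 above (submitTime 1484638391979, startTime 1484638386973) the sketch returns 1484638391979; SparkTask1 and SparkTask3 keep their recorded startTime.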




[jira] [Updated] (HIVE-15662) check startTime in SparkTask to make sure startTime is not less than submitTime

2017-01-20 Thread zhihai xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15662:
-
Attachment: HIVE-15662.001.patch

> check startTime in SparkTask to make sure startTime is not less than 
> submitTime
> ---
>
> Key: HIVE-15662
> URL: https://issues.apache.org/jira/browse/HIVE-15662
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15662.000.patch, HIVE-15662.001.patch
>
>
> Check startTime in SparkTask to make sure startTime is not less than 
> submitTime. We saw a corner case: when the SparkTask finishes in less than 
> 1 second, the startTime may not be set, because RemoteSparkJobMonitor sleeps 
> for 1 second before checking the state, and by then the Spark job has 
> already completed. One example query with 3 Spark tasks, where the second 
> one finished in about 1 second:
> {code}
> SparkTask1:
> "finishTime":1484638391978
> "submitTime":1484638385933
> "startTime": 1484638386973
> SparkTask2:
> "finishTime":1484638393019
> "submitTime":1484638391979
> "startTime": 1484638386973
> SparkTask3:
> "finishTime":1484638432123
> "submitTime":1484638393020
> "startTime": 1484638394057
> {code}




[jira] [Updated] (HIVE-15036) Druid code recently included in Hive pulls in GPL jar

2017-01-20 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15036:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~bslim]

> Druid code recently included in Hive pulls in GPL jar
> -
>
> Key: HIVE-15036
> URL: https://issues.apache.org/jira/browse/HIVE-15036
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Alan Gates
>Assignee: slim bouguerra
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15036.patch
>
>
> Druid pulls in a jar annotation-2.3.jar.  According to its pom file it is 
> licensed under GPL.  We cannot ship a binary distribution that includes this 
> jar.




[jira] [Updated] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2017-01-20 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15439:
--
Attachment: HIVE-15439.5.patch

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.3.patch, HIVE-15439.4.patch, 
> HIVE-15439.5.patch, HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch, 
> HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch
>
>
> Add support for SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the druid metadata. Creation of the segment will be the same as CTAS.
>  




[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832652#comment-15832652
 ] 

Sergey Shelukhin commented on HIVE-15546:
-

That makes sense

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls




[jira] [Commented] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832653#comment-15832653
 ] 

Sergey Shelukhin commented on HIVE-15681:
-

sure

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Commented] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-20 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832647#comment-15832647
 ] 

slim bouguerra commented on HIVE-15586:
---

[~ashutoshc] thanks for review.

> Make Insert and Create statement Transactional
> --
>
> Key: HIVE-15586
> URL: https://issues.apache.org/jira/browse/HIVE-15586
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-15586.2.patch, HIVE-15586.patch, HIVE-15586.patch, 
> HIVE-15586.patch
>
>
> Currently insert/create returns the handle to the user without waiting for 
> the data to be loaded by the druid cluster. To avoid that, we will add a 
> passive wait until the segments are loaded by the historical nodes, in case 
> the coordinator is UP.
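
A passive wait of this kind can be sketched as a bounded polling loop. This is a minimal illustration in Python for brevity, not the patch's code. The names wait_until_loaded and is_loaded, the retry count and the sleep interval are all hypothetical; how the load status is obtained from the coordinator is not shown in this message, so it is passed in as a callable.

```python
import time


def wait_until_loaded(pending, is_loaded, max_tries=10, sleep_seconds=1.0,
                      sleep=time.sleep):
    """Poll until every segment in pending reports loaded, or tries run out.

    Returns the set of segments that are still not loaded (empty on success).
    """
    remaining = set(pending)
    for _ in range(max_tries):
        # Keep only the segments that are not yet reported as loaded.
        remaining = {s for s in remaining if not is_loaded(s)}
        if not remaining:
            break
        sleep(sleep_seconds)
    return remaining
```

Bounding the number of tries keeps the statement from blocking forever if a segment is never loaded; the caller decides what to do with the segments that are still pending.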




[jira] [Updated] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2017-01-20 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15439:
--
Attachment: HIVE-15439.4.patch

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.3.patch, HIVE-15439.4.patch, 
> HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch, 
> HIVE-15439.patch, HIVE-15439.patch
>
>
> Add support for SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the druid metadata. Creation of the segment will be the same as CTAS.
>  




[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-20 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832644#comment-15832644
 ] 

Sahil Takiar commented on HIVE-15546:
-

[~sershe] yes, when {{Utilities.getInputPaths()}} is called (it is called right 
before split generation) the method has to check if each given input path (e.g. 
a partition or a table) is empty or not. In order to do this, it has to run a 
listStatus against the actual filesystem. If it is not empty, Hive can proceed 
normally; if it is empty, then it can use {{NullScanFileSystem}} / 
{{ZeroRowsInputFormat}} to add a "dummy" path to the partition / table 
description.

In this patch, I just made the listStatus against the filesystem 
multi-threaded, since it is unavoidable. Making this listStatus multi-threaded 
should provide significant performance gains.
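
The general shape of a multi-threaded emptiness check is sketched below, in Python for brevity. It is an illustration of the technique, not the code in the patch (which is Java and works against the Hadoop filesystem). The function name find_empty_paths and the max_workers default are hypothetical, and os.listdir is only a local stand-in for a listStatus call.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def find_empty_paths(paths, list_status=os.listdir, max_workers=8):
    """Return the subset of paths whose listing is empty.

    list_status is any callable that lists one path; the calls run on a
    thread pool so that slow metadata operations overlap instead of queueing.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the results line up with paths.
        listings = list(pool.map(list_status, paths))
    return [p for p, entries in zip(paths, listings) if len(entries) == 0]
```

The paths returned are the ones that would receive the "dummy" path; the others proceed normally. The gain comes from overlapping the per-path latency, which is what dominates on blobstores.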

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls




[jira] [Updated] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-20 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15586:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Slim!

> Make Insert and Create statement Transactional
> --
>
> Key: HIVE-15586
> URL: https://issues.apache.org/jira/browse/HIVE-15586
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-15586.2.patch, HIVE-15586.patch, HIVE-15586.patch, 
> HIVE-15586.patch
>
>
> Currently insert/create returns the handle to the user without waiting for 
> the data to be loaded by the druid cluster. To avoid that, we will add a 
> passive wait until the segments are loaded by the historical nodes, in case 
> the coordinator is UP.




[jira] [Commented] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832639#comment-15832639
 ] 

Ashutosh Chauhan commented on HIVE-15586:
-

+1

> Make Insert and Create statement Transactional
> --
>
> Key: HIVE-15586
> URL: https://issues.apache.org/jira/browse/HIVE-15586
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15586.2.patch, HIVE-15586.patch, HIVE-15586.patch, 
> HIVE-15586.patch
>
>
> Currently insert/create returns the handle to the user without waiting for 
> the data to be loaded by the druid cluster. To avoid that, we will add a 
> passive wait until the segments are loaded by the historical nodes, in case 
> the coordinator is UP.




[jira] [Updated] (HIVE-15269) Dynamic Min-Max runtime-filtering for Tez

2017-01-20 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-15269:
--
Attachment: HIVE-15269.16.patch

> Dynamic Min-Max runtime-filtering for Tez
> -
>
> Key: HIVE-15269
> URL: https://issues.apache.org/jira/browse/HIVE-15269
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, 
> HIVE-15269.12.patch, HIVE-15269.13.patch, HIVE-15269.14.patch, 
> HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.1.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.
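
As a rough illustration of the idea in the description, the sketch below computes the min/max of the join keys that survive the dimension-side filter and applies the resulting range to the fact-side keys. It works on in-memory lists only; the class and method names are hypothetical, and nothing here reflects how the patch wires the values through a Tez edge or into the ORC reader.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class MinMaxRuntimeFilter {
    /** Returns {min, max} of the dimension-side join keys, or null if there are none. */
    static long[] minMax(List<Long> dimensionKeys) {
        if (dimensionKeys.isEmpty()) {
            return null;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long key : dimensionKeys) {
            min = Math.min(min, key);
            max = Math.max(max, key);
        }
        return new long[] {min, max};
    }

    /** Keeps only fact-side keys in [min, max], i.e. the BETWEEN(min, max) predicate. */
    static List<Long> applyBetween(List<Long> factKeys, long[] range) {
        if (range == null) {
            // No dimension rows survived, so an inner join cannot match anything.
            return Collections.emptyList();
        }
        return factKeys.stream()
                .filter(k -> k >= range[0] && k <= range[1])
                .collect(Collectors.toList());
    }
}
```

The range is only a coarse pre-filter: a fact key can fall inside [min, max] without matching any dimension row, so the join itself still has to run on whatever passes.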




[jira] [Commented] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832628#comment-15832628
 ] 

Hive QA commented on HIVE-15586:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848382/HIVE-15586.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10945 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=161)

[infer_bucket_sort_num_buckets.q,gen_udf_example_add10.q,insert_overwrite_directory2.q,orc_merge5.q,bucketmapjoin6.q,import_exported_table.q,vector_outer_join0.q,orc_merge4.q,temp_table_external.q,orc_merge_incompat1.q,root_dir_external_table.q,constprog_semijoin.q,auto_sortmerge_join_16.q,schemeAuthority.q,index_bitmap3.q,external_table_with_space_in_location_path.q,parallel_orderby.q,infer_bucket_sort_map_operators.q,bucketizedhiveinputformat.q,remote_script.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3078/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3078/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3078/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848382 - PreCommit-HIVE-Build

> Make Insert and Create statement Transactional
> --
>
> Key: HIVE-15586
> URL: https://issues.apache.org/jira/browse/HIVE-15586
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15586.2.patch, HIVE-15586.patch, HIVE-15586.patch, 
> HIVE-15586.patch
>
>
> Currently, insert/create returns the handle to the user without waiting for 
> the data to be loaded by the Druid cluster. To avoid that, this change adds a 
> passive wait until the segments are loaded by the historical nodes, provided 
> the coordinator is up.




[jira] [Commented] (HIVE-15651) LLAP: llap status tool enhancements

2017-01-20 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832625#comment-15832625
 ] 

Prasanth Jayachandran commented on HIVE-15651:
--

[~sershe] [~sseth] is out. Can you please take a look?

> LLAP: llap status tool enhancements
> ---
>
> Key: HIVE-15651
> URL: https://issues.apache.org/jira/browse/HIVE-15651
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15651.1.patch, HIVE-15651.2.patch
>
>
> Per [~sseth], the following enhancements can be made to the llap status tool:
> 1) If the state changes from an ACTIVE state to STOPPED, terminate the script 
> immediately (fail fast).
> 2) Add a threshold for what is acceptable in terms of the running state; 
> RUNNING_PARTIAL may be ok if 80% of the nodes are up, for example.
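
The two enhancements above amount to a small decision rule, sketched below. Only RUNNING_PARTIAL and STOPPED come from the description; the other state names, the `Decision` enum, and the integer-percent threshold parameter are assumptions made for the example and are not the tool's actual interface.

```java
final class LlapStatusCheck {
    // LAUNCHING and RUNNING_ALL are assumed names for the other active states.
    enum State { LAUNCHING, RUNNING_PARTIAL, RUNNING_ALL, STOPPED }

    enum Decision { KEEP_WAITING, OK, FAIL_FAST }

    /**
     * Decides what the status script should do after observing a state change.
     * minLivePercent is the acceptable share of live nodes, e.g. 80.
     */
    static Decision decide(State previous, State current,
                           int liveNodes, int desiredNodes, int minLivePercent) {
        boolean wasActive = previous != null && previous != State.STOPPED;
        if (wasActive && current == State.STOPPED) {
            // 1) An active app that stops will not recover: terminate immediately.
            return Decision.FAIL_FAST;
        }
        if (current == State.RUNNING_ALL) {
            return Decision.OK;
        }
        if (current == State.RUNNING_PARTIAL && desiredNodes > 0
                && (long) liveNodes * 100 >= (long) minLivePercent * desiredNodes) {
            // 2) Partially running is acceptable once enough nodes are up.
            return Decision.OK;
        }
        return Decision.KEEP_WAITING;
    }
}
```

Comparing `liveNodes * 100` against `minLivePercent * desiredNodes` keeps the threshold check in integer arithmetic, so a case such as 8 of 10 nodes at an 80% threshold is accepted without floating-point rounding concerns.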




[jira] [Commented] (HIVE-15685) count(distinct) generates different result than expected

2017-01-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832613#comment-15832613
 ] 

Ashutosh Chauhan commented on HIVE-15685:
-

+1

> count(distinct) generates different result than expected
> 
>
> Key: HIVE-15685
> URL: https://issues.apache.org/jira/browse/HIVE-15685
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15685.01.patch
>
>
> The following query with count(distinct) generates a different result than 
> expected on Hive master:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
> {noformat}
> Expected output generated using postgres:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
>  count  | count 
> --------+-------
>      24 |  1823
> (1 row)
> {noformat}
> Actual output
> {noformat}
> 241824
> {noformat}




[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-01-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832589#comment-15832589
 ] 

Hadoop QA commented on HIVE-15686:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HADOOP-14015 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14015 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12848638/HADOOP-14015.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11482/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Mithun Radhakrishnan
> Attachments: HADOOP-14015.1.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.
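
The "Wrong FS" failure in the trace arises because a path on one cluster is handed to a filesystem object bound to another. The sketch below shows only the comparison involved, using plain `java.net.URI`; it is not the fix in the attached patch, and the class and method names are hypothetical. In Hadoop code the usual approach would presumably be to resolve the filesystem from the path itself (for example via `Path.getFileSystem(conf)`) rather than reusing the default one.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

final class FsMatch {
    /**
     * Returns true when the path carries no scheme of its own (so it resolves
     * against the default filesystem) or when its scheme and authority match
     * those of the default filesystem.
     */
    static boolean onDefaultFs(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(defaultFs.getScheme())
                && Objects.equals(lower(path.getAuthority()), lower(defaultFs.getAuthority()));
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }
}
```

With the URIs from the trace, the partition path on remote-cluster-nn1.myth.net:8020 does not match the default filesystem on local-cluster-n1.myth.net:8020, so an encryption-zone check tied to the local filesystem should not be run against it directly.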




[jira] [Comment Edited] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832582#comment-15832582
 ] 

Xuefu Zhang edited comment on HIVE-15671 at 1/20/17 11:12 PM:
--

Patch #1 followed what [~vanzin] suggested. With it, I observed the following 
behavior:

1. Increasing *server.connect.timeout* will make hive wait longer for the 
driver to connect back, which solves the busy cluster problem.
2. Killing driver while the job is running immediately fails the query on Hive 
side with the following error:
{code}
2017-01-20 22:01:08,235 Stage-2_0: 7(+3)/685    Stage-3_0: 0/1  
2017-01-20 22:01:09,237 Stage-2_0: 16(+6)/685   Stage-3_0: 0/1  
Failed to monitor Job[ 1] with exception 'java.lang.IllegalStateException(RPC 
channel is closed.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}

This meets my expectation.

However, I didn't test the case of driver death before connecting back to Hive. 
(It's also hard to construct such a test case.) In that case, I assume that 
Hive will wait for *server.connect.timeout* before declaring a failure. I guess 
there isn't much we can do for this case. I don't think the change here has any 
implication on this.


was (Author: xuefuz):
Patch #1 followed what [~vanzin] suggested. With it, I observed the following 
behavior:

1. Increasing *server.connect.timeout* will make hive wait longer for the 
driver to connect back, which solves the busy cluster problem.
2. Killing driver while the job is running immediately fails the query on Hive 
side with the following error:
{code}
2017-01-20 22:01:08,235 Stage-2_0: 7(+3)/685    Stage-3_0: 0/1  
2017-01-20 22:01:09,237 Stage-2_0: 16(+6)/685   Stage-3_0: 0/1  
Failed to monitor Job[ 1] with exception 'java.lang.IllegalStateException(RPC 
channel is closed.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}

This meets my expectation.

However, I didn't test the case of driver death before connecting back to Hive. 
(It's also hard to construct such a test case.) In that case, I assume that 
Hive will wait for *server.connect.timeout* before declares a failure. I guess 
there isn't much we can do for this case. I don't think the change here has any 
implication on this.

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher, 
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is the timeout for the 
> handshake between the Hive client and the remote Spark driver. Instead, the 
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout 
> for the remote Spark driver connecting back to the Hive client.
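
Whichever property ends up supplying the value, the behavior under discussion is a wait for the driver's connection that is bounded by a timeout. The sketch below shows that pattern with a plain `java.util.concurrent.Future`; the class and method names are hypothetical and this is not the RpcServer code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ConnectWait {
    /**
     * Waits up to timeoutMs for the remote driver to connect back.
     * Returns true if the future completed normally in time, false if the
     * wait timed out, the connection attempt failed, or the thread was interrupted.
     */
    static boolean awaitDriver(Future<?> driverConnected, long timeoutMs) {
        try {
            driverConnected.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If the driver dies before it ever connects, a wait like this only ends when the timeout expires, which is why the choice of timeout property matters on a busy cluster.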



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15646) Column level lineage is not available for table Views

2017-01-20 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15646:
---
Status: Patch Available  (was: Open)

> Column level lineage is not available for table Views
> -
>
> Key: HIVE-15646
> URL: https://issues.apache.org/jira/browse/HIVE-15646
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15646.01.patch, HIVE-15646.02.patch, 
> HIVE-15646.03.patch
>
>





[jira] [Updated] (HIVE-15646) Column level lineage is not available for table Views

2017-01-20 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15646:
---
Status: Open  (was: Patch Available)

> Column level lineage is not available for table Views
> -
>
> Key: HIVE-15646
> URL: https://issues.apache.org/jira/browse/HIVE-15646
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15646.01.patch, HIVE-15646.02.patch, 
> HIVE-15646.03.patch
>
>





[jira] [Updated] (HIVE-15646) Column level lineage is not available for table Views

2017-01-20 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15646:
---
Attachment: HIVE-15646.03.patch

> Column level lineage is not available for table Views
> -
>
> Key: HIVE-15646
> URL: https://issues.apache.org/jira/browse/HIVE-15646
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15646.01.patch, HIVE-15646.02.patch, 
> HIVE-15646.03.patch
>
>





[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832582#comment-15832582
 ] 

Xuefu Zhang commented on HIVE-15671:


Patch #1 followed what [~vanzin] suggested. With it, I observed the following 
behavior:

1. Increasing *server.connect.timeout* will make hive wait longer for the 
driver to connect back, which solves the busy cluster problem.
2. Killing driver while the job is running immediately fails the query on Hive 
side with the following error:
{code}
2017-01-20 22:01:08,235 Stage-2_0: 7(+3)/685    Stage-3_0: 0/1  
2017-01-20 22:01:09,237 Stage-2_0: 16(+6)/685   Stage-3_0: 0/1  
Failed to monitor Job[ 1] with exception 'java.lang.IllegalStateException(RPC 
channel is closed.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}

This meets my expectation.

However, I didn't test the case of driver death before connecting back to Hive. 
(It's also hard to construct such a test case.) In that case, I assume that 
Hive will wait for *server.connect.timeout* before declaring a failure. I guess 
there isn't much we can do for this case. I don't think the change here has any 
implication on this.

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher, 
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is the timeout for the 
> handshake between the Hive client and the remote Spark driver. Instead, the 
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout 
> for the remote Spark driver connecting back to the Hive client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15554) Add task information to LLAP AM heartbeat

2017-01-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15554:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> Add task information to LLAP AM heartbeat
> -
>
> Key: HIVE-15554
> URL: https://issues.apache.org/jira/browse/HIVE-15554
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15554.patch
>
>





[jira] [Updated] (HIVE-15617) Improve the avg performance for Range based window

2017-01-20 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15617:

Attachment: HIVE-15617.2.patch

patch-2: The previous patch also counted null values in the rows. Changed to 
aggregate with the mergePartial method instead.
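
To illustrate the null-counting issue in isolation, here is a small sketch of an average kept as a partial (sum, count) that skips nulls and can be merged with other partials. It is not the Hive UDAF code; the class and its methods are hypothetical and only mirror the idea of merging partial aggregates.

```java
import java.util.List;

final class AvgPartial {
    double sum;
    long count;

    /** Adds one row value; null values contribute to neither sum nor count. */
    void add(Double value) {
        if (value == null) {
            return;
        }
        sum += value;
        count++;
    }

    /** Merges another partial aggregate into this one. */
    void merge(AvgPartial other) {
        sum += other.sum;
        count += other.count;
    }

    /** Returns the average, or null when no non-null value was seen. */
    Double result() {
        if (count == 0) {
            return null;
        }
        return sum / count;
    }

    /** Builds a partial aggregate from a list of row values. */
    static AvgPartial of(List<Double> rows) {
        AvgPartial partial = new AvgPartial();
        for (Double row : rows) {
            partial.add(row);
        }
        return partial;
    }
}
```

Counting the null rows would inflate the denominator, so a window holding 1.0, null and 3.0 would average to about 1.33 rather than 2.0.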

> Improve the avg performance for Range based window
> --
>
> Key: HIVE-15617
> URL: https://issues.apache.org/jira/browse/HIVE-15617
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Affects Versions: 1.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15617.1.patch, HIVE-15617.2.patch
>
>
> Similar to HIVE-15520, we need to improve the performance for avg().




[jira] [Moved] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-01-20 Thread Andrew Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang moved HADOOP-14015 to HIVE-15686:
-

Affects Version/s: (was: 2.1.1-beta)
   (was: 1.2.1)
   1.2.1
  Key: HIVE-15686  (was: HADOOP-14015)
  Project: Hive  (was: Hadoop Common)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Mithun Radhakrishnan
> Attachments: HADOOP-14015.1.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.




[jira] [Updated] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15671:
---
Attachment: HIVE-15671.1.patch

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher, 
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is the timeout for the 
> handshake between the Hive client and the remote Spark driver. Instead, the 
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout 
> for the remote Spark driver connecting back to the Hive client.




[jira] [Commented] (HIVE-15684) Wrong posBigTable used in VectorMapJoinOuterFilteredOperator

2017-01-20 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832524#comment-15832524
 ] 

Gopal V commented on HIVE-15684:


LGTM - +1 tests pending

> Wrong posBigTable used in VectorMapJoinOuterFilteredOperator
> 
>
> Key: HIVE-15684
> URL: https://issues.apache.org/jira/browse/HIVE-15684
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15684.01.patch
>
>
> Wrong posBigTable used when accessing inputObjInspectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832523#comment-15832523
 ] 

Hive QA commented on HIVE-15439:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848449/HIVE-15439.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3077/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3077/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3077/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-20 22:34:44.797
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3077/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-20 22:34:44.800
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   811b3e3..ab7f6f3  master -> origin/master
   40dab03..1695f68  storage-branch-2.2 -> origin/storage-branch-2.2
+ git reset --hard HEAD
HEAD is now at 811b3e3 HIVE-15580: Eliminate unbounded memory usage for orderBy 
and groupBy in Hive on Spark (reviewed by Chao Sun)
+ git clean -f -d
Removing pom.xml.orig
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at ab7f6f3 HIVE-15390 : Orc reader unnecessarily reading stripe 
footers with hive.optimize.index.filter set to true (Abhishek Somani, reviewed 
by Sergey Shelukhin and Prasanth Jayachandran)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-20 22:34:46.489
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
accumulo-handler/src/test/results/positive/accumulo_single_sourced_multi_insert.q.out:1
error: 
accumulo-handler/src/test/results/positive/accumulo_single_sourced_multi_insert.q.out:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848449 - PreCommit-HIVE-Build

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.3.patch, HIVE-15439.patch, HIVE-15439.patch, 
> HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To support this, we will need to add a new post-insert hook that updates 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  




[jira] [Commented] (HIVE-15036) Druid code recently included in Hive pulls in GPL jar

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832517#comment-15832517
 ] 

Hive QA commented on HIVE-15036:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848373/HIVE-15036.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10951 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=152)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=96)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3076/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3076/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3076/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848373 - PreCommit-HIVE-Build

> Druid code recently included in Hive pulls in GPL jar
> -
>
> Key: HIVE-15036
> URL: https://issues.apache.org/jira/browse/HIVE-15036
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Alan Gates
>Assignee: slim bouguerra
>Priority: Blocker
> Attachments: HIVE-15036.patch
>
>
> Druid pulls in a jar annotation-2.3.jar.  According to its pom file it is 
> licensed under GPL.  We cannot ship a binary distribution that includes this 
> jar.




[jira] [Commented] (HIVE-15685) count(distinct) generates different result than expected

2017-01-20 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832506#comment-15832506
 ] 

Pengcheng Xiong commented on HIVE-15685:


[~ashutoshc], could you take a look? Thanks.

> count(distinct) generates different result than expected
> 
>
> Key: HIVE-15685
> URL: https://issues.apache.org/jira/browse/HIVE-15685
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15685.01.patch
>
>
> The following query with count(distinct) generates a different result than 
> expected on Hive master:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
> {noformat}
> Expected output generated using postgres:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
>  count  | count 
> --------+-------
>  24 |  1823
> (1 row)
> {noformat}
> Actual output
> {noformat}
> 241824
> {noformat}




[jira] [Commented] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832507#comment-15832507
 ] 

Wei Zheng commented on HIVE-15681:
--

Oh I see. If that's the case, I will commit another patch for the HWI removal 
ticket that also removes Jetty23Shims. Does this sound right?

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Updated] (HIVE-15685) count(distinct) generates different result than expected

2017-01-20 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15685:
---
Attachment: HIVE-15685.01.patch

> count(distinct) generates different result than expected
> 
>
> Key: HIVE-15685
> URL: https://issues.apache.org/jira/browse/HIVE-15685
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15685.01.patch
>
>
> The following query with count(distinct) generates a different result than 
> expected on Hive master:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
> {noformat}
> Expected output generated using postgres:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
>  count  | count 
> --------+-------
>  24 |  1823
> (1 row)
> {noformat}
> Actual output
> {noformat}
> 241824
> {noformat}




[jira] [Updated] (HIVE-15685) count(distinct) generates different result than expected

2017-01-20 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15685:
---
Status: Patch Available  (was: Open)

> count(distinct) generates different result than expected
> 
>
> Key: HIVE-15685
> URL: https://issues.apache.org/jira/browse/HIVE-15685
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15685.01.patch
>
>
> The following query with count(distinct) generates a different result than 
> expected on Hive master:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
> {noformat}
> Expected output generated using postgres:
> {noformat}
> select count(distinct ss_ticket_number), count(distinct ss_sold_date_sk) from 
> store_sales;
>  count  | count 
> --------+-------
>  24 |  1823
> (1 row)
> {noformat}
> Actual output
> {noformat}
> 241824
> {noformat}




[jira] [Commented] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832501#comment-15832501
 ] 

Sergey Shelukhin commented on HIVE-15681:
-

Why is Jetty23Shims required? IIRC it was only used for HWI

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Commented] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832495#comment-15832495
 ] 

Wei Zheng commented on HIVE-15681:
--

It's required by Jetty23Shims when compiling against Hadoop 3. I was about to 
ask you to review :)

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Updated] (HIVE-15684) Wrong posBigTable used in VectorMapJoinOuterFilteredOperator

2017-01-20 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15684:

Status: Patch Available  (was: Open)

> Wrong posBigTable used in VectorMapJoinOuterFilteredOperator
> 
>
> Key: HIVE-15684
> URL: https://issues.apache.org/jira/browse/HIVE-15684
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15684.01.patch
>
>
> Wrong posBigTable used when accessing inputObjInspectors.




[jira] [Updated] (HIVE-15684) Wrong posBigTable used in VectorMapJoinOuterFilteredOperator

2017-01-20 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15684:

Attachment: HIVE-15684.01.patch

> Wrong posBigTable used in VectorMapJoinOuterFilteredOperator
> 
>
> Key: HIVE-15684
> URL: https://issues.apache.org/jira/browse/HIVE-15684
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15684.01.patch
>
>
> Wrong posBigTable used when accessing inputObjInspectors.




[jira] [Commented] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832479#comment-15832479
 ] 

Sergey Shelukhin commented on HIVE-15681:
-

Why do we need it?

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Comment Edited] (HIVE-15658) hive.ql.session.SessionState start() is not atomic, SessionState thread local variable can get into inconsistent state

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832465#comment-15832465
 ] 

Sergey Shelukhin edited comment on HIVE-15658 at 1/20/17 9:54 PM:
--

In the last catch of the SessionState.java part of the patch, can you add a 
call to tezSessionState.close, if present, protected by try-catch? 
Doesn't look like it should fail leaving it in open state, but I wonder if open 
(and esp. beginOpen) calls are atomic... we'd rather not have that leak. 

Also why is HiveSessionImpl change necessary? As far as I understand, the 
session at that point is already valid. At least, it's retained in the field 
(for future use?) even though we detach it.


was (Author: sershe):
In the last catch of the SessionState.java part of the patch, can you add a 
call to tezSessionState.close, if present, protected by try-catch? 
Doesn't look like it should fail leaving it in open state, but I wonder if open 
(and esp. beginOpen) calls are atomic... we'd rather not have that leak. 

Also why is HiveSessionImpl change necessary? As far as I understand, the 
session at that point is already valid. At least, it's retained in the field at 
the same time as we detach it.

> hive.ql.session.SessionState start() is not atomic, SessionState thread local 
> variable can get into inconsistent state
> --
>
> Key: HIVE-15658
> URL: https://issues.apache.org/jira/browse/HIVE-15658
> Project: Hive
>  Issue Type: Bug
>  Components: API, HCatalog
>Affects Versions: 1.1.0, 1.2.1, 2.0.0, 2.0.1
> Environment: CDH5.8.0, Flume 1.6.0, Hive 1.1.0
>Reporter: Michal Klempa
> Attachments: HIVE-15658_branch-1.2_1.patch, 
> HIVE-15658_branch-2.1_1.patch
>
>
> Method start() in hive.ql.session.SessionState is supposed to set up the needed 
> preconditions, such as the HDFS scratch directories for the session.
> This is not atomic with setting the thread-local variable, which can later 
> be obtained by calling SessionState.get().
> Therefore, even if the start() method itself fails, SessionState.get() 
> does not return null, and further re-use of the thread that previously 
> invoked start() may lead to obtaining a SessionState object in an inconsistent 
> state.
> I have observed this using the Flume Hive Sink, which uses the Hive Streaming 
> interface. When the directory /tmp/hive is not writable by the session user, 
> start() fails (throwing a RuntimeException). If the thread is re-used 
> (as it is in Flume), further executions work with a wrongly initialized 
> SessionState object (the HDFS dirs do not exist). In Flume, this happens to 
> me when Flume should create a partition if it does not exist (but the code 
> doing this is in Hive Streaming).
> Steps to reproduce:
> 0. create test spooldir and allow flume to write to it, in my case 
> /home/ubuntu/flume_test, 775, ubuntu:flume
> 1. create Flume config (see attachment)
> 2. create Hive table
> {code}
> create table default.flume_test (column1 string, column2 string) partitioned 
> by (dt string) clustered by (column1) INTO 2 BUCKETS STORED AS ORC;
> {code}
> 3. start flume agent:
> {code}
> bin/flume-ng agent -n a1 -c conf -f conf/flume-config.txt
> {code}
> 4. hdfs dfs -chmod 600 /tmp/hive
> 5. put this file into spooldir:
> {code}
> echo value1,value2 > file1
> {code}
> Expected behavior:
> An exception regarding scratch dir permissions is thrown repeatedly.
> Example (note that the line numbers are wrong, as Cloudera is cloning the 
> source code here https://github.com/cloudera/flume-ng/ and here 
> https://github.com/cloudera/hive):
> {code}
> 2017-01-18 12:39:38,926 WARN org.apache.flume.sink.hive.HiveSink: sink_hive_1 
> : Failed connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to 
> EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', database='default', 
> table='flume_test', partitionVals=[20170118] } 
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
> at 
> org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
> at 
> org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:296)
> at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:254)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed 
> connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083',

[jira] [Commented] (HIVE-15658) hive.ql.session.SessionState start() is not atomic, SessionState thread local variable can get into inconsistent state

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832465#comment-15832465
 ] 

Sergey Shelukhin commented on HIVE-15658:
-

In the last catch of the SessionState.java part of the patch, can you add a 
call to tezSessionState.close, if present, protected by try-catch? 
Doesn't look like it should fail leaving it in open state, but I wonder if open 
(and esp. beginOpen) calls are atomic... we'd rather not have that leak. 

Also why is HiveSessionImpl change necessary? As far as I understand, the 
session at that point is already valid. At least, it's retained in the field at 
the same time as we detach it.
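
The cleanup requested above (close whatever was opened, guarded by its own 
try-catch, and do not leave the thread-local pointing at a half-started 
session) could look roughly like the sketch below. DemoSession, its 
resourceOpen flag and the init callback are hypothetical stand-ins chosen for 
illustration; this is not Hive's SessionState or the attached patch.

```java
// Hypothetical stand-in for illustration; this is not Hive's SessionState.
final class DemoSession {
    private static final ThreadLocal<DemoSession> CURRENT = new ThreadLocal<>();
    private boolean resourceOpen; // stands in for something like an open Tez session

    static DemoSession get() {
        return CURRENT.get();
    }

    /** Attaches the session to the thread, then runs init; undoes both on failure. */
    static DemoSession start(DemoSession session, Runnable init) {
        CURRENT.set(session);
        try {
            session.resourceOpen = true;
            init.run(); // e.g. creating scratch directories; may throw
            return session;
        } catch (RuntimeException e) {
            try {
                session.close(); // cleanup is itself guarded so it cannot mask e
            } catch (RuntimeException closeError) {
                e.addSuppressed(closeError);
            }
            CURRENT.remove(); // later get() calls on this thread see no session
            throw e;
        }
    }

    void close() {
        resourceOpen = false;
    }

    boolean isResourceOpen() {
        return resourceOpen;
    }
}
```

With this shape, a thread that is re-used after a failed start() gets null 
from get() instead of a partially initialized object.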

> hive.ql.session.SessionState start() is not atomic, SessionState thread local 
> variable can get into inconsistent state
> --
>
> Key: HIVE-15658
> URL: https://issues.apache.org/jira/browse/HIVE-15658
> Project: Hive
>  Issue Type: Bug
>  Components: API, HCatalog
>Affects Versions: 1.1.0, 1.2.1, 2.0.0, 2.0.1
> Environment: CDH5.8.0, Flume 1.6.0, Hive 1.1.0
>Reporter: Michal Klempa
> Attachments: HIVE-15658_branch-1.2_1.patch, 
> HIVE-15658_branch-2.1_1.patch
>
>
> Method start() in hive.ql.session.SessionState is supposed to set up the needed 
> preconditions, such as the HDFS scratch directories for the session.
> This is not atomic with setting the thread-local variable, which can later 
> be obtained by calling SessionState.get().
> Therefore, even if the start() method itself fails, SessionState.get() 
> does not return null, and further re-use of the thread that previously 
> invoked start() may lead to obtaining a SessionState object in an inconsistent 
> state.
> I have observed this using the Flume Hive Sink, which uses the Hive Streaming 
> interface. When the directory /tmp/hive is not writable by the session user, 
> start() fails (throwing a RuntimeException). If the thread is re-used 
> (as it is in Flume), further executions work with a wrongly initialized 
> SessionState object (the HDFS dirs do not exist). In Flume, this happens to 
> me when Flume should create a partition if it does not exist (but the code 
> doing this is in Hive Streaming).
> Steps to reproduce:
> 0. create test spooldir and allow flume to write to it, in my case 
> /home/ubuntu/flume_test, 775, ubuntu:flume
> 1. create Flume config (see attachment)
> 2. create Hive table
> {code}
> create table default.flume_test (column1 string, column2 string) partitioned 
> by (dt string) clustered by (column1) INTO 2 BUCKETS STORED AS ORC;
> {code}
> 3. start flume agent:
> {code}
> bin/flume-ng agent -n a1 -c conf -f conf/flume-config.txt
> {code}
> 4. hdfs dfs -chmod 600 /tmp/hive
> 5. put this file into spooldir:
> {code}
> echo value1,value2 > file1
> {code}
> Expected behavior:
> An exception regarding scratch dir permissions is thrown repeatedly.
> Example (note that the line numbers are wrong, as Cloudera is cloning the 
> source code here https://github.com/cloudera/flume-ng/ and here 
> https://github.com/cloudera/hive):
> {code}
> 2017-01-18 12:39:38,926 WARN org.apache.flume.sink.hive.HiveSink: sink_hive_1 
> : Failed connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to 
> EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', database='default', 
> table='flume_test', partitionVals=[20170118] } 
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
> at 
> org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
> at 
> org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:296)
> at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:254)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed 
> connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> at 
> org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:380)
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:86)
> ... 6 more
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root 
> scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: 
> rw-------
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:540)
> at 
>

[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832435#comment-15832435
 ] 

Hive QA commented on HIVE-14949:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848378/HIVE-14949.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10966 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge 
(batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testDynamicPartitionsMerge
 (batchId=273)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge
 (batchId=270)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3075/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3075/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3075/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848378 - PreCommit-HIVE-Build

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch
>
>
> If more than one row on the source side matches the same row on the target 
> side, we are forced to update (or delete) the same row in the target more than 
> once as part of the same SQL statement.  This should raise an error per SQL Spec
> ISO/IEC 9075-2:2011(E)
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B
> There is no sure way to do this via static analysis of the query.
> Can we add something to ROJ operator to pay attention to ROW__ID of target 
> side row and compare it with ROW__ID of target side of previous row output?  
> If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table once it has matched.  And if 
> it matches again, throw an error.
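
The "mark each row once it has matched" idea can be sketched as follows. This 
is only an illustration under assumptions: CardinalityChecker is a hypothetical 
class, and a plain String stands in for the target row's ROW__ID. It is not 
the join operator change from the attached patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: targetRowId stands in for the target row's ROW__ID.
final class CardinalityChecker {
    // Target rows that have already been matched by some source row.
    private final Set<String> matchedTargetRows = new HashSet<>();

    /**
     * Records that a source row matched the given target row.
     * Throws if the same target row was already matched before,
     * i.e. more than one source row matched it.
     */
    void recordMatch(String targetRowId) {
        if (!matchedTargetRows.add(targetRowId)) {
            throw new IllegalStateException(
                "More than one source row matched target row " + targetRowId);
        }
    }
}
```

The set grows with the number of matched target rows, so the alternative of 
comparing against only the previous row's ROW__ID would use less memory, 
provided the rows arrive grouped by target row.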




[jira] [Commented] (HIVE-15662) check startTime in SparkTask to make sure startTime is not less than submitTime

2017-01-20 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832429#comment-15832429
 ] 

Chao Sun commented on HIVE-15662:
-

+1. Maybe better to add a comment above the code to explain when that will 
happen. Thanks.

> check startTime in SparkTask to make sure startTime is not less than 
> submitTime
> ---
>
> Key: HIVE-15662
> URL: https://issues.apache.org/jira/browse/HIVE-15662
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15662.000.patch
>
>
> Check startTime in SparkTask to make sure startTime is not less than 
> submitTime. We saw a corner case: when the SparkTask finishes in less than 
> 1 second, the startTime may not be set, because RemoteSparkJobMonitor 
> sleeps for 1 second before checking the state, and by the time it wakes up 
> the Spark job has already completed. Here is an example query with 3 
> Spark tasks, where the second one finished quickly, in around 1 second:
> {code}
> SparkTask1:
> "finishTime":1484638391978
> "submitTime":1484638385933
> "startTime": 1484638386973
> SparkTask2:
> "finishTime":1484638393019
> "submitTime":1484638391979
> "startTime": 1484638386973
> SparkTask3:
> "finishTime":1484638432123
> "submitTime":1484638393020
> "startTime": 1484638394057
> {code}
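
A minimal sketch of the kind of check described above, using the SparkTask2 
numbers from the example. JobTimes and sanitizeStartTime are hypothetical 
names for illustration; the actual change is in the attached patch and may 
look different.

```java
// Hypothetical helper; not the actual SparkTask code from the patch.
final class JobTimes {
    private JobTimes() {}

    /**
     * Returns a start time that is never earlier than the submit time.
     * If the monitor never observed the job starting (for example because the
     * job finished during the monitor's first sleep), the recorded start time
     * may be stale, so fall back to the submit time.
     */
    static long sanitizeStartTime(long submitTime, long startTime) {
        return Math.max(submitTime, startTime);
    }

    public static void main(String[] args) {
        // SparkTask2 from the example: startTime is left over from SparkTask1.
        System.out.println(sanitizeStartTime(1484638391979L, 1484638386973L));
    }
}
```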




[jira] [Commented] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832395#comment-15832395
 ] 

Anthony Hsu commented on HIVE-15680:


[~gopalv]: I only tested with MRv2. Not sure about other execution engines but 
I will test.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.




[jira] [Commented] (HIVE-13014) RetryingMetaStoreClient is retrying too aggressively

2017-01-20 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832394#comment-15832394
 ] 

Eugene Koifman commented on HIVE-13014:
---

no related test failures

> RetryingMetaStoreClient is retrying too aggressively
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If the network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetaStoreClient will retry the operation, thus causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks the commit failed and will likely 
> attempt to redo the transaction, which is not what we want in most cases.
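
One way to picture the fix is a retry wrapper that retries only operations 
known to be idempotent. The sketch below is an assumption-laden illustration: 
IdempotentRetry, the idempotent flag and maxAttempts are hypothetical and are 
not RetryingMetaStoreClient's API.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper; the names and the idempotent flag are assumptions.
final class IdempotentRetry {
    private IdempotentRetry() {}

    static <T> T call(Callable<T> op, boolean idempotent, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // A non-idempotent call such as a commit gets exactly one attempt.
        int attempts = idempotent ? maxAttempts : 1;
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }
}
```

A commit-style call would be passed with idempotent set to false, so a lost 
ack surfaces as a single error to the caller rather than a second commit 
attempt.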




[jira] [Resolved] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark

2017-01-20 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-15527.

Resolution: Not A Problem

> Memory usage is unbound in SortByShuffler for Spark
> ---
>
> Key: HIVE-15527
> URL: https://issues.apache.org/jira/browse/HIVE-15527
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Attachments: HIVE-15527.0.patch, HIVE-15527.0.patch, 
> HIVE-15527.1.patch, HIVE-15527.2.patch, HIVE-15527.3.patch, 
> HIVE-15527.4.patch, HIVE-15527.5.patch, HIVE-15527.6.patch, 
> HIVE-15527.7.patch, HIVE-15527.8.patch, HIVE-15527.patch
>
>
> In SortByShuffler.java, an ArrayList is used to back the iterator for values 
> that have the same key in shuffled result produced by spark transformation 
> sortByKey. It's possible that memory can be exhausted because of a large key 
> group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory usage, but this
>   // can be done in later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to 
> back the value iterable with the same input iterator.
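
The idea in the last sentence can be sketched without Spark or Hive types. 
The class below is a hypothetical illustration (java.util.Map.Entry stands in 
for Tuple2, and keys are assumed non-null); it groups an already sorted stream 
lazily, holding only one look-ahead pair in memory instead of a whole key 
group. It is not the code from any of the attached patches.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: groups a key-sorted stream of pairs without buffering a group.
final class SortedGroupIterator<K, V> implements Iterator<Map.Entry<K, Iterator<V>>> {
    private final Iterator<Map.Entry<K, V>> input;
    private Map.Entry<K, V> lookahead; // next unread pair, or null at end of input
    private K currentKey;              // key of the most recently returned group

    SortedGroupIterator(Iterator<Map.Entry<K, V>> sortedInput) {
        this.input = sortedInput;
        advance();
    }

    private void advance() {
        lookahead = input.hasNext() ? input.next() : null;
    }

    @Override
    public boolean hasNext() {
        // Single-pass semantics: values left unread in the previous group are skipped.
        while (lookahead != null && currentKey != null
                && currentKey.equals(lookahead.getKey())) {
            advance();
        }
        return lookahead != null;
    }

    @Override
    public Map.Entry<K, Iterator<V>> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        final K key = lookahead.getKey();
        currentKey = key;
        // The value iterator reads straight from the shared input iterator.
        Iterator<V> values = new Iterator<V>() {
            @Override
            public boolean hasNext() {
                return lookahead != null && key.equals(lookahead.getKey());
            }

            @Override
            public V next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                V value = lookahead.getValue();
                advance();
                return value;
            }
        };
        return new SimpleEntry<>(key, values);
    }
}
```

The trade-off is that each group's values can be read only once and must be 
consumed before asking the outer iterator for the next group.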




[jira] [Updated] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark

2017-01-20 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15527:
---
Status: Open  (was: Patch Available)

The issue was addressed in HIVE-15580. Cancel the patch here and close this for 
now.

> Memory usage is unbound in SortByShuffler for Spark
> ---
>
> Key: HIVE-15527
> URL: https://issues.apache.org/jira/browse/HIVE-15527
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Attachments: HIVE-15527.0.patch, HIVE-15527.0.patch, 
> HIVE-15527.1.patch, HIVE-15527.2.patch, HIVE-15527.3.patch, 
> HIVE-15527.4.patch, HIVE-15527.5.patch, HIVE-15527.6.patch, 
> HIVE-15527.7.patch, HIVE-15527.8.patch, HIVE-15527.patch
>
>
> In SortByShuffler.java, an ArrayList is used to back the iterator for values 
> that have the same key in shuffled result produced by spark transformation 
> sortByKey. It's possible that memory can be exhausted because of a large key 
> group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory usage, but this
>   // can be done in later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to 
> back the value iterable with the same input iterator.
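
The idea in the last sentence of the description can be sketched as follows. This is a minimal illustration only, not the code from any attached patch: the class name `LazyGroupingIterator` is hypothetical, and `Map.Entry` stands in for Spark's `Tuple2` and Hive's `HiveKey` so that the sketch compiles on its own. It keeps a single lookahead pair instead of an ArrayList per key group.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Hypothetical sketch: groups a key-sorted stream of pairs without buffering a whole group. */
final class LazyGroupingIterator<K, V> implements Iterator<Map.Entry<K, Iterator<V>>> {
  private final Iterator<Map.Entry<K, V>> input; // must already be sorted by key
  private Map.Entry<K, V> lookahead;             // next unread pair, or null once input is drained
  private K currentKey;                          // key of the group handed out most recently
  private boolean inGroup;                       // true while that group may have unread values

  LazyGroupingIterator(Iterator<Map.Entry<K, V>> input) {
    this.input = input;
    advance();
  }

  private void advance() {
    lookahead = input.hasNext() ? input.next() : null;
  }

  /** Drops whatever the caller left unread in the current group. */
  private void skipRestOfGroup() {
    while (inGroup && lookahead != null && Objects.equals(currentKey, lookahead.getKey())) {
      advance();
    }
    inGroup = false;
  }

  @Override
  public boolean hasNext() {
    skipRestOfGroup();
    return lookahead != null;
  }

  @Override
  public Map.Entry<K, Iterator<V>> next() {
    skipRestOfGroup();
    if (lookahead == null) {
      throw new NoSuchElementException();
    }
    final K key = lookahead.getKey();
    currentKey = key;
    inGroup = true;
    // The value iterator reads from the same input, so nothing is copied into a list.
    Iterator<V> values = new Iterator<V>() {
      @Override
      public boolean hasNext() {
        return inGroup && lookahead != null && Objects.equals(key, lookahead.getKey());
      }

      @Override
      public V next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        V value = lookahead.getValue();
        advance();
        return value;
      }
    };
    return new SimpleImmutableEntry<>(key, values);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
    sorted.add(new SimpleImmutableEntry<>("a", 1));
    sorted.add(new SimpleImmutableEntry<>("a", 2));
    sorted.add(new SimpleImmutableEntry<>("b", 3));
    Iterator<Map.Entry<String, Iterator<Integer>>> groups =
        new LazyGroupingIterator<>(sorted.iterator());
    while (groups.hasNext()) {
      Map.Entry<String, Iterator<Integer>> group = groups.next();
      StringBuilder line = new StringBuilder(group.getKey()).append(':');
      group.getValue().forEachRemaining(v -> line.append(' ').append(v));
      System.out.println(line);
    }
  }
}
```

The trade-off in this sketch is that a group's value iterator is only valid until the outer iterator is touched again: calling `hasNext()` or `next()` on the outer iterator discards any values the caller left unread.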




[jira] [Commented] (HIVE-13282) GroupBy and select operator encounter ArrayIndexOutOfBoundsException

2017-01-20 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832388#comment-15832388
 ] 

Matt McCline commented on HIVE-13282:
-

Vikram had worked on it.  I took over.  There were 2 issues I recall.  One was 
the index out of bounds, but when that was fixed another issue (wrong results) 
appeared.  It had to do with the handling of operator close when there are 
multiple parents.  I wasn't able to work that one out very well.

> GroupBy and select operator encounter ArrayIndexOutOfBoundsException
> 
>
> Key: HIVE-13282
> URL: https://issues.apache.org/jira/browse/HIVE-13282
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13282.01.patch, smb_fail_issue.patch, 
> smb_groupby.q, smb_groupby.q.out
>
>
> The group by and select operators run into the ArrayIndexOutOfBoundsException 
> when they incorrectly initialize themselves with tag 0 but the incoming tag 
> id is different.
> {code}
> select count(*) from
> (select rt1.id from
> (select t1.key as id, t1.value as od from tab t1 group by key, value) rt1) vt1
> join
> (select rt2.id from
> (select t2.key as id, t2.value as od from tab_part t2 group by key, value) rt2) vt2
> where vt1.id=vt2.id;
> {code}




[jira] [Updated] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15664:

Attachment: (was: HIVE-15664.WIP.patch)

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.




[jira] [Updated] (HIVE-15580) Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15580:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks, Chao!

> Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
> -
>
> Key: HIVE-15580
> URL: https://issues.apache.org/jira/browse/HIVE-15580
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, 
> HIVE-15580.2.patch, HIVE-15580.2.patch, HIVE-15580.3.patch, 
> HIVE-15580.4.patch, HIVE-15580.5.patch, HIVE-15580.patch
>
>
> Currently, orderBy (sortBy) and groupBy in Hive on Spark use unbounded 
> memory. For orderBy, Hive accumulates key groups using ArrayList (described 
> in HIVE-15527). For groupBy, Hive currently uses Spark's groupByKey operator, 
> which cannot spill to disk within a key group. Thus, for a large key group, 
> memory usage is also unbounded.
> Fixing this is likely to impact performance. We will profile and optimize 
> afterwards. We could also make this change configurable.




[jira] [Commented] (HIVE-15580) Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832383#comment-15832383
 ] 

Xuefu Zhang commented on HIVE-15580:


Thanks, Chao! I will commit this first and create a couple of followups. 
[~lirui], it would be great if you can also take a look at the patch. I will 
incorporate your comments (if any) as followups as well. Thanks.

> Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
> -
>
> Key: HIVE-15580
> URL: https://issues.apache.org/jira/browse/HIVE-15580
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, 
> HIVE-15580.2.patch, HIVE-15580.2.patch, HIVE-15580.3.patch, 
> HIVE-15580.4.patch, HIVE-15580.5.patch, HIVE-15580.patch
>
>
> Currently, orderBy (sortBy) and groupBy in Hive on Spark use unbounded 
> memory. For orderBy, Hive accumulates key groups using ArrayList (described 
> in HIVE-15527). For groupBy, Hive currently uses Spark's groupByKey operator, 
> which cannot spill to disk within a key group. Thus, for a large key group, 
> memory usage is also unbounded.
> Fixing this is likely to impact performance. We will profile and optimize 
> afterwards. We could also make this change configurable.




[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832381#comment-15832381
 ] 

Sergey Shelukhin commented on HIVE-15664:
-

Yes, why? :)

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch, HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.




[jira] [Commented] (HIVE-15580) Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832377#comment-15832377
 ] 

Chao Sun commented on HIVE-15580:
-

+1

> Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
> -
>
> Key: HIVE-15580
> URL: https://issues.apache.org/jira/browse/HIVE-15580
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, 
> HIVE-15580.2.patch, HIVE-15580.2.patch, HIVE-15580.3.patch, 
> HIVE-15580.4.patch, HIVE-15580.5.patch, HIVE-15580.patch
>
>
> Currently, orderBy (sortBy) and groupBy in Hive on Spark use unbounded 
> memory. For orderBy, Hive accumulates key groups using ArrayList (described 
> in HIVE-15527). For groupBy, Hive currently uses Spark's groupByKey operator, 
> which cannot spill to disk within a key group. Thus, for a large key group, 
> memory usage is also unbounded.
> Fixing this is likely to impact performance. We will profile and optimize 
> afterwards. We could also make this change configurable.




[jira] [Commented] (HIVE-14707) ACID: Insert shuffle sort-merges on blank KEY

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832375#comment-15832375
 ] 

Hive QA commented on HIVE-14707:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848376/HIVE-14707.24.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3074/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3074/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3074/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-20 20:41:04.385
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3074/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-20 20:41:04.388
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at f968cf7 HIVE-15297: Hive should not split semicolon within 
quoted string literals (Pengcheng Xiong, reviewed by Ashutosh Chauhan) 
(addendum V)
+ git clean -f -d
Removing 
common/src/java/org/apache/hadoop/hive/common/classification/RetrySemantics.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at f968cf7 HIVE-15297: Hive should not split semicolon within 
quoted string literals (Pengcheng Xiong, reviewed by Ashutosh Chauhan) 
(addendum V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-20 20:41:05.410
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/results/clientpositive/perf/query87.q.out:16
error: ql/src/test/results/clientpositive/perf/query87.q.out: patch does not 
apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848376 - PreCommit-HIVE-Build

> ACID: Insert shuffle sort-merges on blank KEY
> -
>
> Key: HIVE-14707
> URL: https://issues.apache.org/jira/browse/HIVE-14707
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Eugene Koifman
> Attachments: HIVE-14707.01.patch, HIVE-14707.02.patch, 
> HIVE-14707.03.patch, HIVE-14707.04.patch, HIVE-14707.05.patch, 
> HIVE-14707.06.patch, HIVE-14707.08.patch, HIVE-14707.09.patch, 
> HIVE-14707.10.patch, HIVE-14707.11.patch, HIVE-14707.13.patch, 
> HIVE-14707.14.patch, HIVE-14707.16.patch, HIVE-14707.17.patch, 
> HIVE-14707.18.patch, HIVE-14707.19.patch, HIVE-14707.19.patch, 
> HIVE-14707.20.patch, HIVE-14707.21.patch, HIVE-14707.22.patch, 
> HIVE-14707.23.patch, HIVE-14707.24.patch
>
>
> The ACID insert codepath uses a sorted shuffle, while the key used for 
> shuffle is always 0 bytes long.
> {code}
> hive (sales_acid)> explain insert into sales values(1, 2, 
> '3400---009', 1, null);
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: gopal_20160906172626_80261c4c-79cc-4e02-87fe-3133be404e55:2
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: values__tmp__table__2
>   Statistics: Num rows: 1 Data size: 28 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: tmp_values_col1 (type: string), 
> tmp_values_col2 (type: string), tmp_values_col3 (type: string), 
>

[jira] [Commented] (HIVE-13014) RetryingMetaStoreClient is retrying too aggresievley

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832372#comment-15832372
 ] 

Hive QA commented on HIVE-13014:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848366/HIVE-13014.07.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10968 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3073/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3073/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3073/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848366 - PreCommit-HIVE-Build

> RetryingMetaStoreClient is retrying too aggresievley
> 
>
> Key: HIVE-13014
> URL: https://issues.apache.org/jira/browse/HIVE-13014
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-13014.01.patch, HIVE-13014.02.patch, 
> HIVE-13014.03.patch, HIVE-13014.04.patch, HIVE-13014.05.patch, 
> HIVE-13014.06.patch, HIVE-13014.07.patch
>
>
> Not all metastore operations are idempotent.  For example, commit_txn() 
> consists of 
> 1. request from client to server
> 2. server action
> 3. ack to client
> If the network connection is broken after (or during) 2 but before 3 happens, 
> RetryingMetaStoreClient will retry the operation, thus causing an attempt to 
> commit the same txn twice (sometimes concurrently).
> The 2nd attempt is guaranteed to fail and thus return an error to the caller 
> (which doesn't know the operation is being retried), while the first attempt 
> has actually succeeded.  Thus the caller thinks the commit failed and will 
> likely attempt to redo the transaction, which is not what we want in most cases.
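
One way to picture the problem described above is a retry wrapper that is told whether a call is idempotent. The sketch below is an assumption-laden illustration, not the approach taken in the attached patches: `IdempotentRetrySketch`, `UnknownOutcomeException` and the `idempotent` flag are hypothetical names, and a real client would likely retry only on connection-level errors rather than on every exception.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a retry wrapper that refuses to re-send non-idempotent calls. */
final class IdempotentRetrySketch {

  /** Signals that a non-idempotent call failed and its outcome on the server is unknown. */
  static final class UnknownOutcomeException extends Exception {
    UnknownOutcomeException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static <T> T call(Callable<T> op, boolean idempotent, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;
        if (!idempotent) {
          // The server may already have applied the operation (for example commit_txn()),
          // so surface the ambiguity instead of sending the request a second time.
          throw new UnknownOutcomeException("call failed; outcome unknown, not retrying", e);
        }
      }
    }
    // Every attempt of an idempotent call failed; report the last failure.
    throw last;
  }
}
```

With such a wrapper the caller of a non-idempotent operation learns that the outcome is unknown and can check the transaction state, rather than being told that the commit failed.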




[jira] [Updated] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15681:
-
Status: Patch Available  (was: Open)

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Updated] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15681:
-
Attachment: HIVE-15681.1.patch

> Pull specified version of jetty for Hive
> 
>
> Key: HIVE-15681
> URL: https://issues.apache.org/jira/browse/HIVE-15681
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15681.1.patch
>
>





[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832257#comment-15832257
 ] 

Marcelo Vanzin commented on HIVE-15671:
---

bq. I got a different problem when the driver suddenly dies (due to OOM, for 
instance) ... Hive wouldn't detect the driver was gone until 10m later.

If you mean it dies before the SASL handshake is complete, then maybe my 
understanding that the server timeout applies to the whole connection + 
handshake is wrong, and that should be fixed; i.e. the timeout set up in 
{{registerClient}} should apply to the whole handshake and not only until 
there's a connection.

But if it dies after the SASL handshake, then it seems like the problem is 
somewhere else and shouldn't really be related to either of these timeouts.

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.patch
>
>
> {code}
>   /**
>* Tells the RPC server to expect a connection from a new client.
>* ...
>*/
>   public Future registerClient(final String clientId, String secret,
>   RpcDispatcher serverDispatcher) {
> return registerClient(clientId, secret, serverDispatcher, 
> config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is meant to be the timeout 
> for the handshake between the Hive client and the remote Spark driver. Instead, 
> the timeout should be *hive.spark.client.connect.timeout*, which is the timeout 
> for the remote Spark driver to connect back to the Hive client.




[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-20 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832245#comment-15832245
 ] 

Xuefu Zhang commented on HIVE-15671:


[~vanzin], thanks for your insight. I think we are getting somewhere. 
I'm going to change #2 to use {{getConnectTimeoutMs()}} and try it out. Naming 
is one thing, but yes, the server-side timeout should be bigger. When I tested 
with my patch, I actually made *client.connect.timeout* much bigger than 
*server.connect.timeout* and that's why I didn't have the problem that [~lirui] 
got. 

{quote}That's kinda hard to solve, because the server doesn't know which client 
connected until...{quote}
My original problem (with no patch whatsoever) was about a busy cluster where 
it took a long time (up to 10m) to get a container to run the driver. To 
overcome that, I increased *server.connect.timeout* to 10m, which worked. With 
that, however, I got a different problem when the driver suddenly dies (due to 
OOM, for instance), at which point the driver had already connected back to 
Hive and the job was running. In such a case, Hive wouldn't detect the driver 
was gone until 10m later. My patch here was to solve this problem.

With the new understanding, I'd like to make sure that both problems are 
solved: 1. the user should be able to increase *server.connect.timeout* to 
handle a longer startup of the driver. 2. Hive should be able to immediately 
detect the death of the driver (after the connection has been made).

Any additional thoughts?

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.patch
>
>
> {code}
>   /**
>* Tells the RPC server to expect a connection from a new client.
>* ...
>*/
>   public Future registerClient(final String clientId, String secret,
>   RpcDispatcher serverDispatcher) {
> return registerClient(clientId, secret, serverDispatcher, 
> config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is meant to be the timeout 
> for the handshake between the Hive client and the remote Spark driver. Instead, 
> the timeout should be *hive.spark.client.connect.timeout*, which is the timeout 
> for the remote Spark driver to connect back to the Hive client.
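
To make the role of the timeout argument concrete, here is a small sketch of a pending registration that fails unless the client connects back in time. It is not the RpcServer implementation: `PendingClientSketch` and `clientConnected` are hypothetical names, the sketch takes no side on which of the two settings should be passed in, and `CompletableFuture.orTimeout` requires Java 9 or later.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a server waiting for registered clients to connect back. */
final class PendingClientSketch {
  private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

  /** Returns a future that fails with TimeoutException unless the client connects in time. */
  CompletableFuture<String> registerClient(String clientId, long timeoutMs) {
    CompletableFuture<String> future =
        new CompletableFuture<String>().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    pending.put(clientId, future);
    // Forget the registration once it has either succeeded or timed out.
    future.whenComplete((id, error) -> pending.remove(clientId, future));
    return future;
  }

  /** Called when a client connects back; returns false if it is unknown or too late. */
  boolean clientConnected(String clientId) {
    CompletableFuture<String> future = pending.get(clientId);
    return future != null && future.complete(clientId);
  }
}
```

Because the timeout is an explicit argument, the caller decides which configured value bounds the wait. Detecting a driver that dies after it has connected is a separate concern and would need something like a channel-close listener, which this sketch does not cover.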




[jira] [Commented] (HIVE-10562) Add version column to NOTIFICATION_LOG table and DbNotificationListener

2017-01-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832240#comment-15832240
 ] 

Hive QA commented on HIVE-10562:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848355/HIVE-10562.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10950 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=125)

[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,mapjoin_decimal.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[specialChar] (batchId=22)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[cluster_tasklog_retrieval]
 (batchId=87)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[minimr_broken_pipe]
 (batchId=87)
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade (batchId=211)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=215)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3071/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3071/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3071/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848355 - PreCommit-HIVE-Build

> Add version column to NOTIFICATION_LOG table and DbNotificationListener
> ---
>
> Key: HIVE-10562
> URL: https://issues.apache.org/jira/browse/HIVE-10562
> Project: Hive
>  Issue Type: Sub-task
>  Components: Import/Export
>Affects Versions: 1.2.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10562.2.patch, HIVE-10562.3.patch, HIVE-10562.patch
>
>
> Currently, we have a JSON encoded message being stored in the 
> NOTIFICATION_LOG table.
> If we want to be future proof, we need to allow for versioning of this 
> message, since we might change what gets stored in the message. A prime 
> example of what we'd want to change is as in HIVE-10393.
> MessageFactory already has stubs to allow for versioning of messages, and we 
> could expand on this further in the future. NotificationListener currently 
> encodes the message version into the header for the JMS message it sends, 
> which seems to be the right place for a message version (instead of being 
> contained in the message, for example).
> So, we should have a similar ability for DbEventListener as well, and the 
> place this makes the most sense is to add a version column to the 
> NOTIFICATION_LOG table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15623) Use customized version of netty for llap

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15623:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed patch 2 to master. Thanks Sergey for review!

> Use customized version of netty for llap
> 
>
> Key: HIVE-15623
> URL: https://issues.apache.org/jira/browse/HIVE-15623
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15623.1.patch, HIVE-15623.2.patch
>
>





[jira] [Updated] (HIVE-15662) check startTime in SparkTask to make sure startTime is not less than submitTime

2017-01-20 Thread zhihai xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15662:
-
Description: 
Check startTime in SparkTask to make sure startTime is not less than 
submitTime. We saw a corner case: when the SparkTask finishes in less than 1 
second, the startTime may not be set, because RemoteSparkJobMonitor sleeps for 
1 second before checking the state, and by the time the sleep ends the Spark 
job has already completed. One example is a query with 3 Spark tasks, where the 
second one finished quickly, in around 1 second:
{code}
SparkTask1:
"finishTime":1484638391978
"submitTime":1484638385933
"startTime": 1484638386973
SparkTask2:
"finishTime":1484638393019
"submitTime":1484638391979
"startTime": 1484638386973
SparkTask3:
"finishTime":1484638432123
"submitTime":1484638393020
"startTime": 1484638394057
{code}


  was:
Check startTime in SparkTask to make sure startTime is not less than 
submitTime. We saw a corner case: when the SparkTask finishes in less than 1 
second, the startTime may not be set, because RemoteSparkJobMonitor sleeps for 
1 second before checking the state, and by the time the sleep ends the Spark 
job has already completed.



> check startTime in SparkTask to make sure startTime is not less than 
> submitTime
> ---
>
> Key: HIVE-15662
> URL: https://issues.apache.org/jira/browse/HIVE-15662
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15662.000.patch
>
>
> Check startTime in SparkTask to make sure startTime is not less than 
> submitTime. We saw a corner case: when the SparkTask finishes in less than 1 
> second, the startTime may not be set, because RemoteSparkJobMonitor sleeps 
> for 1 second before checking the state, and by the time the sleep ends the 
> Spark job has already completed. One example is a query with 3 Spark tasks, 
> where the second one finished quickly, in around 1 second:
> {code}
> SparkTask1:
> "finishTime":1484638391978
> "submitTime":1484638385933
> "startTime": 1484638386973
> SparkTask2:
> "finishTime":1484638393019
> "submitTime":1484638391979
> "startTime": 1484638386973
> SparkTask3:
> "finishTime":1484638432123
> "submitTime":1484638393020
> "startTime": 1484638394057
> {code}
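
In the numbers above, SparkTask2 reports a startTime (1484638386973) that is earlier than its submitTime (1484638391979); it equals SparkTask1's startTime. A minimal sketch of the kind of check the title asks for is shown below. The class and method names are hypothetical, and the attached patch may implement the check differently.

```java
/** Hypothetical sketch of the check described in the issue title. */
final class SparkTaskTimesSketch {

  /** Returns a start time that is never earlier than the submit time. */
  static long effectiveStartTime(long submitTime, long recordedStartTime) {
    return Math.max(submitTime, recordedStartTime);
  }

  public static void main(String[] args) {
    // SparkTask2 from the report: the recorded startTime predates submitTime.
    long submitTime = 1484638391979L;
    long recordedStartTime = 1484638386973L;
    System.out.println(effectiveStartTime(submitTime, recordedStartTime));
  }
}
```

For SparkTask1 and SparkTask3 the recorded startTime is already later than submitTime, so the same call leaves it unchanged.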



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15623) Use customized version of netty for llap

2017-01-20 Thread Wei Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832227#comment-15832227
 ] 

Wei Zheng commented on HIVE-15623:
--

subquery_notin is the only failure that has age==1 and it runs fine locally

> Use customized version of netty for llap
> 
>
> Key: HIVE-15623
> URL: https://issues.apache.org/jira/browse/HIVE-15623
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15623.1.patch, HIVE-15623.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15621) Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15621:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed patch 4 to master. Thanks Sergey for review!

> Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP
> --
>
> Key: HIVE-15621
> URL: https://issues.apache.org/jira/browse/HIVE-15621
> Project: Hive
> Issue Type: Task
> Components: llap
> Affects Versions: 2.2.0
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15621.1.patch, HIVE-15621.2.patch, 
> HIVE-15621.3.patch, HIVE-15621.4.patch
>
>
> This is to avoid a dependency on Hadoop's JvmPauseMonitor, since Hive already
> has its own implementation. HiveServer2 is already using Hive's.
> Need to follow up in HIVE-15644 to add the 3 missing JVM metrics for LLAP.




[jira] [Commented] (HIVE-15629) Set DDLTask’s exception with its subtask’s exception

2017-01-20 Thread zhihai xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832207#comment-15832207
 ] 

zhihai xu commented on HIVE-15629:
--

Thanks [~jxiang] for reviewing and committing the patch!

> Set DDLTask’s exception with its subtask’s exception
> 
>
> Key: HIVE-15629
> URL: https://issues.apache.org/jira/browse/HIVE-15629
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 2.2.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15629.000.patch
>
>
> Set DDLTask’s exception with its subtask’s exception, so that an exception
> from a subtask in DDLTask can be propagated to TaskRunner.
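
The propagation described above can be sketched as follows. This is a minimal illustration, not Hive's code: `SketchTask`, `FailingSubtask`, and `ParentTask` are hypothetical stand-ins for the real Task and DDLTask classes, and the convention that a non-zero return code means failure is an assumption.

```java
// Simplified stand-ins, not Hive's real Task/DDLTask classes.
abstract class SketchTask {
    private Throwable exception;

    void setException(Throwable t) { exception = t; }

    Throwable getException() { return exception; }

    /** Returns 0 on success, non-zero on failure (assumed convention). */
    abstract int execute();
}

// A subtask that always fails and records why.
final class FailingSubtask extends SketchTask {
    @Override
    int execute() {
        setException(new IllegalStateException("subtask failed"));
        return 1;
    }
}

// A parent task that delegates to a subtask and surfaces its failure cause.
final class ParentTask extends SketchTask {
    private final SketchTask subtask;

    ParentTask(SketchTask subtask) { this.subtask = subtask; }

    @Override
    int execute() {
        int rc = subtask.execute();
        if (rc != 0) {
            // Copy the subtask's exception so whoever runs the parent sees the cause.
            setException(subtask.getException());
        }
        return rc;
    }
}
```

Without the copy step, a runner that only inspects the parent task would see a failing return code but a null exception.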




[jira] [Updated] (HIVE-15621) Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15621:
-
Description: 
This is to avoid a dependency on Hadoop's JvmPauseMonitor, since Hive already has
its own implementation. HiveServer2 is already using Hive's.

Need to follow up in HIVE-15644 to add the 3 missing JVM metrics for LLAP.

  was:This is to avoid dependency on Hadoop's JvmPauseMonitor since Hive 
already has its own implementation. Need to follow up in HIVE-15644 to add the 
3 missing JVM metrics for Hive's JvmPauseMonitor.


> Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP
> --
>
> Key: HIVE-15621
> URL: https://issues.apache.org/jira/browse/HIVE-15621
> Project: Hive
> Issue Type: Task
> Components: llap
> Affects Versions: 2.2.0
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Attachments: HIVE-15621.1.patch, HIVE-15621.2.patch, 
> HIVE-15621.3.patch, HIVE-15621.4.patch
>
>
> This is to avoid a dependency on Hadoop's JvmPauseMonitor, since Hive already
> has its own implementation. HiveServer2 is already using Hive's.
> Need to follow up in HIVE-15644 to add the 3 missing JVM metrics for LLAP.




[jira] [Updated] (HIVE-15621) Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP

2017-01-20 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15621:
-
Description: This is to avoid dependency on Hadoop's JvmPauseMonitor since 
Hive already has its own implementation. Need to follow up in HIVE-15644 to add 
the 3 missing JVM metrics for Hive's JvmPauseMonitor.

> Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP
> --
>
> Key: HIVE-15621
> URL: https://issues.apache.org/jira/browse/HIVE-15621
> Project: Hive
> Issue Type: Task
> Components: llap
> Affects Versions: 2.2.0
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Attachments: HIVE-15621.1.patch, HIVE-15621.2.patch, 
> HIVE-15621.3.patch, HIVE-15621.4.patch
>
>
> This is to avoid dependency on Hadoop's JvmPauseMonitor since Hive already 
> has its own implementation. Need to follow up in HIVE-15644 to add the 3 
> missing JVM metrics for Hive's JvmPauseMonitor.



