[jira] [Updated] (HIVE-9000) LAST_VALUE Window function returns wrong results
[ https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9000: Description: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
from over10k
where (s = 'oscar allen' or s = 'oscar carson') and t = 10
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. was: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i.
Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. LAST_VALUE Window function returns wrong results Key: HIVE-9000 URL: https://issues.apache.org/jira/browse/HIVE-9000 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.13.1 Reporter: Mark Grover Priority: Critical Fix For: 0.14.1 The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
from over10k
where (s = 'oscar allen' or s = 'oscar carson') and t = 10
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
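(Editorial note, not part of the original report: the observed output matches what a window frame ending at the current row would produce, while the expected 65549 corresponds to a frame spanning the whole partition, e.g. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. A minimal Python simulation of the two frame choices — illustrative only, not Hive code:)

```python
# Simulate last_value() under two window-frame choices. Rows are (t, s, i),
# already partitioned by t and sorted by s, matching the query in the report.
rows = [(10, "oscar allen", 65662), (10, "oscar carson", 65549)]

def last_value_default_frame(rows, idx):
    # Frame UNBOUNDED PRECEDING .. CURRENT ROW: the "last" value in the
    # frame is the current row itself, i.e. last_value(i) == i.
    frame = rows[: idx + 1]
    return frame[-1][2]

def last_value_full_partition(rows, idx):
    # Frame UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: the last value
    # is the final row of the whole partition.
    return rows[-1][2]

print([last_value_default_frame(rows, i) for i in range(len(rows))])   # [65662, 65549]
print([last_value_full_partition(rows, i) for i in range(len(rows))])  # [65549, 65549]
```

The first list reproduces the reported output; the second is the result the reporter expected.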
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247911#comment-14247911 ] Chengxiang Li commented on HIVE-9094: - Here is the TimeoutException in spark client side:
{noformat}
2014-12-15 12:14:09,062 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
    at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16(TestSparkCliDriver.java:166)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.util.concurrent.TimeoutException
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
    at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
    at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:92)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
    at
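(Editorial sketch, not HIVE-9094's actual fix: the failure mode above is a bounded `get()` on a remote result. A generic way to keep a timed-out remote lookup from failing the whole compilation is to fall back to a configured default. `remote_executor_count` and `DEFAULT_EXECUTORS` below are hypothetical stand-ins.)

```python
# Generic pattern: bound a slow remote call with a timeout and fall back to
# a default value instead of propagating a TimeoutException.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

def remote_executor_count():
    # Stand-in for the remote Spark client round trip.
    time.sleep(0.2)
    return 4

DEFAULT_EXECUTORS = 1  # hypothetical fallback value

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_executor_count)
    try:
        count = future.result(timeout=0.05)  # deliberately too short: times out
    except FutureTimeout:
        count = DEFAULT_EXECUTORS

print(count)  # 1
```

Whether to fall back or simply raise the timeout budget is a policy choice; the sketch only shows the mechanism.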
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Summary: upgrade hive schema from 0.9.0 to 0.13.1 failed (was: upgrade sql 014-HIVE-3764.postgres.sql failed) upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou The sql is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
How to debug hive unit test in eclipse ?
The wiki page looks already outdated: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode Is there a newer wiki page on how to debug Hive unit tests in Eclipse? -- Best Regards Jeff Zhang
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint was: The sql is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Affects Version/s: 0.14.0 upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). was: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask
[ https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247946#comment-14247946 ] Hive QA commented on HIVE-7977: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687358/HIVE-7977.6.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2088/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2088/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2088/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687358 - PreCommit-HIVE-TRUNK-Build Avoid creating serde for partitions if possible in FetchTask Key: HIVE-7977 URL: https://issues.apache.org/jira/browse/HIVE-7977 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, HIVE-7977.3.patch.txt, HIVE-7977.4.patch.txt, HIVE-7977.5.patch.txt, HIVE-7977.6.patch.txt Currently, FetchTask creates SerDe instance thrice for each partition, which can be avoided if it's same with table SerDe. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Attachment: HIVE-7797.1.patch Using a blank space instead of '', so Postgres won't convert the empty string into null. upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: HIVE-7797.1.patch Using the Hive schema tool with the following command to upgrade the Hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
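(Editorial illustration of the constraint class of error in HIVE-7797; SQLite is used here only because it is self-contained — empty-string handling differs across databases, and the reported failure indicates the deployment ended up inserting NULL into SCHEMA_VERSION.)

```python
# A NOT NULL column rejects an explicit NULL, as in the reported failure;
# a non-empty placeholder such as ' ' (the approach taken in the attached
# HIVE-7797.1.patch) satisfies the constraint on any database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VERSION (VER_ID INT, SCHEMA_VERSION TEXT NOT NULL, VERSION_COMMENT TEXT)"
)

try:
    conn.execute("INSERT INTO VERSION VALUES (1, NULL, 'Initial value')")
    failed = False
except sqlite3.IntegrityError:
    failed = True          # "NOT NULL constraint failed"
print(failed)              # True

conn.execute("INSERT INTO VERSION VALUES (1, ' ', 'Initial value')")
print(conn.execute("SELECT COUNT(*) FROM VERSION").fetchone()[0])  # 1
```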
[jira] [Created] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
Dong Chen created HIVE-9120: --- Summary: Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When running in parallel, Driver.launchTask() may run the task in a new thread if the other conditions also hold: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local variable OperationLog to be null in the new thread, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
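(The thread-local pitfall behind HIVE-9120 can be sketched in Python — illustrative analogue only, not Hive code: a value registered in the parent thread is invisible in a child thread unless it is explicitly handed over when the child starts, which is the fix direction the description suggests.)

```python
# A threading.local value set in the parent thread is not inherited by a
# child thread; it must be re-registered inside the new thread.
import threading

operation_log = threading.local()
operation_log.value = "query-log-handle"

result = {}

def task_without_handoff():
    # Child thread: the parent's thread-local value is not visible here.
    result["without"] = getattr(operation_log, "value", None)

def task_with_handoff(parent_value):
    # Fix direction: pass the handle in and re-register it in the new thread.
    operation_log.value = parent_value
    result["with"] = operation_log.value

t1 = threading.Thread(target=task_without_handoff)
t1.start(); t1.join()

t2 = threading.Thread(target=task_with_handoff, args=(operation_log.value,))
t2.start(); t2.join()

print(result["without"])  # None
print(result["with"])     # query-log-handle
```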
[jira] [Assigned] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen reassigned HIVE-9120: --- Assignee: Dong Chen Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When running in parallel, Driver.launchTask() may run the task in a new thread if the other conditions also hold: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local variable OperationLog to be null in the new thread, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9107) Non-lowercase field names in structs causes NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247953#comment-14247953 ] Lefty Leverenz commented on HIVE-9107: -- Is this related to HIVE-8386 or HIVE-8870 (also HIVE-6198)? Non-lowercase field names in structs causes NullPointerException Key: HIVE-9107 URL: https://issues.apache.org/jira/browse/HIVE-9107 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jim Pivarski If an HQL query references a struct field with mixed or upper case, Hive throws a NullPointerException instead of giving a better error message or simply lower-casing the name. For example, if I have a struct in column mystruct with a field named myfield, a query like select mystruct.MyField from tablename; passes the local initialize (it submits an M-R job) but the remote initialize jobs throw NullPointerExceptions. The exception is on line 61 of org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator, which is right after the field name is extracted and not forced to be lower-case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
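(A hypothetical sketch of the fix direction the HIVE-9107 report suggests — normalize the field name to lower case before the lookup, and fail with a clear message rather than dereferencing a missing entry. `struct_fields` and `get_field` are illustrative names, not Hive code; a `KeyError` here plays the role of the NullPointerException.)

```python
# Hive lower-cases declared struct field names; a lookup with the raw,
# mixed-case name therefore misses, and dereferencing the miss blows up.
struct_fields = {"myfield": 65549}

def get_field(fields, name):
    key = name.lower()  # forced lower-casing before the lookup
    if key not in fields:
        # Clear diagnostic instead of a null dereference.
        raise ValueError(f"No such field: {name}")
    return fields[key]

print(get_field(struct_fields, "MyField"))  # 65549
```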
[jira] [Assigned] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-9094: --- Assignee: Chengxiang Li TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:
{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
    at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by:
[jira] [Updated] (HIVE-6468) HS2 Metastore using SASL out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6468: - Labels: TODOC14 (was: ) HS2 Metastore using SASL out of memory error when curl sends a get request Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.13.1 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Labels: TODOC14 Fix For: 0.14.1 Attachments: HIVE-6468.0.patch, HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt, HIVE-6468.3.patch, HIVE-6468.4.patch, HIVE-6468.5.patch, HIVE-6468.branch-0.14.patch, HIVE-6468.branch-0.14.patch We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6468) HS2 Metastore using SASL out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247972#comment-14247972 ] Lefty Leverenz commented on HIVE-6468: -- Doc note: Add *hive.thrift.sasl.message.limit* to the wiki in Configuration Properties. But which section? Perhaps it belongs in the Metastore section after *hive.metastore.sasl.enabled* -- most other mentions of SASL occur in the HiveServer2 section. * [Configuration Properties -- Metastore | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore] ** [hive.metastore.sasl.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.sasl.enabled] * [Configuration Properties -- HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2] ** [hive.server2.authentication | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.authentication] ** [hive.server2.thrift.sasl.qop | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.thrift.sasl.qop] bq. On trunk this issue has been resolved with a thrift version upgrade. Does that mean *hive.thrift.sasl.message.limit* will only exist in the 0.14.1 release? 
HS2 Metastore using SASL out of memory error when curl sends a get request Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.13.1 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Labels: TODOC14 Fix For: 0.14.1 Attachments: HIVE-6468.0.patch, HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt, HIVE-6468.3.patch, HIVE-6468.4.patch, HIVE-6468.5.patch, HIVE-6468.branch-0.14.patch, HIVE-6468.branch-0.14.patch We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
Chengxiang Li created HIVE-9121: --- Summary: Enable beeline query progress information for Spark job[Spark Branch] Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical We could not get query progress information in Beeline, as SparkJobMonitor is filtered out of the operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
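(Editorial sketch of the filtering issue described above, in Python's logging framework — illustrative only, not Hive's operation-log code: an appender that only passes a fixed whitelist of logger names silently drops progress output from a monitor class unless that class is whitelisted. The logger names below are hypothetical.)

```python
# A name-whitelisting filter on a handler: records from non-whitelisted
# loggers never reach the operation-log stream.
import logging, io

stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Whitelist that includes the progress monitor's logger name.
ALLOWED = {"ql.Driver", "exec.Task", "exec.spark.status.SparkJobMonitor"}
handler.addFilter(lambda record: record.name in ALLOWED)

monitor = logging.getLogger("exec.spark.status.SparkJobMonitor")
monitor.setLevel(logging.INFO)
monitor.addHandler(handler)
monitor.propagate = False
monitor.info("Stage-1: 2/10 tasks finished")   # passes the filter

other = logging.getLogger("some.other.Logger")
other.setLevel(logging.INFO)
other.addHandler(handler)
other.propagate = False
other.info("dropped")                          # filtered out

print(stream.getvalue().strip())  # Stage-1: 2/10 tasks finished
```

If the whitelist did not contain the monitor's name, the progress line would vanish exactly as described in the issue.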
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9121: Status: Patch Available (was: Open) Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9121: Attachment: HIVE-9121.1-spark.patch *NOTE*: This patch enables Beeline query progress information on Spark when hive.exec.parallel=false. With hive.exec.parallel=true, Hive hits another Beeline query progress issue, which is tracked by HIVE-9120. Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline, as SparkJobMonitor is filtered out of the operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247996#comment-14247996 ] Chengxiang Li commented on HIVE-8993: - [~brocknoland], I talked with Dong offline, we found the reasons why Beeline query progress does not work, and created HIVE-9120 and HIVE-9121 to track them. Make sure Spark + HS2 work [Spark Branch] - Key: HIVE-8993 URL: https://issues.apache.org/jira/browse/HIVE-8993 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, HIVE-8993.3-spark.patch We haven't formally tested this combination yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout for all transports types
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248013#comment-14248013 ] Lefty Leverenz commented on HIVE-6679: -- Doc note: Patch 4 adds two configuration parameters (which are different from the two in patch 1): *hive.server2.tcp.socket.blocking.timeout* and *hive.server2.tcp.socket.keepalive*. Code review: The time specification for *hive.server2.tcp.socket.blocking.timeout* could be done with a TimeValidator like others in HiveConf.java since 0.14.0, for example: {code} METASTORE_CLIENT_CONNECT_RETRY_DELAY("hive.metastore.client.connect.retry.delay", "1s", new TimeValidator(TimeUnit.SECONDS), {code} HiveServer2 should support configurable the server side socket timeout for all transports types --- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Prasad Mujumdar Assignee: Navis Fix For: 0.14.1 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch HiveServer2 should make the server-side socket read timeout and TCP keep-alive option configurable. The metastore server already supports this (and so did the old Hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
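Concretely, Lefty's suggestion would have the two new parameters declared in HiveConf.java like other time-valued entries. The following is a hypothetical sketch only (enum names, defaults, and descriptions are illustrative, not taken from any of the attached patches):

```java
// Hypothetical HiveConf.java entries; names, defaults, and descriptions are
// illustrative only and not taken from the HIVE-6679 patches.
HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT("hive.server2.tcp.socket.blocking.timeout",
    "1000s", new TimeValidator(TimeUnit.SECONDS),
    "Server-side blocking socket read timeout for HiveServer2."),
HIVE_SERVER2_TCP_SOCKET_KEEPALIVE("hive.server2.tcp.socket.keepalive", true,
    "Whether to enable TCP keep-alive on HiveServer2 server sockets."),
```

A TimeValidator entry lets users write the value with a unit suffix (e.g. "60s", "5m") instead of a bare number, matching the convention HiveConf adopted in 0.14.0.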
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248074#comment-14248074 ] Hive QA commented on HIVE-9115: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687365/HIVE-9115.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2089/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2089/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2089/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687365 - PreCommit-HIVE-TRUNK-Build Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break in Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check
[ https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248156#comment-14248156 ] Hive QA commented on HIVE-9076: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687375/HIVE-9076.3.patch.txt {color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 6706 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lb_fs_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin9 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_truncate_column_list_bucketing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2090/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2090/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2090/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 26 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687375 - PreCommit-HIVE-TRUNK-Build incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check --- Key: HIVE-9076 URL: https://issues.apache.org/jira/browse/HIVE-9076 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9076.1.patch.txt, HIVE-9076.2.patch.txt, HIVE-9076.3.patch.txt In some file composition, AbstractFileMergeOperator removes incompatible files. For example, {noformat} 00_0 (v12) 00_0_copy_1 (v12) 00_1 (v11) 00_1_copy_1 (v11) 00_1_copy_2 (v11) 00_2 (v12) {noformat} 00_1 (v11) will be removed because 00 is assigned to new merged file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
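The failure mode Navis describes can be sketched outside Hive. This is an illustrative model only, not AbstractFileMergeOperator's actual code: once the merged output claims a task id, every file whose name maps to that same id, including the incompatible ones that were deliberately set aside, looks like a stale duplicate and gets removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only -- NOT Hive's AbstractFileMergeOperator code.
// It models why the task-id check is dangerous for incompatible files.
public class MergeIdSketch {
    // Task-id prefix is everything before the first underscore: "00_1" -> "00".
    static String taskId(String name) {
        int i = name.indexOf('_');
        return i < 0 ? name : name.substring(0, i);
    }

    // Files the task-id check would delete once the merged output owns mergedId.
    static List<String> removedBy(String mergedId, List<String> dir) {
        List<String> removed = new ArrayList<>();
        for (String f : dir) {
            if (taskId(f).equals(mergedId)) {
                removed.add(f);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // The file layout from the HIVE-9076 description.
        List<String> dir = Arrays.asList("00_0", "00_0_copy_1", "00_1",
                "00_1_copy_1", "00_1_copy_2", "00_2");
        // If the merged file is assigned id "00", the incompatible v11 files
        // ("00_1" and its copies) get swept away along with the merged inputs,
        // which is why the patch marks them to skip the task-id check.
        System.out.println(removedBy("00", dir));
    }
}
```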
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Affects Version/s: 0.14.0 Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248160#comment-14248160 ] Hive QA commented on HIVE-9121: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687465/HIVE-9121.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7235 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/553/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/553/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-553/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687465 - PreCommit-HIVE-SPARK-Build Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Component/s: JDBC HiveServer2 Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9097: - Attachment: HIVE-9097.1-spark.patch The patch splits the original spark task into two tasks so that conditional map joins can be inserted to process skewed data. Changes to golden files are all in query plan. Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9097: - Status: Patch Available (was: Open) Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248217#comment-14248217 ] Hive QA commented on HIVE-9097: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687476/HIVE-9097.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7235 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/554/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/554/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-554/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687476 - PreCommit-HIVE-SPARK-Build Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. 
This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248235#comment-14248235 ] Prabhu Joseph commented on HIVE-9113: - Hi Chao, The subquery inside the IN clause does not have a FROM clause. The correct query is below: {noformat} select p.p_partkey, li.suppkey from (select distinct partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey from lineitem where l_linenumber = li.l_linenumber); {noformat} Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me an NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, Hive should at least give some explanation rather than a plain NPE that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8819) Create unit test where we read from a read-only encrypted table
[ https://issues.apache.org/jira/browse/HIVE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-8819: -- Assignee: Ferdinand Xu Create unit test where we read from a read-only encrypted table Key: HIVE-8819 URL: https://issues.apache.org/jira/browse/HIVE-8819 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu The table should be chmoded 555. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-8131: --- Attachment: HIVE-8131.1.patch Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9109) Add support for Java 8 specific q-test out files
[ https://issues.apache.org/jira/browse/HIVE-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248266#comment-14248266 ] Hive QA commented on HIVE-9109: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687376/HIVE-9109.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2091/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2091/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2091/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687376 - PreCommit-HIVE-TRUNK-Build Add support for Java 8 specific q-test out files Key: HIVE-9109 URL: https://issues.apache.org/jira/browse/HIVE-9109 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-9109.1.patch, HIVE-9109.patch Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or change leads to even more out file differences. 
For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: {code} select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; {code} the {{StandardMapObjectInspector}} used in {{LazySimpleSerDe}} to serialize the final output uses a {{HashMap}}. Changing it to {{LinkedHashMap}} will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a {{map}} column is read from an Avro table into a Parquet table. The Avro API, specifically {{GenericData.Record}}, uses a {{HashMap}} and returns data in a different order. This patch adds support for specifying a hint called {{JAVA_VERSION_SPECIFIC_OUTPUT}}, which may be added to a q-test only if different outputs are expected for different Java versions. For example: Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a single .out file for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
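The root cause is easy to reproduce outside Hive; a minimal sketch (plain Java, not Hive code) of why HashMap-backed output is unstable across JDKs while LinkedHashMap output is not:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal demonstration of the ordering issue: HashMap's iteration order is
// unspecified (and its hashing internals changed between Java 7 and Java 8),
// while LinkedHashMap always iterates in insertion order.
public class MapOrderDemo {
    static List<String> insertAndListKeys(Map<String, Integer> m) {
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // Stable across JDKs: [a, b, c]
        System.out.println("LinkedHashMap: " + insertAndListKeys(new LinkedHashMap<>()));
        // No order is guaranteed; golden files keyed to one JDK's order break on another.
        System.out.println("HashMap:       " + insertAndListKeys(new HashMap<>()));
    }
}
```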
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248271#comment-14248271 ] Ferdinand Xu commented on HIVE-8131: Thank [~mohitsabharwal] for your review. [~brocknoland], can you help me review it when you have some time? Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248272#comment-14248272 ] Lefty Leverenz commented on HIVE-8893: -- Doc issue: [~prasadm], I noticed that you put *hive.server2.builtin.udf.whitelist* and *hive.server2.builtin.udf.blacklist* in the Configuration Properties doc after *hive.security.authorization.sqlstd.confwhitelist*, which is in the SQL Standard Based Authorization section. Don't they belong in the HiveServer2 section instead? Or do they only apply to SQL standard-based authorization? Wherever they go, I'll add links in the Restricted List and Whitelist subsection of Authentication/Authorization just like the link for *hive.security.authorization.sqlstd.confwhitelist*. If you have better ideas about how to organize all these parameters, please let me know. Quick reference: * [hive.security.authorization.sqlstd.confwhitelist | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.authorization.sqlstd.confwhitelist] followed by hive.server2.builtin.udf.whitelist and hive.server2.builtin.udf.blacklist * [HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2] * [Restricted List and Whitelist | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-RestrictedListandWhitelist] Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. 
While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with the hive user's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248301#comment-14248301 ] Lefty Leverenz commented on HIVE-8611: -- Added *hive.security.authorization.task.factory* to Configuration Properties in the SQL Standard Based Authorization section: * [hive.security.authorization.task.factory | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.authorization.task.factory] grant/revoke syntax should support additional objects for authorization plugins --- Key: HIVE-8611 URL: https://issues.apache.org/jira/browse/HIVE-8611 Project: Hive Issue Type: Bug Components: Authentication, SQL Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch, HIVE-8611.4.patch The authorization framework supports URI and global objects. The SQL syntax however doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that it can be handled by authorization plugins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9106) improve the performance of null scan optimizer when several table scans share a physical path
[ https://issues.apache.org/jira/browse/HIVE-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248366#comment-14248366 ] Hive QA commented on HIVE-9106: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687378/HIVE-9106.00.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2092/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2092/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2092/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687378 - PreCommit-HIVE-TRUNK-Build improve the performance of null scan optimizer when several table scans share a physical path - Key: HIVE-9106 URL: https://issues.apache.org/jira/browse/HIVE-9106 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-9106.00.patch Current fix HIVE-9053 addresses the correctness issue. The solution can be improved further when several table scans share a physical path. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: How to debug hive unit test in eclipse ?
Besides running the test in Eclipse (if possible), the other way is to enable debugging for Surefire and attach Eclipse to it using remote application debugging. http://maven.apache.org/surefire/maven-surefire-plugin/examples/debugging.html --Xuefu On Tue, Dec 16, 2014 at 12:17 AM, Jeff Zhang zjf...@gmail.com wrote: The wiki page looks like it's already out-dated, https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode Is there any new wiki page for how to debug a hive unit test in eclipse? -- Best Regards Jeff Zhang
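For reference, the Surefire remote-debugging flow Xuefu describes boils down to the following (the test class name is just an example; the default debug port is 5005):

```shell
# Run one test with Surefire suspended, waiting for a debugger on port 5005.
mvn test -Dtest=TestCliDriver -Dmaven.surefire.debug

# Then in Eclipse: Run > Debug Configurations... > Remote Java Application,
# host localhost, port 5005. Set breakpoints and attach; the test resumes.
```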
[jira] [Commented] (HIVE-9107) Non-lowercase field names in structs causes NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248386#comment-14248386 ] Jim Pivarski commented on HIVE-9107: It's the same stack trace as HIVE-8870, but not the other two. (I couldn't find it when I searched for preexisting issues, and I see that it's been fixed for 0.14.) Thanks! Non-lowercase field names in structs causes NullPointerException Key: HIVE-9107 URL: https://issues.apache.org/jira/browse/HIVE-9107 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jim Pivarski If an HQL query references a struct field with mixed or upper case, Hive throws a NullPointerException instead of giving a better error message or simply lower-casing the name. For example, if I have a struct in column mystruct with a field named myfield, a query like select mystruct.MyField from tablename; passes the local initialize (it submits an M-R job) but the remote initialize jobs throw NullPointerExceptions. The exception is on line 61 of org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator, which is right after the field name is extracted and not forced to be lower-case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
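A fix along the lines the report suggests can be sketched as a case-insensitive field lookup. This is purely illustrative and not the actual HIVE-9107 patch; the field table and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch: Hive stores struct field names in lower case, so a
// lookup using the user's original casing ("MyField") returns null and later
// NPEs. Normalizing to lower case (or failing with a clear message when the
// field truly does not exist) avoids the bare NullPointerException.
public class FieldLookupSketch {
    static final Map<String, Integer> FIELDS = new HashMap<>();
    static { FIELDS.put("myfield", 0); }

    static Integer resolve(String name) {
        Integer ref = FIELDS.get(name.toLowerCase(Locale.ROOT));
        if (ref == null) {
            throw new IllegalArgumentException("No such struct field: " + name);
        }
        return ref;
    }

    public static void main(String[] args) {
        System.out.println(resolve("MyField")); // resolves despite mixed case
    }
}
```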
[jira] [Created] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
Jason Dere created HIVE-9122:
--------------------------------
Summary: Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
Key: HIVE-9122
URL: https://issues.apache.org/jira/browse/HIVE-9122
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Reporter: Jason Dere
Assignee: Jason Dere

HIVE-8828/HIVE-8979 removed hive-shims-0.20 and hive-shims-common-secure, but we still have a few references to those removed packages:

{noformat}
$ find . -name pom.xml -exec egrep 'hive-shims-common-secure|hive-shims-0.20[^S]' {} \; -print
<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>
./jdbc/pom.xml
<include>org.apache.hive.shims:hive-shims-0.20</include>
<include>org.apache.hive.shims:hive-shims-common-secure</include>
./ql/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.20S/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.23/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/scheduler/pom.xml
{noformat}

While building from trunk, you can see that maven is still pulling an old snapshot of hive-shims-common-secure.jar from repository.apache.org:

{noformat}
[INFO]
[INFO] Building Hive Shims 0.20S 0.15.0-SNAPSHOT
[INFO]
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml (801 B at 7.2 KB/sec)
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom (4 KB at 47.0 KB/sec)
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar (33 KB at 277.0 KB/sec)
{noformat}
[jira] [Updated] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9122:
-----------------------------
Attachment: HIVE-9122.1.patch

Actually the references to hive-shims-common-secure need to be changed to hive-shims-common, or Hive would not build. Attaching patch.
[jira] [Updated] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9122:
-----------------------------
Status: Patch Available (was: Open)
[jira] [Created] (HIVE-9123) Query with join fails with NPE when using join auto conversion
Kamil Gorlo created HIVE-9123:
---------------------------------
Summary: Query with join fails with NPE when using join auto conversion
Key: HIVE-9123
URL: https://issues.apache.org/jira/browse/HIVE-9123
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Environment: CDH5 with Hive 0.13.1
Reporter: Kamil Gorlo

I have two simple tables:

{noformat}
desc kgorlo_comm;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | bigint     |          |
| dest_id   | bigint     |          |
+-----------+------------+----------+

desc kgorlo_log;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | bigint     |          |
| dest_id   | bigint     |          |
| tstamp    | bigint     |          |
+-----------+------------+----------+
{noformat}

With data:

{noformat}
select * from kgorlo_comm;
+-----------------+----------------------+
| kgorlo_comm.id  | kgorlo_comm.dest_id  |
+-----------------+----------------------+
| 1               | 2                    |
| 2               | 1                    |
| 1               | 3                    |
| 2               | 3                    |
| 3               | 5                    |
| 4               | 5                    |
+-----------------+----------------------+

select * from kgorlo_log;
+----------------+---------------------+--------------------+
| kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
+----------------+---------------------+--------------------+
| 1              | 2                   | 0                  |
| 1              | 3                   | 0                  |
| 1              | 5                   | 0                  |
| 3              | 1                   | 0                  |
+----------------+---------------------+--------------------+
{noformat}

Following query fails in second stage of execution:

{code}
select v.id, v.dest_id
from kgorlo_log v
join (select id, dest_id, count(*) as wiad
      from kgorlo_comm
      group by id, dest_id) com1
on com1.id = v.id and com1.dest_id = v.dest_id;
{code}

with following exception:

{noformat}
2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2}
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at
{noformat}
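Setting aside the crash, the expected answer of the query above is easy to derive by hand; a plain-Java re-computation of the join on (id, dest_id) (a hypothetical helper for illustration, not Hive code) shows the query should return exactly the two rows (1, 2) and (1, 3):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExpectedJoinResult {
    // Join kgorlo_log rows against the distinct group-by keys of kgorlo_comm
    // on (id, dest_id), mirroring the semantics of the failing query.
    public static List<long[]> join(List<long[]> log, Set<List<Long>> commKeys) {
        List<long[]> out = new ArrayList<>();
        for (long[] row : log) {
            if (commKeys.contains(Arrays.asList(row[0], row[1]))) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> log = Arrays.asList(
            new long[]{1, 2}, new long[]{1, 3}, new long[]{1, 5}, new long[]{3, 1});
        Set<List<Long>> commKeys = new HashSet<>(Arrays.asList(
            Arrays.asList(1L, 2L), Arrays.asList(2L, 1L), Arrays.asList(1L, 3L),
            Arrays.asList(2L, 3L), Arrays.asList(3L, 5L), Arrays.asList(4L, 5L)));
        for (long[] r : join(log, commKeys)) {
            System.out.println(r[0] + "\t" + r[1]); // 1 2, then 1 3
        }
    }
}
```

That the data set is this small is also why map-join auto conversion kicks in at all: the grouped side easily fits in memory.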
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248443#comment-14248443 ]

Brock Noland commented on HIVE-9120:
------------------------------------

Nice find! Thank you [~dongc]!!

Hive Query log does not work when hive.exec.parallel is true
------------------------------------------------------------
Key: HIVE-9120
URL: https://issues.apache.org/jira/browse/HIVE-9120
Project: Hive
Issue Type: Bug
Components: HiveServer2, Logging
Reporter: Dong Chen
Assignee: Dong Chen

When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also met: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local OperationLog variable to be null, so query logs are not logged. The OperationLog object should be set in the new thread in this case.
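The mechanics behind HIVE-9120 are easy to reproduce outside Hive: a plain ThreadLocal set in the parent thread is not visible in a freshly spawned task thread, so its value must be handed over explicitly before the task body runs. A minimal sketch with generic names, not Hive's actual OperationLog API:

```java
public class ThreadLocalHandoff {
    // Stand-in for the per-operation log handle.
    static final ThreadLocal<String> OPERATION_LOG = new ThreadLocal<>();

    static String[] demo() throws InterruptedException {
        OPERATION_LOG.set("query-42-log");
        final String[] seen = new String[2];

        // Parallel path: a new thread starts with an empty ThreadLocal,
        // so anything logged against it is lost.
        Thread broken = new Thread(() -> seen[0] = OPERATION_LOG.get());
        broken.start();
        broken.join();

        // Fix analogous to the one described: capture the parent's value
        // and set it inside the child before doing any work.
        final String parentLog = OPERATION_LOG.get();
        Thread fixed = new Thread(() -> {
            OPERATION_LOG.set(parentLog);
            seen[1] = OPERATION_LOG.get();
        });
        fixed.start();
        fixed.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        System.out.println(seen[0] + " / " + seen[1]); // null / query-42-log
    }
}
```

An InheritableThreadLocal would also propagate the value, but only to threads created by the parent, which is why an explicit handoff in the task runner is the more robust shape here.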
[jira] [Commented] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248446#comment-14248446 ]

Brock Noland commented on HIVE-9121:
------------------------------------

+1

Enable beeline query progress information for Spark job[Spark Branch]
---------------------------------------------------------------------
Key: HIVE-9121
URL: https://issues.apache.org/jira/browse/HIVE-9121
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Critical
Labels: Spark-M4
Attachments: HIVE-9121.1-spark.patch

We could not get query progress information in Beeline as SparkJobMonitor is filtered out of the operation log.
[jira] [Updated] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9123: -- Description: I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | Following query fails in second stage of execution: bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unxpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:254) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at
[jira] [Updated] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9123: -- Description: I have two simple tables:

desc kgorlo_comm;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |

desc kgorlo_log;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
| tstamp | bigint | |

With data:

select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2 |
| 2 | 1 |
| 1 | 3 |
| 2 | 3 |
| 3 | 5 |
| 4 | 5 |

select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |

The following query fails in the second stage of execution:

bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id) com1 on com1.id=v.id and com1.dest_id=v.dest_id;

with the following exception:

2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unxpected exception: null
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:254)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9121: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you very much [~chengxiang li]! I have committed this to branch. Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248466#comment-14248466 ] Hive QA commented on HIVE-7248: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687391/HIVE-7248.3.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2093/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2093/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2093/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687391 - PreCommit-HIVE-TRUNK-Build UNION ALL in hive returns incorrect results on Hbase backed table - Key: HIVE-7248 URL: https://issues.apache.org/jira/browse/HIVE-7248 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, HIVE-7248.3.patch.txt

The issue can be recreated with the following steps:

1) In hbase:
create 'TABLE_EMP','default'

2) On hive:
sudo -u hive hive
CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string, LAST_NAME string, CDS_UPDATED_DATE string, CDS_PK string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES('hbase.columns.mapping' = 'default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key', 'hbase.scan.cache' = '500', 'hbase.scan.cacheblocks' = 'false')
TBLPROPERTIES('hbase.table.name' = 'TABLE_EMP', 'serialization.null.format'='');

3) On hbase insert the following data:
put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

4) On hive execute the following query:
SELECT * FROM (
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
UNION ALL
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
) t;

5) Output of the query:
1
1
2
2

6) Output of just
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
is:
1
2

7) Output of just
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
is empty.

8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated. Accordingly, the above query should return
1
2
but instead it gives the wrong output
1
1
2
2
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
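The "bag union" expectation in step 8 can be sketched outside Hive: UNION ALL simply concatenates the two result sets, neither eliminating nor duplicating rows. A minimal illustrative Java sketch (not Hive code; the row values mirror the steps above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BagUnionDemo {
    // UNION ALL (bag union): concatenate both inputs; duplicates across
    // inputs are kept, but no row is ever emitted twice from one input.
    static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("1", "2");  // rows matching the '0'..'9' branch
        List<String> second = Collections.emptyList(); // the 'a'..'z' branch matches nothing
        System.out.println(unionAll(first, second));   // [1, 2] -- not the doubled 1 1 2 2
    }
}
```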
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248467#comment-14248467 ] Brock Noland commented on HIVE-8131: I think it'd be great to have [~rdblue] review this as well. Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248489#comment-14248489 ] Jimmy Xiang commented on HIVE-8843: --- Thanks a lot for reviewing it. Good point. Yes, it is a little intrusive for the RSC one. Let me fix it. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8815) Create unit test join of encrypted and unencrypted table
[ https://issues.apache.org/jira/browse/HIVE-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248502#comment-14248502 ] Brock Noland commented on HIVE-8815: Hi, Looks good to me! Say, do we need to add {{encrypt_data.txt}} or can we re-use one of the existing files? Cheers! Create unit test join of encrypted and unencrypted table Key: HIVE-8815 URL: https://issues.apache.org/jira/browse/HIVE-8815 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8815.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8816) Create unit test join of two encrypted tables with different keys
[ https://issues.apache.org/jira/browse/HIVE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248504#comment-14248504 ] Brock Noland commented on HIVE-8816: Can we add {{explain extended}} and then verify in the {{.out.orig}} file that the {{.hive-staging}} used is inside the table with the stronger key? Create unit test join of two encrypted tables with different keys - Key: HIVE-8816 URL: https://issues.apache.org/jira/browse/HIVE-8816 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-8816.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29061: HIVE-9109 : Add support for Java 8 specific q-test out files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29061/#review65208 --- Ship it! Ship It! - Brock Noland On Dec. 16, 2014, 2:16 a.m., Mohit Sabharwal wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29061/ --- (Updated Dec. 16, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-9109 https://issues.apache.org/jira/browse/HIVE-9109 Repository: hive-git Description --- HIVE-9109 : Add support for Java 8 specific q-test out files Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or the change leads to even more out file differences. For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; the StandardMapObjectInspector used in LazySimpleSerDe to serialize the final output uses a HashMap. Changing it to LinkedHashMap will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a map column is read from an Avro table into a Parquet table. The Avro API, specifically GenericData.Record, uses a HashMap and returns data in a different order. This patch adds support for specifying a hint called JAVA_VERSION_SPECIFIC_OUTPUT, which may be added to a q-test only if different outputs are expected for different Java versions. Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a .out file for the test.
Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java e06d6828aa8924de2b56c8a63cc0955c5bd514d2 ql/src/test/queries/clientpositive/varchar_udf1.q 0a3012b5cd6d3e0cf065e51e7a680af1f0db859d ql/src/test/results/clientpositive/varchar_udf1.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/varchar_udf1.q.java1.8.out PRE-CREATION ql/src/test/results/clientpositive/varchar_udf1.q.out 842bd38cb5070994df3a264cc691372384433ae3 Diff: https://reviews.apache.org/r/29061/diff/ Testing --- Tested using varchar_udf1.q. Out file changes for this test are included in the patch Thanks, Mohit Sabharwal
[jira] [Commented] (HIVE-9109) Add support for Java 8 specific q-test out files
[ https://issues.apache.org/jira/browse/HIVE-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248508#comment-14248508 ] Brock Noland commented on HIVE-9109: +1 thank you [~mohitsabharwal] Add support for Java 8 specific q-test out files Key: HIVE-9109 URL: https://issues.apache.org/jira/browse/HIVE-9109 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-9109.1.patch, HIVE-9109.patch Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or the change leads to even more out file differences. For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: {code} select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; {code} the {{StandardMapObjectInspector}} used in {{LazySimpleSerDe}} to serialize the final output uses a {{HashMap}}. Changing it to {{LinkedHashMap}} will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a {{map}} column is read from an Avro table into a Parquet table. The Avro API, specifically {{GenericData.Record}}, uses a {{HashMap}} and returns data in a different order. This patch adds support for specifying a hint called {{JAVA_VERSION_SPECIFIC_OUTPUT}}, which may be added to a q-test only if different outputs are expected for different Java versions. For example: Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a single .out file for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
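The ordering difference driving this patch can be reproduced with plain JDK collections: a HashMap's iteration order is unspecified and changed between Java 7 and Java 8, while a LinkedHashMap iterates in insertion order on every JVM. An illustrative sketch (not Hive code; the a:1,b:2,c:3 values mirror the str_to_map example above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {
    // Keys come back in insertion order regardless of JVM version,
    // which is why converting to insert-order maps fixes many .q.out diffs.
    static List<String> insertionOrderKeys(String... keys) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (String k : keys) {
            m.put(k, k.charAt(0) - 'a' + 1); // a:1, b:2, c:3
        }
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: deterministic across Java 7 and Java 8.
        System.out.println(insertionOrderKeys("a", "b", "c")); // [a, b, c]

        // HashMap: iteration order is an implementation detail; it changed
        // between Java 7 and Java 8, which is what breaks .q.out files.
        Map<String, Integer> hash = new HashMap<>();
        hash.put("a", 1);
        hash.put("b", 2);
        hash.put("c", 3);
        System.out.println(new ArrayList<>(hash.keySet()));
    }
}
```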
[jira] [Updated] (HIVE-9087) The move task does not handle properly in the case of loading data from the local file system path.
[ https://issues.apache.org/jira/browse/HIVE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9087: --- Resolution: Fixed Fix Version/s: encryption-branch Status: Resolved (was: Patch Available) Thank you [~Ferd] for the patch and [~spena] for the review!! I have committed this to branch! The move task does not handle properly in the case of loading data from the local file system path. --- Key: HIVE-9087 URL: https://issues.apache.org/jira/browse/HIVE-9087 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-9087.1.patch, HIVE-9087.patch NO PRECOMMIT TESTS

The following exception will be thrown:

load data local inpath '/root/testdata/encrypt_data.txt' overwrite into table unencrypteddb.src;

Getting log thread is interrupted, since query is done!
14/12/12 19:17:12 ERROR exec.Task: Failed with exception Wrong FS: file:/root/testdata/encrypt_data.txt, expected: hdfs://localhost:9000
java.lang.IllegalArgumentException: Wrong FS: file:/root/testdata/encrypt_data.txt, expected: hdfs://localhost:9000
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
    at org.apache.hadoop.hive.common.FileUtils.isSubDir(FileUtils.java:616)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2340)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2659)
    at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:666)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1571)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:289)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1644)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1217)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/12/12 19:17:12 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
14/12/12 19:17:12 ERROR operation.Operation: Error running hive query: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248529#comment-14248529 ] Jason Dere commented on HIVE-9115: -- Hmm, not sure why any of these failures would be related. The last 3 do not seem to be as they were failing in previous unit test runs. udaf_covar_pop.q is a new failure, will run locally to see if there is any issue. Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248533#comment-14248533 ] Jason Dere commented on HIVE-9115: -- Some discussion from [~ste...@apache.org] on HADOOP-11411, about concerns this will cause Hive to fail at runtime when running on Hadoop-2.7. If so, then this fix may also be a candidate to add to 0.14 branch. {quote} Steve Loughran added a comment - 7 hours ago Does this mean that at run time, older versions of Hive will not run against Hadoop 2.7? Jason Dere added a comment - 1 hour ago No, this just means that Hive will not compile against Hadoop 2.7. Run time should work on older versions. Steve Loughran added a comment - 9 minutes ago looking at the patch, there's enough referencing of the Hadoop class that there may be some link problems {quote} Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
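One common way to sidestep a compile-time (and potential link-time) dependence on a class that an upstream project may remove is to probe for it reflectively instead of referencing it statically. A hypothetical sketch of that pattern, not the actual HIVE-9115 patch; the Hadoop class name is only an example of something to probe for:

```java
public class ClassProbe {
    // Returns true if the named class is loadable at runtime.
    // Probing via Class.forName avoids a static reference to a class
    // that may not exist in every version on the classpath, so no
    // NoClassDefFoundError can be triggered by merely loading this class.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Always present in the JDK:
        System.out.println(isPresent("java.util.HashMap"));
        // Present only when a matching Hadoop version is on the classpath:
        System.out.println(isPresent("org.apache.hadoop.fs.permission.AccessControlException"));
    }
}
```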
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248536#comment-14248536 ] Jimmy Xiang commented on HIVE-8843: --- Thought about it again. The current solution seems to be the simplest one. Did I miss anything? Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9124) Performance of query 28 from tpc-ds
Brock Noland created HIVE-9124: -- Summary: Performance of query 28 from tpc-ds Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Reporter: Brock Noland As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on in the meantime * Why it is taking so long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9124: --- Description: As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct 
ss_list_price) B6_CNTD from store_sales where ss_quantity between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} was: As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT 
,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and
[jira] [Updated] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9124: --- Attachment: Screen Shot 2014-12-16 at 9.30.41 AM.png Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png

As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand:
* What is going on in the meantime
* Why it is taking so long

{noformat}
select * from
(select avg(ss_list_price) B1_LP, count(ss_list_price) B1_CNT, count(distinct ss_list_price) B1_CNTD
 from store_sales
 where ss_quantity between 0 and 5
 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1,
(select avg(ss_list_price) B2_LP, count(ss_list_price) B2_CNT, count(distinct ss_list_price) B2_CNTD
 from store_sales
 where ss_quantity between 6 and 10
 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2,
(select avg(ss_list_price) B3_LP, count(ss_list_price) B3_CNT, count(distinct ss_list_price) B3_CNTD
 from store_sales
 where ss_quantity between 11 and 15
 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3,
(select avg(ss_list_price) B4_LP, count(ss_list_price) B4_CNT, count(distinct ss_list_price) B4_CNTD
 from store_sales
 where ss_quantity between 16 and 20
 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4,
(select avg(ss_list_price) B5_LP, count(ss_list_price) B5_CNT, count(distinct ss_list_price) B5_CNTD
 from store_sales
 where ss_quantity between 21 and 25
 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5,
(select avg(ss_list_price) B6_LP, count(ss_list_price) B6_CNT, count(distinct ss_list_price) B6_CNTD
 from store_sales
 where ss_quantity between 26 and 30
 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6
limit 100
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9125) RSC stdout is logged twice
Brock Noland created HIVE-9125:
----------------------------------

             Summary: RSC stdout is logged twice
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
            Reporter: Brock Noland

This is quite strange and I don't see the issue at first glance.

{noformat}
2014-12-16 12:44:48,826 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 2014-12-16T12:44:48.638-0500: [Full GC [PSYoungGen: 111616K->50711K(143360K)] [ParOldGen: 349385K->349385K(349696K)] 461001K->400097K(493056K) [PSPermGen: 58684K->58684K(58880K)], 0.1879000 secs] [Times: user=1.14 sys=0.00, real=0.19 secs]
2014-12-16 12:44:48,826 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 2014-12-16T12:44:48.638-0500: [Full GC [PSYoungGen: 111616K->50711K(143360K)] [ParOldGen: 349385K->349385K(349696K)] 461001K->400097K(493056K) [PSPermGen: 58684K->58684K(58880K)], 0.1879000 secs] [Times: user=1.14 sys=0.00, real=0.19 secs]
{noformat}

{noformat}
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - sparkDriver-akka.actor.default-dispatcher-3 daemon prio=10 tid=0x7f9e3c5cc000 nid=0x3698 runnable [0x7f9e30376000]
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - sparkDriver-akka.actor.default-dispatcher-3 daemon prio=10 tid=0x7f9e3c5cc000 nid=0x3698 runnable [0x7f9e30376000]
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.Thread.State: RUNNABLE
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.Thread.State: RUNNABLE
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:767)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:767)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]:
{noformat}
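Stripped of the JIRA markup, the symptom in the dumps above is simply that every redirected stdout line appears exactly twice. A minimal sketch of checking a log for that pattern (an illustrative helper, not part of SparkClientImpl):

```python
# Detect the "every line logged twice" symptom shown in the dumps above.
# Hypothetical checker, not Hive code.
def adjacent_duplicates(lines):
    """Return each line that is immediately repeated by the next one."""
    return [a for a, b in zip(lines, lines[1:]) if a == b]

log = [
    "java.lang.Thread.State: RUNNABLE",
    "java.lang.Thread.State: RUNNABLE",
    "at kryo.Kryo.readClass(Kryo.java:656)",
    "at kryo.Kryo.readClass(Kryo.java:656)",
]
dups = adjacent_duplicates(log)  # every distinct dump line shows up once here
```

Running this over the {noformat} dumps would flag every line, which is what makes the double logging easy to confirm mechanically.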
[jira] [Commented] (HIVE-9125) RSC stdout is logged twice
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248559#comment-14248559 ]

Brock Noland commented on HIVE-9125:
------------------------------------

FYI [~vanzin] [~xuefuz]

RSC stdout is logged twice
--------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland

This is quite strange and I don't see the issue at first glance.
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248560#comment-14248560 ]

Marcelo Vanzin commented on HIVE-9094:
--------------------------------------

60s sounds reasonable. This initial timeout will always be hard to figure out, since launching the app will depend a lot on the cluster being used and a bunch of other things... :-/ Perhaps we could add some kind of context status event that the client side can listen to, and that the driver side can periodically send as an update... but that would probably still need some kind of timeout. Anyway, raising the timeout sounds fine for now.

TimeoutException when trying get executor count from RSC [Spark Branch]
-----------------------------------------------------------------------
                 Key: HIVE-9094
                 URL: https://issues.apache.org/jira/browse/HIVE-9094
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Xuefu Zhang
            Assignee: Chengxiang Li

In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:

{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
	at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
	at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
	at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
	at junit.framework.TestSuite.runTest(TestSuite.java:255)
	at junit.framework.TestSuite.run(TestSuite.java:250)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
	at
{code}
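The call being tuned here is a blocking wait on a value computed by the remote context, and the fix raises the bound rather than removing it. A sketch of that client-side shape (illustrative names and a toy stand-in for the round trip, not the actual SparkClient API):

```python
# Wait for a remotely computed value with a bounded timeout, the shape of
# call that surfaced as java.util.concurrent.TimeoutException above.
# get_executor_count and the 60s bound are illustrative stand-ins.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
import time

def get_executor_count():
    time.sleep(0.05)  # stands in for launching the app / RSC round trip
    return 4

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(get_executor_count)
    try:
        count = future.result(timeout=60)  # raised from a smaller default
    except FutureTimeout:
        count = None  # the caller surfaces this as a SemanticException
```

As the comment notes, any fixed bound is a guess about cluster launch time; a status event pushed by the driver would shrink the window but still needs a final timeout.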
[jira] [Updated] (HIVE-9125) RSC stdout is logged twice
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-9125:
-------------------------------
    Priority: Minor  (was: Major)

RSC stdout is logged twice
--------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland
            Priority: Minor

This is quite strange and I don't see the issue at first glance.
[jira] [Updated] (HIVE-9125) RSC stdout is logged twice [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-9125:
-------------------------------
    Affects Version/s: spark-branch
              Summary: RSC stdout is logged twice [Spark Branch]  (was: RSC stdout is logged twice)

RSC stdout is logged twice [Spark Branch]
-----------------------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland
            Priority: Minor

This is quite strange and I don't see the issue at first glance.
[jira] [Assigned] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere reassigned HIVE-9115:
--------------------------------
    Assignee: Jason Dere

Hive build failure on hadoop-2.7 due to HADOOP-11356
----------------------------------------------------
                 Key: HIVE-9115
                 URL: https://issues.apache.org/jira/browse/HIVE-9115
             Project: Hive
          Issue Type: Bug
            Reporter: Jason Dere
            Assignee: Jason Dere
         Attachments: HIVE-9115.1.patch

HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break in Hive when compiling against hadoop-2.7:

{noformat}
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol
  symbol:   class AccessControlException
  location: package org.apache.hadoop.fs.permission
[INFO] 1 error
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9106) improve the performance of null scan optimizer when several table scans share a physical path
[ https://issues.apache.org/jira/browse/HIVE-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248564#comment-14248564 ]

Pengcheng Xiong commented on HIVE-9106:
---------------------------------------

[~ashutoshc], I studied the 4 failures. Two of the four, vector_partition_diff_num_cols and vector_partition_diff_num_cols, are probably due to Sergey's CBO-enablement patch (and Vikram's follow-up patch was not updated, so the column names are different). I ran groupby3_map_multi_distinct.q and optimize_nullscan.q, and both of them passed on my laptop. Thus, I think it is safe to check in. [~jpullokkaran], please let us know if you have other opinions.

improve the performance of null scan optimizer when several table scans share a physical path
---------------------------------------------------------------------------------------------
                 Key: HIVE-9106
                 URL: https://issues.apache.org/jira/browse/HIVE-9106
             Project: Hive
          Issue Type: Improvement
            Reporter: Pengcheng Xiong
            Assignee: Pengcheng Xiong
            Priority: Minor
         Attachments: HIVE-9106.00.patch

The current fix, HIVE-9053, addresses the correctness issue. The solution can be improved further when several table scans share a physical path.
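The improvement described is the classic share-work-by-key move: when several table scans resolve to the same physical path, the per-path work should run once and be reused. A sketch of the idea (illustrative helper, not the actual optimizer code):

```python
# Do expensive per-path work once when several scans share a physical
# path, the optimization direction HIVE-9106 describes. Illustrative only.
def process_paths(scan_paths, probe):
    """probe(path) is an expensive per-path check; cache it per path."""
    cache = {}
    results = []
    for p in scan_paths:
        if p not in cache:
            cache[p] = probe(p)  # computed once per distinct path
        results.append(cache[p])
    return results, len(cache)

calls = []
def probe(p):
    calls.append(p)
    return p.upper()

# Three scans, only two distinct physical paths: probe runs twice.
res, unique = process_paths(["/warehouse/t1", "/warehouse/t1", "/warehouse/t2"], probe)
```

The cache keyed by path is the whole trick: the number of expensive probes drops from one per table scan to one per distinct path.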
What more Hive can do when compared to PIG
*Hello all*

*Can somebody help me in getting the answer for the question below?*

*It's regarding Pig vs. Hive.*

We know that Pig is for large data set analysis and Hive is good at data summarization and ad-hoc queries. But I want to know of a use case that Hive can handle which cannot be achieved with Pig. In other words, what more can a Hive query achieve when the same is not possible with a Pig Latin script? If possible, I would like to know the vice-versa case as well.

Thanks
Mohan
469-274-5677
[jira] [Commented] (HIVE-7024) Escape control characters for explain result
[ https://issues.apache.org/jira/browse/HIVE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248574#comment-14248574 ]

Hive QA commented on HIVE-7024:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687399/HIVE-7024.4.patch.txt

{color:red}ERROR:{color} -1 due to 120 failed/errored test(s), 6705 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_allchildsarenull
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_push_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample2
{noformat}
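For context on what the patch under test does: escaping control characters in explain output means mapping non-printable bytes to visible escapes so the result is stable, diffable text. A toy version of such an escape (hypothetical helper, not the HIVE-7024 patch itself):

```python
# Map control characters (except tab and newline) to visible \uXXXX
# escapes, the kind of normalization an "escape control characters for
# explain result" change performs. Illustrative only.
def escape_ctrl(s):
    out = []
    for ch in s:
        if ord(ch) < 0x20 and ch not in "\t\n":
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

A change like this rewrites every golden .q.out file containing explain output, which is consistent with the very large failed-test list above: the failures are expected-output diffs, not behavioral regressions.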
[jira] [Commented] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248622#comment-14248622 ] Brock Noland commented on HIVE-9124: There is 5.6 minutes of split generation: {noformat} 2014-12-16 12:06:30,757 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418749551759 end=1418749590756 duration=38997 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:13:39,512 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418749978847 end=1418750019512 duration=40665 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:14:52,475 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750020483 end=1418750092475 duration=71992 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:16:19,405 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750094132 end=1418750179405 duration=85273 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:18:42,716 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750181259 end=1418750322716 duration=141457 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:45:13,361 INFO [sparkDriver-akka.actor.default-dispatcher-3]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418751829900 end=1418751913361 duration=83461 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat {noformat} Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task 
Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png, query28-explain.txt As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct ss_list_price) B6_CNTD from store_sales where ss_quantity 
between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
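The six sub-selects above all share one shape: filter store_sales to a quantity bucket plus a list-price/coupon/wholesale-cost window, then compute avg, count, and count-distinct of ss_list_price. A minimal Python sketch of that shape (synthetic rows, not Hive code; column names follow the query):

```python
# Illustrative sketch of one TPC-DS query 28 sub-select over made-up rows.
rows = [
    # (ss_quantity, ss_list_price, ss_coupon_amt, ss_wholesale_cost)
    (3, 15.0, 0.0, 0.0),
    (4, 15.0, 500.0, 10.0),
    (5, 20.0, 470.0, 16.0),
    (9, 95.0, 0.0, 0.0),   # falls in the B2 quantity bucket, not B1
]

def bucket_aggs(rows, qty_lo, qty_hi, lp_lo, ca_lo, wc_lo):
    """avg/count/count-distinct of ss_list_price for one quantity bucket."""
    prices = [
        lp for (q, lp, ca, wc) in rows
        if qty_lo <= q <= qty_hi
        and (lp_lo <= lp <= lp_lo + 10
             or ca_lo <= ca <= ca_lo + 1000
             or wc_lo <= wc <= wc_lo + 20)
    ]
    avg = sum(prices) / len(prices) if prices else None
    return avg, len(prices), len(set(prices))

# B1: quantity 0..5, list price 11..21, coupon 460..1460, wholesale 14..34
b1 = bucket_aggs(rows, 0, 5, 11, 460, 14)
print(b1)
```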
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248648#comment-14248648 ] Steve Loughran commented on HIVE-9115: -- +1 for 0.14, though I was thinking more of reverting that hadoop change from branch-2. We're looking at a minimal 2.7 release before the end of the month (the first Java 7+-only release), and don't want any regressions from hadoop 2.6; this would be one. Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248653#comment-14248653 ] Vaibhav Gumashta commented on HIVE-8827: I think we should backport this to 14.1 as well. Will create a new jira for that. Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248652#comment-14248652 ] Laljo John Pullokkaran commented on HIVE-8988: -- [~jcamachorodriguez] could you re-upload the patch? Now, after Sergey's patch, CBO is on by default; we want to make sure all Cube/Rollup/GroupingSet tests succeed. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9126) Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch
Vaibhav Gumashta created HIVE-9126: -- Summary: Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch Key: HIVE-9126 URL: https://issues.apache.org/jira/browse/HIVE-9126 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.1 Check HIVE-8827 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.02.patch Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: (was: HIVE-8988.02.patch) Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: (was: HIVE-8988.02.patch) Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
Brock Noland created HIVE-9127: -- Summary: Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Reporter: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248688#comment-14248688 ] Brock Noland commented on HIVE-9124: I created HIVE-9127 for the split generation issue. We'll keep this JIRA open for further investigation once that issue is closed. Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png, query28-explain.txt As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on in the meantime * Why it is taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and
3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct ss_list_price) B6_CNTD from store_sales where ss_quantity between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7431) When run on spark cluster, some spark tasks may fail
[ https://issues.apache.org/jira/browse/HIVE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248691#comment-14248691 ] Brock Noland commented on HIVE-7431: FYI, I created a jira to re-introduce some caching of these objects in RSC in HIVE-9127. When run on spark cluster, some spark tasks may fail Key: HIVE-7431 URL: https://issues.apache.org/jira/browse/HIVE-7431 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-7431.1.patch, HIVE-7431.2.patch When running queries on spark, some spark tasks fail (usually the first couple of tasks) with the following stack trace: {quote} org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:60) org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:35) org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161) org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161) org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559) org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) ... {quote} Observed for spark standalone cluster. Not verified for spark on yarn or mesos. NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Description: In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. (was: In HIVE-7431 we disabling caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance.) Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
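The caching proposed here can be sketched generically (invented names; this is not Hive's or the Remote Spark Context's actual API): deserialize each plan at most once and serve repeated split-generation requests from a cache.

```python
# Minimal sketch of the caching idea: avoid re-deserializing the same
# Map/Reduce work object for every split-generation request.
deserialize_calls = 0

def deserialize_plan(path):
    """Stand-in for the expensive plan deserialization."""
    global deserialize_calls
    deserialize_calls += 1
    return {"path": path, "work": "MapWork"}

_plan_cache = {}

def get_plan(path):
    # First request pays the deserialization cost; later ones hit the cache.
    if path not in _plan_cache:
        _plan_cache[path] = deserialize_plan(path)
    return _plan_cache[path]

a = get_plan("/tmp/plan-1")
b = get_plan("/tmp/plan-1")
print(deserialize_calls)
```

The point HIVE-7431 raised still applies: a cache like this is only safe if the cached objects are immutable (or invalidated) across tasks.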
Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-9041 https://issues.apache.org/jira/browse/HIVE-9041 Repository: hive-git Description --- This JIRA removes UnionWork from the Spark plan. UnionWork right now is just a dummy work - in execution, it is translated to IdentityTran, which does nothing. The actual union operation is implemented with rdd.union, which happens when a BaseWork has multiple parent BaseWorks. For instance:

MW_1    MW_2
    \    /
    RW_1

In this case, MW_1 and MW_2 translate to RDD_1 and RDD_2, and then we create another RDD_3 which is the result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose parent is RDD_3. *Changes on GenSparkWork* To remove the UnionWork, most changes are in GenSparkWork. I got rid of a chunk of code that creates UnionWork and links the work with its parent works. But I still kept `currentUnionOperators` and `workWithUnionOperators`, since they are needed for removing union operators later. I also changed how `followingWork` is handled. This matters when we have the following operator tree:

TS_0    TS_1
    \    /
   UNION_2
      |
     RS_3
      |
     FS_4

(I ignored quite a few operators here; they are not required to illustrate the problem.) In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`. The first time we get to `RS_3`, say via `TS_0`, we break `RS_3` from its child and create a work for the path `TS_0 - UNION_2 - RS_3`. Let's say the work is `MW_1`. We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link `RW_2` with `MW_1`. We will then visit `RS_3` a second time, from `TS_1`, and create another work for the path `TS_1 - UNION_2 - RS_3`, say `MW_3`. But the problem is that `RS_3` is already disconnected from `FS_4`. In order to link `MW_3` with `RW_2`, we need to save that information somewhere. This is why we need `leafOpToChildWorkInfo`.
It is renamed from `leafOpToFollowingWork`. I found that we also need to save the edge property between `RS_3` and its child in order to connect them. I also encountered a case where two BaseWorks may be connected twice; I've explained that in the comments in the source code. *Changes on SparkPlanGenerator* Without UnionWork, SparkPlanGenerator can be a bit cleaner. The changes on this class are mostly refactoring. I got rid of some redundant code in the `generate(SparkWork)` method, and combined `generate(MapWork)` and `generate(ReduceWork)` into one. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java eb758e09888d7864acc9d88c7186ae2de48bc8f7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 438efabb062112da8fefc1bed9d8bd90ade26c67 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 78cbc6d2eebef5b8edc10fe693a1b580a6ee389c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ad6b09be83a33c0cd97ab9c3bc7d02adb928f1f3 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 654ba333969cacaafddec38c3c3f45ccb4b81d4a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 137df65d2bb2de20bca06e47b9e1386ddf511c68 ql/src/test/results/clientpositive/spark/auto_join27.q.out fb48351bea5df3a19c14c755eb3a3fbb7f503e61 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out 8472df914b8f5bdcb7974fc6689313d33975a4ad ql/src/test/results/clientpositive/spark/column_access_stats.q.out 72b2bd7e9b48033ca9cb1bd96facad42f12b6450 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 1757d16a736741f90c5d84b7a0cc0c168cb7d3ad ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 04f481d4a304fdc8a86e8cfd305899084bab2e8d ql/src/test/results/clientpositive/spark/join34.q.out 9a58a228002a2b704541dfed1c713b3880e71f35 ql/src/test/results/clientpositive/spark/join35.q.out 851a98128dca74f0008c20faf717d3cc974150e0
ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 92693e69a08d1ab2ea0c019f2b7f0634316d1eaf ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 060745dcc80d69b5d17101c2641c228b949c2fb8 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0a38beab815fb50fbb991d6228f48bb02b009998 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 639f4bd729587ce21b509bd8e3595107c0cf71bc ql/src/test/results/clientpositive/spark/multi_join_union.q.out d8dc110c3562e5c1e925553df86fba8ceda55b4a
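The multi-parent translation the review describes can be modeled with plain lists standing in for RDDs (a toy sketch, not SparkPlanGenerator code; on Spark, rdd.union behaves like concatenation of the inputs' partitions):

```python
# Toy model of the implicit union: when a work has multiple parent works,
# its input "RDD" is the union of the parents' outputs.
def union(*rdds):
    out = []
    for rdd in rdds:
        out.extend(rdd)  # rdd.union does not deduplicate (UNION ALL semantics)
    return out

rdd_1 = [("a", 1), ("b", 2)]  # output of MW_1
rdd_2 = [("a", 3)]            # output of MW_2
rdd_3 = union(rdd_1, rdd_2)   # RW_1's single parent, replacing UnionWork
print(rdd_3)
```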
Re: Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/ --- (Updated Dec. 16, 2014, 7:02 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9041 https://issues.apache.org/jira/browse/HIVE-9041 Repository: hive-git Description --- This JIRA removes UnionWork from the Spark plan. UnionWork right now is just a dummy work - in execution, it is translated to IdentityTran, which does nothing. The actual union operation is implemented with rdd.union, which happens when a BaseWork has multiple parent BaseWorks. For instance:

MW_1    MW_2
    \    /
    RW_1

In this case, MW_1 and MW_2 translate to RDD_1 and RDD_2, and then we create another RDD_3 which is the result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose parent is RDD_3. *Changes on GenSparkWork* To remove the UnionWork, most changes are in GenSparkWork. I got rid of a chunk of code that creates UnionWork and links the work with its parent works. But I still kept `currentUnionOperators` and `workWithUnionOperators`, since they are needed for removing union operators later. I also changed how `followingWork` is handled. This matters when we have the following operator tree:

TS_0    TS_1
    \    /
   UNION_2
      |
     RS_3
      |
     FS_4

(I ignored quite a few operators here; they are not required to illustrate the problem.) In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`. The first time we get to `RS_3`, say via `TS_0`, we break `RS_3` from its child and create a work for the path `TS_0 - UNION_2 - RS_3`. Let's say the work is `MW_1`. We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link `RW_2` with `MW_1`. We will then visit `RS_3` a second time, from `TS_1`, and create another work for the path `TS_1 - UNION_2 - RS_3`, say `MW_3`. But the problem is that `RS_3` is already disconnected from `FS_4`. In order to link `MW_3` with `RW_2`, we need to save that information somewhere.
This is why we need `leafOpToChildWorkInfo`. It is renamed from `leafOpToFollowingWork`. I found that we also need to save the edge property between `RS_3` and its child in order to connect them. I also encountered a case where two BaseWorks may be connected twice; I've explained that in the comments in the source code. *Changes on SparkPlanGenerator* Without UnionWork, SparkPlanGenerator can be a bit cleaner. The changes on this class are mostly refactoring. I got rid of some redundant code in the `generate(SparkWork)` method, and combined `generate(MapWork)` and `generate(ReduceWork)` into one. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java eb758e09888d7864acc9d88c7186ae2de48bc8f7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 438efabb062112da8fefc1bed9d8bd90ade26c67 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 78cbc6d2eebef5b8edc10fe693a1b580a6ee389c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ad6b09be83a33c0cd97ab9c3bc7d02adb928f1f3 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 654ba333969cacaafddec38c3c3f45ccb4b81d4a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 137df65d2bb2de20bca06e47b9e1386ddf511c68 ql/src/test/results/clientpositive/spark/auto_join27.q.out fb48351bea5df3a19c14c755eb3a3fbb7f503e61 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out 8472df914b8f5bdcb7974fc6689313d33975a4ad ql/src/test/results/clientpositive/spark/column_access_stats.q.out 72b2bd7e9b48033ca9cb1bd96facad42f12b6450 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 1757d16a736741f90c5d84b7a0cc0c168cb7d3ad ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 04f481d4a304fdc8a86e8cfd305899084bab2e8d ql/src/test/results/clientpositive/spark/join34.q.out 9a58a228002a2b704541dfed1c713b3880e71f35 ql/src/test/results/clientpositive/spark/join35.q.out
851a98128dca74f0008c20faf717d3cc974150e0 ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 92693e69a08d1ab2ea0c019f2b7f0634316d1eaf ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 060745dcc80d69b5d17101c2641c228b949c2fb8 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0a38beab815fb50fbb991d6228f48bb02b009998 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 639f4bd729587ce21b509bd8e3595107c0cf71bc ql/src/test/results/clientpositive/spark/multi_join_union.q.out d8dc110c3562e5c1e925553df86fba8ceda55b4a
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.03.patch .03 is the new patch with CBO enabled. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9111) Potential NPE in OrcStruct for list and map types
[ https://issues.apache.org/jira/browse/HIVE-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9111: Resolution: Fixed Fix Version/s: 0.14.1 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14.1 Potential NPE in OrcStruct for list and map types - Key: HIVE-9111 URL: https://issues.apache.org/jira/browse/HIVE-9111 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: orcfile Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9111.1.patch Currently, the getters in the OrcStruct class for list and map object inspectors do not have null checks, which may throw an NPE when UDFs like size() are used on a list or map column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
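The failure mode and the null-check fix pattern can be sketched outside Java (a toy Python model, not the actual ObjectInspector code; Hive's size() conventionally returns -1 for a NULL list or map):

```python
# Toy model of the bug: a size()-style UDF calls a getter that assumes
# the underlying list is non-null. NULL list/map columns are legal in ORC,
# so the getters must tolerate them.
def get_list_length_unsafe(col):
    return len(col)  # len(None) raises TypeError - the Python analogue of the NPE

def get_list_length_safe(col):
    # Hive's size() returns -1 for a NULL argument instead of failing.
    return -1 if col is None else len(col)

print(get_list_length_safe(None))
print(get_list_length_safe([1, 2]))
```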
[jira] [Updated] (HIVE-9041) Generate better plan for queries containing both union and multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9041: --- Attachment: HIVE-9041.2-spark.patch Minor changes on patch v1 (variable names, comments, etc.). Also included optimize_nullscan.q.out, which is NOT in the patch for RB. Generate better plan for queries containing both union and multi-insert [Spark Branch] -- Key: HIVE-9041 URL: https://issues.apache.org/jira/browse/HIVE-9041 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9041.1-spark.patch, HIVE-9041.2-spark.patch This is a follow-up for HIVE-8920. For queries like: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently we generate the following plan: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} It's better, however, to have the following plan: {noformat}
M1  M2
|\  /|
| \/ |
| /\ |
R4  R5
{noformat} Also, we can do some research in this JIRA to see if it's possible to remove UnionWork once and for all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
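The desired rewrite, dropping the union node and wiring each map work directly to every downstream reduce work, can be sketched as a small graph transformation (toy adjacency map with invented helper names, not Hive's SparkWork API):

```python
# Sketch of removing a union node from a plan DAG: connect each of the
# union's parents directly to each of its children.
def remove_union(edges, union_node):
    """edges: dict mapping a node to the set of its child nodes."""
    parents = {n for n, kids in edges.items() if union_node in kids}
    children = edges.pop(union_node, set())
    for p in parents:
        edges[p] = (edges[p] - {union_node}) | set(children)
    return edges

plan = {"M1": {"U3"}, "M2": {"U3"}, "U3": {"R4", "R5"}}
rewritten = remove_union(plan, "U3")
print(rewritten)  # M1 and M2 now feed R4 and R5 directly
```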
[jira] [Commented] (HIVE-9114) union all query in cbo test has undefined ordering
[ https://issues.apache.org/jira/browse/HIVE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248744#comment-14248744 ] Vikram Dixit K commented on HIVE-9114: -- +1 for 0.14 union all query in cbo test has undefined ordering -- Key: HIVE-9114 URL: https://issues.apache.org/jira/browse/HIVE-9114 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9114-branch-0.14.patch, HIVE-9114.patch Ordering changes based on platform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248752#comment-14248752 ] Ashutosh Chauhan commented on HIVE-9122: Changes in jdbc/pom.xml may not be required. See HIVE-8270. So, for jdbc just change shims-common-secure to shims-common so that we have all the requisite classes. Need to remove additional references to hive-shims-common-secure, hive-shims-0.20 - Key: HIVE-9122 URL: https://issues.apache.org/jira/browse/HIVE-9122 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9122.1.patch HIVE-8828/HIVE-8979 removed hive-shims-0.20 and hive-shims-common-secure, but we still have a few references to those removed packages: {noformat}
$ find . -name pom.xml -exec egrep 'hive-shims-common-secure|hive-shims-0.20[^S]' {} \; -print
<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>
./jdbc/pom.xml
<include>org.apache.hive.shims:hive-shims-0.20</include>
<include>org.apache.hive.shims:hive-shims-common-secure</include>
./ql/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.20S/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.23/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/scheduler/pom.xml
{noformat} While building from trunk, you can see that maven is still pulling an old snapshot of hive-shims-common-secure.jar from repository.apache.org: {noformat} [INFO] [INFO] Building Hive Shims 0.20S 0.15.0-SNAPSHOT [INFO] Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml (801 B at 7.2 KB/sec) Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom (4 KB at 47.0 KB/sec) Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar (33 KB at 277.0 KB/sec) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
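The find/egrep check quoted in the description can equally be expressed as a small script. A Python sketch of the same stale-reference scan (synthetic pom contents, not the real files), including the `[^S]` guard that keeps the still-valid hive-shims-0.20S out of the matches:

```python
import re

# Same pattern as the egrep above, with a lookahead playing the role of [^S].
STALE = re.compile(r"hive-shims-common-secure|hive-shims-0\.20(?!S)")

# Synthetic stand-ins for pom.xml contents.
poms = {
    "./jdbc/pom.xml": "<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>",
    "./beeline/pom.xml": "<artifactId>hive-shims-0.20S</artifactId>",  # 0.20S is fine
}

flagged = sorted(p for p, text in poms.items() if STALE.search(text))
print(flagged)
```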
[jira] [Created] (HIVE-9128) Evaluate hive.rpc.query.plan performance [Spark Branch]
Brock Noland created HIVE-9128: -- Summary: Evaluate hive.rpc.query.plan performance [Spark Branch] Key: HIVE-9128 URL: https://issues.apache.org/jira/browse/HIVE-9128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Tez uses [hive.rpc.query.plan|https://github.com/apache/hive/blob/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1874] which is used in {{Utilities.java}}. Basically instead of writing the query plan to HDFS, the query plan is placed in the JobConf object and then de-serialized form there. We should do some evaluation to see which is more performant for us. We might need to place some timings in {{Utilities}} to understand this if the PerfLog doesn't have enough information today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
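The trade-off under evaluation, shipping the serialized plan inline in the job configuration versus writing it to a file that every task reads back, can be sketched as follows (toy JSON serialization and a local temp file standing in for Hive's actual plan serialization and HDFS):

```python
import base64
import json
import os
import tempfile

plan = {"work": "MapWork", "aliases": ["store_sales"]}

# Option 1: inline in the conf (the idea behind hive.rpc.query.plan).
conf = {"hive.exec.plan": base64.b64encode(json.dumps(plan).encode()).decode()}
inline = json.loads(base64.b64decode(conf["hive.exec.plan"]))

# Option 2: via a scratch file (a temp file stands in for the HDFS path).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    json.dump(plan, f)
with open(path) as f:
    from_file = json.load(f)
os.remove(path)

print(inline == plan == from_file)
```

Either route round-trips the plan; the evaluation the JIRA asks for is about latency (extra filesystem round trips) versus conf size limits.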
[jira] [Updated] (HIVE-9114) union all query in cbo test has undefined ordering
[ https://issues.apache.org/jira/browse/HIVE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9114: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to both union all query in cbo test has undefined ordering -- Key: HIVE-9114 URL: https://issues.apache.org/jira/browse/HIVE-9114 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9114-branch-0.14.patch, HIVE-9114.patch Ordering changes based on platform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248770#comment-14248770 ] Chao commented on HIVE-9113: [~Prabhu Joseph] Thanks! You're right, I forgot the from lineitem in the subquery. But in any case, I was expecting a more helpful error message, perhaps like Parsing Error: missing the FROM keyword at line 4 ..., or something similar. Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, Hive should at least give some explanation rather than a plain NPE that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
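Per the comment above, the NPE goes away once the inner select has a FROM clause. A corrected version of the query (assuming the inner select was meant to read from {{lineitem}}, as the reporter confirmed) would be:

{code}
select p.p_partkey, li.l_suppkey
from (select distinct l_partkey as p_partkey from lineitem) p
join lineitem li on p.p_partkey = li.l_partkey
where li.l_linenumber = 1
and li.l_orderkey in (select l_orderkey from lineitem
                      where l_linenumber = li.l_linenumber);
{code}

The remaining issue is purely about diagnostics: the semantic analyzer should reject the FROM-less correlated subquery with a parse/semantic error instead of dereferencing a null AST node in {{QBSubQuery.validateAndRewriteAST}}.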
[jira] [Assigned] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-9127: -- Assignee: Brock Noland Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
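The caching idea can be sketched generically: keep the expensive-to-deserialize work objects in a map keyed by their plan path, so repeated split-generation calls inside the same Remote Spark Context reuse them instead of re-reading the plan. This is only an illustrative sketch; the names {{WorkCache}} and {{loadWork}} are hypothetical, not Hive's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: memoize deserialized work objects per plan path so
// split generation pays the deserialization cost at most once per plan.
public class WorkCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    public final AtomicInteger loads = new AtomicInteger();

    // Stands in for the expensive step: deserializing a MapWork/ReduceWork
    // plan from its serialized form on the filesystem.
    private Object loadWork(String planPath) {
        loads.incrementAndGet();
        return "work:" + planPath;
    }

    // computeIfAbsent guarantees each plan path is loaded at most once,
    // even with concurrent getSplits() callers racing on the same key.
    public Object get(String planPath) {
        return cache.computeIfAbsent(planPath, this::loadWork);
    }
}
```

With a cache like this, the repeated {{CombineHiveInputFormat.getSplits}} calls visible in the stack trace below would only trigger the load on the first call for a given plan.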
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Description: In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:313) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:247) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:735) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Attachment: HIVE-9127.1-spark.patch.txt Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Summary: Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch] (was: Improve getSplit performance in RSC [Spark Branch]) Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch] - Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.
[jira] [Updated] (HIVE-9127) Improve getSplit performance in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Summary: Improve getSplit performance in RSC [Spark Branch] (was: Cache Map/Reduce works in RSC [Spark Branch]) Improve getSplit performance in RSC [Spark Branch] -- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.