[jira] [Updated] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-9120: Attachment: HIVE-9120.patch My pleasure. Also thanks to [~chengxiang li] for finding this problem! Patch attached. Changes are: * Move OperationLog from hive-service to hive-exec, in order to avoid a cyclic Maven dependency between these 2 modules * Reset it in Driver and TaskRunner when running in parallel * Related changes caused by the move Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also met. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local OperationLog variable to be null, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
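To illustrate the thread-local hand-off described above: when Driver.launchTask() runs a task through TaskRunner.start() instead of TaskRunner.runSequential(), the new thread starts with no OperationLog, so the value has to be captured on the submitting thread and re-installed in the worker. A minimal, self-contained sketch of that pattern (illustrative only, not the attached patch; a plain ThreadLocal<String> stands in for OperationLog's per-thread registry):
{code}
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalHandoff {
  // Stand-in for OperationLog's per-thread state.
  private static final ThreadLocal<String> CURRENT_LOG = new ThreadLocal<>();

  public static void main(String[] args) throws InterruptedException {
    CURRENT_LOG.set("operation-log-of-driver-thread");

    // Capture the value on the submitting ("Driver") thread...
    final String parentLog = CURRENT_LOG.get();
    final AtomicReference<String> seenInWorker = new AtomicReference<>();

    Thread taskRunner = new Thread(() -> {
      // ...and re-install it in the worker thread. Without this line the
      // worker sees null, which is the reported behaviour when
      // hive.exec.parallel=true.
      CURRENT_LOG.set(parentLog);
      seenInWorker.set(CURRENT_LOG.get());
    });
    taskRunner.start();
    taskRunner.join();

    System.out.println("worker saw: " + seenInWorker.get());
  }
}
{code}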
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249615#comment-14249615 ] Vikram Dixit K commented on HIVE-9141: -- [~pxiong] I was able to run the query you mentioned above only getting a diff in the result. Can you try again with the latest changes and see it it works? HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249620#comment-14249620 ] Hive QA commented on HIVE-9094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687682/HIVE-9094.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/560/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/560/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-560/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687682 - PreCommit-HIVE-SPARK-Build TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249631#comment-14249631 ] Hive QA commented on HIVE-9127: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687603/HIVE-9127.3.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2103/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2103/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687603 - PreCommit-HIVE-TRUNK-Build Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. 
Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249653#comment-14249653 ] Lefty Leverenz commented on HIVE-8809: -- Doc note: This will require some documentation changes. The 'mvn clean install' command occurs 15 times in the wiki, and 'mvn' occurs 43 times. The string '-Phadoop-1' occurs 24 times and '-Phadoop-2' occurs 12 times. They should all be reviewed for possible revisions, with version notes. These docs contain 'mvn' with '-Phadoop-1' or '-Phadoop-2' (or '-Podbc,hadoop-1'): * Getting Started * Hive Developer FAQ * Hive ODBC * How To Contribute * How To Release Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249654#comment-14249654 ] Hive QA commented on HIVE-9059: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687692/HIVE-9059.2-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part1 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/561/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/561/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-561/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687692 - PreCommit-HIVE-SPARK-Build Remove wrappers for SparkJobInfo and SparkStageInfo --- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serailization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9142) Create table stored as ORC with CTAS fails
karthik palanisamy created HIVE-9142: Summary: Create table stored as ORC with CTAS fails Key: HIVE-9142 URL: https://issues.apache.org/jira/browse/HIVE-9142 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Environment: Apache Hive 0.14 Reporter: karthik palanisamy Priority: Blocker hive> create table orc_orc stored as orc as select * from tweets; Diagnostic Messages for this Task: Error: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 14/12/17 14:44:54 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9137) Turn off Hive's PredicateTransitivePropagate optimizer when cbo is on
[ https://issues.apache.org/jira/browse/HIVE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249670#comment-14249670 ] Hive QA commented on HIVE-9137: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687631/HIVE-9137.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonblock_op_deduplicate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687631 - PreCommit-HIVE-TRUNK-Build Turn off Hive's PredicateTransitivePropagate optimizer when cbo is on - Key: HIVE-9137 URL: https://issues.apache.org/jira/browse/HIVE-9137 Project: Hive Issue Type: Task Components: CBO, Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9137.patch Because Calcite contains rule called {{JoinPushTransitivePredicatesRule}} which does exactly this. So, if cbo is on, this optimization would have already taken place and we won't gain anything by running this again. 
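A minimal sketch of the guard this implies (illustrative only, not the attached patch): the logical optimizer would add PredicateTransitivePropagate to its pass list only when CBO did not run, since otherwise Calcite's JoinPushTransitivePredicatesRule has already inferred the transitive predicates.
{code}
import java.util.ArrayList;
import java.util.List;

public class TransformSelection {
  // Returns the ordered names of logical optimizer passes to run.
  static List<String> chooseTransforms(boolean cboSucceeded) {
    List<String> transforms = new ArrayList<>();
    if (!cboSucceeded) {
      // Only needed when Calcite's JoinPushTransitivePredicatesRule did not run.
      transforms.add("PredicateTransitivePropagate");
    }
    transforms.add("PredicatePushDown");
    return transforms;
  }

  public static void main(String[] args) {
    System.out.println(chooseTransforms(true));  // [PredicatePushDown]
    System.out.println(chooseTransforms(false)); // [PredicateTransitivePropagate, PredicatePushDown]
  }
}
{code}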
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.04.patch Patch after HIVE-9129 has been applied to the trunk. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.04.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249704#comment-14249704 ] Hive QA commented on HIVE-8972: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687696/HIVE-8972.4-spark.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10 org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/562/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/562/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-562/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687696 - PreCommit-HIVE-SPARK-Build Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249727#comment-14249727 ] Hive QA commented on HIVE-9138: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687635/HIVE-9138.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2105/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2105/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2105/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687635 - PreCommit-HIVE-TRUNK-Build Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt PTFOperator does not explain anything in explain statement, making it hard to understand the internal works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8972: - Attachment: HIVE-8972.5-spark.patch Try again. The failures {{union_remove_10}} and {{join10}} are all due to timeout getting cluster infos, which seems unrelated to the patch. {noformat} 2014-12-17 02:24:30,458 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10(TestSparkCliDriver.java:210) .. {noformat} Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28941: HIVE-8988
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28941/ --- (Updated Dec. 17, 2014, 12:11 p.m.) Review request for hive, John Pullokkaran and Julian Hyde. Changes --- Latest patch after CBO enabled and dependencies on Calcite have been solved. Bugs: HIVE-8988 https://issues.apache.org/jira/browse/HIVE-8988 Repository: hive-git Description --- HIVE-8988 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveGroupingID.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java c02a65e2041e4742a56cf4a935da0a7c04d18fdb ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 29be69182667dbe2070bd092bf75b4bb97101554 ql/src/test/queries/clientpositive/groupby_cube1.q c12720b27059075050fc92d9f31420c081303699 ql/src/test/results/clientpositive/groupby_cube1.q.out 7b5d70ae8ffce47a4b351ed9dfedcd15ab1e139c Diff: https://reviews.apache.org/r/28941/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Commented] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249793#comment-14249793 ] Hive QA commented on HIVE-9140: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687667/HIVE-9140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2106/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2106/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2106/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687667 - PreCommit-HIVE-TRUNK-Build Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249865#comment-14249865 ] Hive QA commented on HIVE-9113: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687671/HIVE-9113.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2107/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2107/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2107/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687671 - PreCommit-HIVE-TRUNK-Build Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Assignee: Navis Attachments: HIVE-9113.1.patch.txt Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, it should at least give some explanation rather than a plain NPE message that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7024) Escape control characters for explain result
[ https://issues.apache.org/jira/browse/HIVE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249870#comment-14249870 ] Hive QA commented on HIVE-7024: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687684/HIVE-7024.5.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2108/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2108/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2108/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2108/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientnegative/subquery_missing_from.q.out ql/src/test/queries/clientnegative/subquery_missing_from.q + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646247. At revision 1646247. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687684 - PreCommit-HIVE-TRUNK-Build Escape control characters for explain result Key: HIVE-7024 URL: https://issues.apache.org/jira/browse/HIVE-7024 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7024.1.patch.txt, HIVE-7024.2.patch.txt, HIVE-7024.3.patch.txt, HIVE-7024.4.patch.txt, HIVE-7024.5.patch.txt Comments for columns are now delimited by 0x00, which is binary and makes git refuse to produce a proper diff file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
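For illustration, one straightforward way to escape non-printable control characters (such as the 0x00 column-comment delimiter mentioned in the description) so that explain output stays plain text; this is a sketch of the general technique, not the attached patch:
{code}
public class ControlCharEscaper {
  // Replaces control characters (except tab, newline and carriage return)
  // with visible \\uXXXX escapes so the output is no longer binary.
  static String escapeControlChars(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
        sb.append(String.format("\\u%04X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String comment = "first column comment\u0000second column comment";
    System.out.println(escapeControlChars(comment));
    // prints: first column comment\u0000second column comment
  }
}
{code}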
[jira] [Created] (HIVE-9143) select user(), current_user()
Hari Sekhon created HIVE-9143: - Summary: select user(), current_user() Key: HIVE-9143 URL: https://issues.apache.org/jira/browse/HIVE-9143 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Hari Sekhon Priority: Minor Feature request to add support for determining in SQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9143) select user(), current_user()
[ https://issues.apache.org/jira/browse/HIVE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-9143: -- Description: Feature request to add support for determining in HQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon was: Feature request to add support for determining in SQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon select user(), current_user() - Key: HIVE-9143 URL: https://issues.apache.org/jira/browse/HIVE-9143 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Hari Sekhon Priority: Minor Feature request to add support for determining in HQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9144) Beeline + Kerberos shouldn't prompt for unused username + password
Hari Sekhon created HIVE-9144: - Summary: Beeline + Kerberos shouldn't prompt for unused username + password Key: HIVE-9144 URL: https://issues.apache.org/jira/browse/HIVE-9144 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Environment: Hive 0.13 on MapR 4.0.1 Reporter: Hari Sekhon Priority: Minor When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline> !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://lonsl1101975.uk.net.intra:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: <enter> Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specified a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with <enter>, <enter> for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
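A minimal sketch of the requested behaviour (illustrative only, not Beeline's actual code): suppress the username/password prompts when the HiveServer2 JDBC URL already carries a Kerberos principal, since those credentials are ignored for Kerberos connections.
{code}
public class PromptDecision {
  // Kerberized HiveServer2 URLs carry "principal=..." in the connection string.
  static boolean shouldPromptForCredentials(String jdbcUrl) {
    return !jdbcUrl.toLowerCase().contains("principal=");
  }

  public static void main(String[] args) {
    String kerberosUrl = "jdbc:hive2://example-host:10000/default;principal=hive/example-host@REALM";
    String plainUrl = "jdbc:hive2://example-host:10000/default";
    System.out.println(shouldPromptForCredentials(kerberosUrl)); // false
    System.out.println(shouldPromptForCredentials(plainUrl));    // true
  }
}
{code}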
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249916#comment-14249916 ] Xuefu Zhang commented on HIVE-9127: --- +1. Please modify the query if the patch is going to apply to trunk. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO
[jira] [Updated] (HIVE-9144) Beeline + Kerberos shouldn't prompt for unused username + password
[ https://issues.apache.org/jira/browse/HIVE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-9144: -- Description: When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://host:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with enter, enter for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon was: When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://lonsl1101975.uk.net.intra:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with enter, enter for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon Beeline + Kerberos shouldn't prompt for unused username + password -- Key: HIVE-9144 URL: https://issues.apache.org/jira/browse/HIVE-9144 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Environment: Hive 0.13 on MapR 4.0.1 Reporter: Hari Sekhon Priority: Minor When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. 
It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://host:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65323 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/29145/#comment108440 If the same timeout is used for multiple rpc calls, then the description here might need to be updated. - Xuefu Zhang On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount timeout after 5s as Spark cluster has not launched yet 1. set the timeout value configurable. 2. set default timeout value 60s. 3. enable timeout for get spark job info and get spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
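As a rough sketch of the change under review (not the actual patch): read the timeout from configuration rather than hard-coding 5 seconds, and apply it to the Future returned by the remote Spark client. The property name below is made up for illustration; the real HiveConf variable added by HIVE-9094 may be named differently.
{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.conf.Configuration;

public class ConfigurableRpcTimeout {
  // Wait for an RSC result using a configurable timeout with a 60s default.
  static <T> T getWithTimeout(Future<T> future, Configuration conf)
      throws ExecutionException, InterruptedException, TimeoutException {
    // Hypothetical property name, for illustration only.
    long timeoutSec = conf.getLong("hive.spark.client.future.timeout.seconds", 60L);
    return future.get(timeoutSec, TimeUnit.SECONDS);
  }
}
{code}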
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249932#comment-14249932 ] Xuefu Zhang commented on HIVE-9094: --- Minor comments on RB. [~vanzin], could you also take a look? TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at 
junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at
Re: Review Request 29147: HIVE-9059 Remove wrappers for SparkJobInfo and SparkStageInfo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29147/#review65324 --- Ship it! Ship It! - Xuefu Zhang On Dec. 17, 2014, 7:29 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29147/ --- (Updated Dec. 17, 2014, 7:29 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9059 https://issues.apache.org/jira/browse/HIVE-9059 Repository: hive-git Description --- SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. Diffs - pom.xml b3a22b5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java 437d61d spark-client/src/main/java/org/apache/hive/spark/client/status/HiveSparkJobInfo.java 8ea6969 spark-client/src/main/java/org/apache/hive/spark/client/status/HiveSparkStageInfo.java dfbb01e Diff: https://reviews.apache.org/r/29147/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249942#comment-14249942 ] Xuefu Zhang commented on HIVE-9059: --- +1 Remove wrappers for SparkJobInfo and SparkStageInfo --- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249959#comment-14249959 ] Hive QA commented on HIVE-8972: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687722/HIVE-8972.5-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/563/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/563/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-563/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687722 - PreCommit-HIVE-SPARK-Build Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check
[ https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249976#comment-14249976 ] Hive QA commented on HIVE-9076: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687687/HIVE-9076.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2109/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2109/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2109/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687687 - PreCommit-HIVE-TRUNK-Build incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check --- Key: HIVE-9076 URL: https://issues.apache.org/jira/browse/HIVE-9076 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9076.1.patch.txt, HIVE-9076.2.patch.txt, HIVE-9076.3.patch.txt, HIVE-9076.4.patch.txt In some file composition, AbstractFileMergeOperator removes incompatible files. For example, {noformat} 00_0 (v12) 00_0_copy_1 (v12) 00_1 (v11) 00_1_copy_1 (v11) 00_1_copy_2 (v11) 00_2 (v12) {noformat} 00_1 (v11) will be removed because 00 is assigned to new merged file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
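To make the failure mode concrete, here is an illustration (not Hive's merge code) of the task-id collision described above: duplicate-file cleanup keeps one output per task-id prefix, so once the merged output claims task id 00, the side-lined incompatible file 00_1 maps to the same key and is dropped unless it is flagged to skip the task id check.
{code}
import java.util.HashMap;
import java.util.Map;

public class TaskIdCollision {
  // Task id is the prefix before the first '_' in the file name.
  static String taskIdOf(String fileName) {
    int idx = fileName.indexOf('_');
    return idx < 0 ? fileName : fileName.substring(0, idx);
  }

  public static void main(String[] args) {
    Map<String, String> keptPerTaskId = new HashMap<>();
    for (String f : new String[] {"00_0" /* merged output, v12 */, "00_1" /* incompatible, v11 */}) {
      // one surviving file per task id -> the incompatible file is silently lost
      keptPerTaskId.putIfAbsent(taskIdOf(f), f);
    }
    System.out.println(keptPerTaskId); // {00=00_0} -- 00_1 disappears
  }
}
{code}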
[jira] [Created] (HIVE-9145) authorization_admin_almighty1.q fails with result diff [Spark Branch]
Xuefu Zhang created HIVE-9145: - Summary: authorization_admin_almighty1.q fails with result diff [Spark Branch] Key: HIVE-9145 URL: https://issues.apache.org/jira/browse/HIVE-9145 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang HIVE-7979 enabled this test. However, the test result seems to have a timestamp that depends on the date when the test runs, which makes the test fail. The same test on trunk gives -1 for the timestamp value and thus passes all the time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9130) vector_partition_diff_num_cols result is not updated after CBO upgrade
[ https://issues.apache.org/jira/browse/HIVE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250041#comment-14250041 ] Sergey Shelukhin commented on HIVE-9130: Thanks! vector_partition_diff_num_cols result is not updated after CBO upgrade --- Key: HIVE-9130 URL: https://issues.apache.org/jira/browse/HIVE-9130 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.15.0 Attachments: HIVE-9130.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8843: -- Attachment: HIVE-8843.3-spark.patch Attached v3 again to re-run the tests. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
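In Spark's Java API the call that actually releases a cached RDD is unpersist(); a minimal sketch of the idea (not the attached patch) is to remember every RDD the query cached for multi-insert reuse and unpersist them once the query completes.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

public class QueryRddCacheTracker {
  private final List<JavaRDD<?>> cached = new ArrayList<>();

  // Cache an RDD for reuse within this query and remember it for cleanup.
  <T> JavaRDD<T> cache(JavaRDD<T> rdd) {
    rdd.cache();
    cached.add(rdd);
    return rdd;
  }

  // Called when the query finishes: release every cached RDD.
  void queryDone() {
    for (JavaRDD<?> rdd : cached) {
      rdd.unpersist(false); // non-blocking release of the cached blocks
    }
    cached.clear();
  }
}
{code}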
[jira] [Assigned] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9135: - Assignee: Jimmy Xiang Cache Map and Reduce works in RSC [Spark Branch] Key: HIVE-9135 URL: https://issues.apache.org/jira/browse/HIVE-9135 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Jimmy Xiang HIVE-9127 works around the fact that we don't cache Map/Reduce works in Spark. However, other input formats such as HiveInputFormat will not benefit from that fix. We should investigate how to allow caching on the RSC while not on tasks (see HIVE-7431). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
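One possible shape for that investigation, sketched with made-up names: a per-process cache of deserialized plan objects keyed by plan path, consulted only in the remote driver (RSC) during split generation and bypassed inside executor tasks, where stale cached plans caused the HIVE-7431 failures.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class RscPlanCache<W> {
  private final Map<String, W> cache = new ConcurrentHashMap<>();
  private final boolean insideTask; // true when running in an executor task

  public RscPlanCache(boolean insideTask) {
    this.insideTask = insideTask;
  }

  public W get(String planPath, Function<String, W> loader) {
    if (insideTask) {
      return loader.apply(planPath);              // never cache on tasks
    }
    return cache.computeIfAbsent(planPath, loader); // cache only on the RSC
  }
}
{code}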
[jira] [Updated] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9059: -- Summary: Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] (was: Remove wrappers for SparkJobInfo and SparkStageInfo) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] -- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9059: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] -- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250085#comment-14250085 ] Sergey Shelukhin commented on HIVE-8848: +1 data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-8848.01.patch, HIVE-8848.2.patch.txt, HIVE-8848.3.patch.txt, HIVE-8848.4.patch.txt, HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or \x00 characters, we get a bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either the load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
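For context, Hive's text SerDes treat the marker \N (configurable per table via the null-format SerDe property) as NULL, while the literal strings null and NULL are ordinary data, which is why LazyPrimitive logs a conversion-to-null warning for each such value in an INT column. A small standalone illustration of that distinction (not Hive's code) follows.
{code}
public class TextNullMarker {
  // Parse an INT column from a text file field, given the table's null marker.
  static Integer parseIntColumn(String field, String nullMarker) {
    if (field.equals(nullMarker)) {
      return null;                       // real NULL, no warning expected
    }
    try {
      return Integer.valueOf(field);
    } catch (NumberFormatException e) {
      // what the reporter sees: "data is not in INT format", converted to null
      return null;
    }
  }

  public static void main(String[] args) {
    String marker = "\\N";               // Hive's default null marker for text files
    System.out.println(parseIntColumn("\\N", marker));   // null, silently
    System.out.println(parseIntColumn("NULL", marker));  // null, with a warning in Hive
  }
}
{code}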
[jira] [Commented] (HIVE-8406) Research on skewed join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250088#comment-14250088 ] Xuefu Zhang commented on HIVE-8406: --- [~leftylev], this is just a research task. It doesn't seem to need any doc. Research on skewed join [Spark Branch] -- Key: HIVE-8406 URL: https://issues.apache.org/jira/browse/HIVE-8406 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: Skew join background.pdf Research on how to handle skewed join for Hive on Spark. Here is the original Hive design doc for skewed join, https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask
[ https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250092#comment-14250092 ] Ashutosh Chauhan commented on HIVE-7977: Don't think so. If the problem persists, maybe just create a new RB instead of updating the previous one. Avoid creating serde for partitions if possible in FetchTask Key: HIVE-7977 URL: https://issues.apache.org/jira/browse/HIVE-7977 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, HIVE-7977.3.patch.txt, HIVE-7977.4.patch.txt, HIVE-7977.5.patch.txt, HIVE-7977.6.patch.txt Currently, FetchTask creates a SerDe instance thrice for each partition, which can be avoided if it's the same as the table SerDe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
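A minimal sketch of the optimization described above, with hypothetical names: build a partition-specific deserializer only when the partition differs from the table in SerDe class or properties, and otherwise hand back the single table-level instance.
{code}
import java.util.Objects;
import java.util.Properties;
import java.util.function.Supplier;

public class SerDeReuse {
  // Reuse the table SerDe when the partition matches it; otherwise build a new one.
  static <D> D serDeForPartition(String tableSerDeClass, Properties tableProps, D tableSerDe,
                                 String partSerDeClass, Properties partProps,
                                 Supplier<D> partSerDeFactory) {
    boolean sameAsTable = Objects.equals(tableSerDeClass, partSerDeClass)
        && Objects.equals(tableProps, partProps);
    return sameAsTable ? tableSerDe : partSerDeFactory.get();
  }
}
{code}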
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250098#comment-14250098 ] Hive QA commented on HIVE-9120: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687697/HIVE-9120.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2110/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2110/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2110/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687697 - PreCommit-HIVE-TRUNK-Build Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also on. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This cause the threadlocal variable OperationLog to be null and query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
Kamil Gorlo created HIVE-9146: - Summary: Query with left joins produces wrong result when join condition is written in different order Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I got different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9146: -- Description: I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) was: I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 
1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1|
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250104#comment-14250104 ] Brock Noland commented on HIVE-9127: Thank you Xuefu! bq. Please modify the query if the patch is going to apply to trunk. I don't follow? The latest patch applies to trunk and was tested on trunk. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 
14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Affects Version/s: (was: spark-branch) 0.14.0 Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 
2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
[jira] [Assigned] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9136: -- Assignee: Chao Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
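A bare-bones version of the counters being proposed; Hive's PerfLogger could equally host them. The names below are illustrative, and the lambda stands in for a call to Driver.compile().
{code}
import java.util.concurrent.TimeUnit;

public class CompileTimer {
  interface Compiler { void compile(String query); }

  // Measure wall-clock time spent compiling a single query.
  static long timeCompileMillis(Compiler compiler, String query) {
    long start = System.nanoTime();
    compiler.compile(query);
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    long ms = timeCompileMillis(q -> { /* invoke Driver.compile(q) here */ }, "select 1");
    System.out.println("compile took " + ms + " ms");
  }
}
{code}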
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250128#comment-14250128 ] Brock Noland commented on HIVE-9120: +1, thank you [~dongc]! Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also on. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This cause the threadlocal variable OperationLog to be null and query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250136#comment-14250136 ] Ashutosh Chauhan commented on HIVE-9146: you might be hitting into HIVE-8298 can you test your queries on Hive 0.14 and post your findings here. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9074: --- Attachment: HIVE-9074.patch add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
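A sketch of the three modes described above (mode names are illustrative, not the actual config values): off never attempts direct SQL, on attempts it and quietly falls back to ORM on failure (today's behaviour), and on-or-fail rethrows so the user can see and fix the underlying problem instead of silently losing performance.
{code}
public class DirectSqlMode {
  enum Mode { OFF, ON_WITH_FALLBACK, ON_OR_FAIL }

  interface Fetch<T> { T run() throws Exception; }

  static <T> T fetch(Mode mode, Fetch<T> directSql, Fetch<T> orm) throws Exception {
    if (mode == Mode.OFF) {
      return orm.run();
    }
    try {
      return directSql.run();
    } catch (Exception e) {
      if (mode == Mode.ON_OR_FAIL) {
        throw e;            // surface the direct-SQL problem to the caller
      }
      return orm.run();     // current default: fall back quietly
    }
  }
}
{code}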
[jira] [Updated] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9074: --- Status: Patch Available (was: Open) add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250161#comment-14250161 ] Sergey Shelukhin commented on HIVE-9074: [~ashutoshc] can you review? Thanks add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250183#comment-14250183 ] Ashutosh Chauhan commented on HIVE-9140: [~jpullokkaran] Can you take a look at this one? I initially thought it's better to have it in applyPreCBOTransformations(), but then I realized that our join ordering algorithm leaves {where true} predicates while optimizing the tree. Since it will be good to remove such predicates, I have added these rules along with the join ordering rules. Let me know what you think. Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250183#comment-14250183 ] Ashutosh Chauhan edited comment on HIVE-9140 at 12/17/14 5:39 PM: -- [~jpullokkaran] Can you take a look at this one? I initially thought its better to have it in applyPreCBOTransformations() but than I realized that our join ordering algorithm leaves {{where true}} predicates while optimizing tree. Since, it will be good to remove such predicates, I have added these rules alongwith join ordering rules. Let me know what do you think. was (Author: ashutoshc): [~jpullokkaran] Can you take a look at this one? I initially thought its better to have it in applyPreCBOTransformations() but than I realized that our join ordering algorithm leaves {where true} predicates while optimizing tree. Since, it will be good to remove such predicates, I have added these rules alongwith join ordering rules. Let me know what do you think. Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
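For readers unfamiliar with the rules being added, a sketch of how reduce-expression rules are typically run over a plan with Calcite's heuristic planner; the exact class and field names below follow recent Calcite releases and may differ in the Calcite version Hive depends on, so treat this as illustrative only.
{code}
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.ReduceExpressionsRule;

public class ConstantFoldingSketch {
  // Fold constant expressions (e.g. drop "where true" predicates left over by
  // join ordering) by running Calcite's reduce-expression rules over the plan.
  static RelNode foldConstants(RelNode plan) {
    HepProgramBuilder b = new HepProgramBuilder();
    b.addRuleInstance(ReduceExpressionsRule.FILTER_INSTANCE);
    b.addRuleInstance(ReduceExpressionsRule.PROJECT_INSTANCE);
    b.addRuleInstance(ReduceExpressionsRule.JOIN_INSTANCE);
    HepPlanner planner = new HepPlanner(b.build());
    planner.setRoot(plan);
    return planner.findBestExp();
  }
}
{code}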
Re: Review Request 28933: HIVE-8131:Support timestamp in Avro
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28933/#review65334 --- Ship it! Ship It! - Ryan Blue On Dec. 15, 2014, 7:40 p.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28933/ --- (Updated Dec. 15, 2014, 7:40 p.m.) Review request for hive. Repository: hive-git Description --- The patch includes: 1.add timestamp support for AvroSerde 2.add related test cases Diffs - data/files/avro_timestamp.txt PRE-CREATION ql/src/test/queries/clientpositive/avro_timestamp.q PRE-CREATION ql/src/test/results/clientpositive/avro_timestamp.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 07c5ecf serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 7639a2b serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java c8eac89 serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java c84b1a0 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 8cb2dc3 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java cd5a0fa Diff: https://reviews.apache.org/r/28933/diff/ Testing --- Test passed for added cases Thanks, cheng xu
[jira] [Commented] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250226#comment-14250226 ] Hive QA commented on HIVE-8988: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687709/HIVE-8988.04.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_id2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_grouping_operators org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2111/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2111/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2111/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687709 - PreCommit-HIVE-TRUNK-Build Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.04.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250251#comment-14250251 ] Hive QA commented on HIVE-8843: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687754/HIVE-8843.3-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687754 - PreCommit-HIVE-SPARK-Build Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-inser cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
What's the status of AccessServer?
Hi folks, I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One potential way to do that is to follow in the Hive footsteps with a HS2 protocol-compatible service. I've done some prototyping along these lines and see that it's quite feasible. Along the way I came across this proposal for refactoring HS2 into the AccessServer [1]. What's the state of the AccessServer project? Is anyone working on it? Is there a relationship between this effort and Calcite's Avatica [2]? The system proposed in the AccessServer doc seems to fit nicely in line with Calcite's objectives. Thanks, Nick [0]: http://phoenix.apache.org [1]: https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal [2]: http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- Ship it! +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. - Marcelo Vanzin On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount timeout after 5s as Spark cluster has not launched yet 1. set the timeout value configurable. 2. set default timeout value 60s. 3. enable timeout for get spark job info and get spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250325#comment-14250325 ] Jimmy Xiang commented on HIVE-9127: --- In looking into HIVE-9135, I was wondering if it is better to fix the root cause of HIVE-7431 instead disabling the cache for Spark. If so, probably we don't need this work around? Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Attachment: HIVE-9053.04.patch-013 select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Affects Version/s: 0.13.0 0.14.0 select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250345#comment-14250345 ] Ashutosh Chauhan commented on HIVE-9074: Couple of comments: * You can use a validator in HiveConf via the ConfVars(String varname, Object defaultVal, Validator validator, String description) constructor to force allowed values for a particular config. This will allow you to get rid of the {{isConfigEnabled}} variable and thus simplify the logic a bit there. * This throws an exception as soon as the datastore is found to be incompatible. If the direct SQL query is indeed fired but then fails while executing against the datastore, we still catch that exception and then fall back to ORM. This patch is not intended to capture that code path, is it? add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
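For reference, a minimal sketch of the validator approach suggested above, assuming {{Validator.StringSet}} from hive-common keeps its usual contract of returning null for a valid value and an error message otherwise; the mode names here are illustrative, not the ones in the attached patch:
{code}
import org.apache.hadoop.hive.conf.Validator;
import org.apache.hadoop.hive.conf.Validator.StringSet;

public class DirectSqlModeCheck {
  public static void main(String[] args) {
    // A StringSet validator rejects values outside the allowed set, so a
    // separate isConfigEnabled flag is no longer needed.
    Validator modeValidator = new StringSet("off", "fallback", "fail");
    System.out.println(modeValidator.validate("fallback")); // null means valid
    System.out.println(modeValidator.validate("bogus"));    // an error message means invalid
  }
}
{code}
The same validator instance would be passed to the ConfVars constructor quoted in the comment, so HiveConf itself enforces the allowed values.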
[jira] [Commented] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250347#comment-14250347 ] Pengcheng Xiong commented on HIVE-9053: --- [~prasanth_j], could you please review and help me commit patch-013 to hive 0.13 branch? Thanks! select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250353#comment-14250353 ] Marcelo Vanzin commented on HIVE-8972: -- The patch looks ok to me. I thought about creating a separate API for these kinds of RPCs - these wouldn't be queued in the backend but executed right away. My only concern is that this could be abused (e.g. a caller using these calls to run a Spark job before the queued ones), but perhaps that's an app-level concern and the client shouldn't care if someone uses it that way. The Netty framework we're using now could also make some things easier, like adding listeners to JobHandle and reporting job state changes to the client side when they happen (instead of the current poll-like approach). We could also add client-level listeners so that interesting events are reported (e.g. spark context up and things like that). If there's interest in these things we could create a new task and I'll try to find some time to work on it. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
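A purely hypothetical sketch of the push-based idea mentioned above; none of these names exist in the current spark-client API, they only illustrate the listener shape that would replace polling:
{code}
// Illustrative only: a listener the client could register on a job handle so the
// backend pushes state changes instead of the client polling for them.
public interface JobStateListener {
  enum State { QUEUED, STARTED, SUCCEEDED, FAILED, CANCELLED }

  void onStateChanged(String jobId, State newState);
}
{code}
The RPC layer would invoke such a listener from its dispatch thread whenever the remote end reports a state transition.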
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250365#comment-14250365 ] Hive QA commented on HIVE-9074: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687772/HIVE-9074.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2112/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2112/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2112/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687772 - PreCommit-HIVE-TRUNK-Build add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9147) Add unit test for HIVE-7323
Peter Slawski created HIVE-9147: --- Summary: Add unit test for HIVE-7323 Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
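The shape of such a defensive-copy test, sketched with a stand-in class (the real test targets DateStatisticImpl's minimum/maximum handling; the class and method names below are illustrative):
{code}
import static org.junit.Assert.assertEquals;

import java.util.Date;
import org.junit.Test;

public class DefensiveCopyTest {
  // Stand-in for the statistics class: it must copy the value it is handed
  // rather than keep the caller's mutable reference.
  static class MinMaxTracker {
    private Date minimum;

    void update(Date d) {
      if (minimum == null || d.before(minimum)) {
        minimum = new Date(d.getTime()); // defensive copy
      }
    }

    Date getMinimum() {
      return minimum;
    }
  }

  @Test
  public void callerCannotMutateInternalMinimum() {
    MinMaxTracker tracker = new MinMaxTracker();
    Date d = new Date(1000L);
    tracker.update(d);
    d.setTime(0L); // caller mutates its own object afterwards
    assertEquals(1000L, tracker.getMinimum().getTime());
  }
}
{code}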
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250373#comment-14250373 ] Brock Noland commented on HIVE-9127: bq. In looking into HIVE-9135, I was wondering if it is better to fix the root cause of HIVE-7431 instead disabling the cache for Spark. I think that would be awesome. I think we disabled it early on when we were just trying to get HOS working. bq. If so, probably we don't need this work around? I think this work around results in better code generally. In CombineHiveInputFormat we were looking up the partition information on each loop iteration but with this fix we do it once before the loop, which is generally better. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
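Schematically, the hoisting Brock describes above looks like the following; buildLookupTable() is just a placeholder for the per-path partition-info resolution, not code from the patch:
{code}
import java.util.Collections;
import java.util.Map;

public class HoistedLookupSketch {
  // Placeholder for the expensive partition-info resolution.
  static Map<String, Integer> buildLookupTable() {
    return Collections.singletonMap("weight", 2);
  }

  // Before the fix: the lookup ran on every loop iteration.
  static long slowSum(int[] items) {
    long sum = 0;
    for (int item : items) {
      Map<String, Integer> table = buildLookupTable();
      sum += table.get("weight") * item;
    }
    return sum;
  }

  // After the fix: resolve once before the loop, then reuse the result.
  static long fastSum(int[] items) {
    Map<String, Integer> table = buildLookupTable();
    long sum = 0;
    for (int item : items) {
      sum += table.get("weight") * item;
    }
    return sum;
  }
}
{code}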
[jira] [Updated] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9147: Attachment: HIVE-9147.1.patch Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9147: Fix Version/s: 0.15.0 Status: Patch Available (was: Open) Attached patch for unit test. Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250383#comment-14250383 ] Jimmy Xiang commented on HIVE-9127: --- bq. I think this work around results in better code generally. Agreed. Thanks. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250389#comment-14250389 ] Xuefu Zhang commented on HIVE-9127: --- {quote} Please modify the query if the patch is going to apply to trunk. {quote} My bad. I meant to say modify the JIRA, but now I see again and it seems alright except for a Spark component, which probably doesn't matter. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
[jira] [Created] (HIVE-9148) Fix default value for HWI_WAR_FILE
Peter Slawski created HIVE-9148: --- Summary: Fix default value for HWI_WAR_FILE Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9148: Attachment: HIVE-9148.1.patch Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9148: Fix Version/s: 0.15.0 Status: Patch Available (was: Open) Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Native ORC
I also feel in the long run it would be good to have ORC as a separate project (ASF) independent of hive. Having it in hive will only make it harder to modularize (SARG, vectorized readers, etc.). - Prasanth On Mon, Dec 15, 2014 at 5:27 PM, Thejas Nair the...@hortonworks.com wrote: IMO, in the long run, having ORC as a separate project makes a lot of sense, as it is used in many places outside of hive. On Mon, Dec 15, 2014 at 2:44 PM, Owen O'Malley omal...@apache.org wrote: All, We are working on a native (aka C++) ORC reader and writer. For now we are working on it over at my old github - https://github.com/hortonworks/orc . First of all, I wanted to let everyone know it is happening and invite others to give feedback or help. You can see the API for the reader in the src/orc directory. It leads to an interesting question. I'd like to contribute it back to Hive to keep the two implementations (Java and C++) together, but that would mean adding an optional C++ module to Hive, which is currently all Java. The other option is to take the native ORC reader to the Apache Incubator as a new project and eventually pull the Java one along with it. I'm very interested in the Hive development community's opinion. Thanks, Owen
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Component/s: (was: Spark) Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250496#comment-14250496 ] Jimmy Xiang commented on HIVE-8843: --- These failures are not related to the patch. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext-specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
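The pattern being released here, sketched with the Java RDD API (the input path and the surrounding method are placeholders, not code from the patch; note the Java API spells the release call unpersist()):
{code}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CacheReleaseSketch {
  static long runQuery(JavaSparkContext sc) {
    JavaRDD<String> rdd = sc.textFile("/tmp/input"); // placeholder input
    rdd.cache();                                     // reused while the query runs
    long total = rdd.count() + rdd.distinct().count();
    rdd.unpersist();                                 // release the cache once the query is done
    return total;
  }
}
{code}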
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250522#comment-14250522 ] Xuefu Zhang commented on HIVE-8972: --- +1 to the latest patch. [~vanzin], I think it makes sense to have a separate API for short-lived tasks as well as push-based notification for job monitoring. Please feel free to create new tasks for those. Thanks. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8972: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Rui, Marcelo, and Chengxiang. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8843: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Jimmy. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext-specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250538#comment-14250538 ] Hive QA commented on HIVE-9147: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687801/HIVE-9147.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2113/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2113/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2113/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687801 - PreCommit-HIVE-TRUNK-Build Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Attachment: HIVE-9136.1.patch Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Status: Patch Available (was: Open) Patch v1. I added several spark-specific log events to {{PerfLogger}}. The correspondence against Tez is:
|| In Tez || In Spark ||
| TEZ_SUBMIT_TO_RUNNING | SPARK_SUBMIT_TO_RUNNING |
| TEZ_BUILD_DAG | SPARK_BUILD_PLAN + SPARK_BUILD_RDD_GRAPH |
| TEZ_SUBMIT_DAG | SPARK_SUBMIT_JOB |
| TEZ_RUN_DAG | SPARK_RUN_JOB |
| TEZ_CREATE_VERTEX | SPARK_CREATE_TRAN |
| TEZ_RUN_VERTEX | SPARK_RUN_STAGE |
| TEZ_INIITIALIZE_PROCESSOR | ? |
| TEZ_RUN_PROCESSOR | ? |
| TEZ_INITIALIZE_OPERATORS | SPARK_INITIALIZE_OPERATORS |
For TEZ_INITIALIZE_PROCESSOR and TEZ_RUN_PROCESSOR, I didn't find a correspondence in our Spark branch. Any idea? Maybe log the {{SparkBaseFunctionResultList}}? In addition, I added SPARK_FLUSH_HASHTABLE, to track perf on the Spark hash table sink, and SPARK_GENERATE_OPERATOR_TREE, to track perf on, as the name suggests, generating the operator tree. I'm also open to any kind of suggestions. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
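A minimal sketch of how one of the new events would be used; PerfLogBegin/PerfLogEnd are the existing PerfLogger API, while the "SPARK_BUILD_PLAN" key is one of the constants the patch proposes, so treat that name as an assumption:
{code}
import org.apache.hadoop.hive.ql.log.PerfLogger;

public class SparkPerfLogSketch {
  private static final String CLASS_NAME = SparkPerfLogSketch.class.getName();

  void buildPlanWithTiming() {
    PerfLogger perfLogger = PerfLogger.getPerfLogger();
    // "SPARK_BUILD_PLAN" is assumed to be one of the keys added by the patch.
    perfLogger.PerfLogBegin(CLASS_NAME, "SPARK_BUILD_PLAN");
    try {
      // ... generate the SparkWork / RDD plan here ...
    } finally {
      perfLogger.PerfLogEnd(CLASS_NAME, "SPARK_BUILD_PLAN");
    }
  }
}
{code}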
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250604#comment-14250604 ] Chao commented on HIVE-9136: Sorry, there's a typo above: it should be TEZ_INITIALIZE_PROCESSOR, not TEZ_INIITIALIZE_PROCESSOR. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9110) Performance of SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9110: -- Assignee: Chao Performance of SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL [Spark Branch] --- Key: HIVE-9110 URL: https://issues.apache.org/jira/browse/HIVE-9110 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao The query {noformat} SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL {noformat} could benefit from performance enhancements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
Jason Dere created HIVE-9149: Summary: Add unit test to test implicit conversion during dynamic partitioning/distribute by Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Status: Patch Available (was: Open) Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Attachment: HIVE-9149.1.patch Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250648#comment-14250648 ] Prasanth Jayachandran commented on HIVE-9149: - [~jdere] Can you add explain to your test? So that we see the UDF cast. Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8639: Attachment: HIVE-8639.2-spark.patch Address review comments, update some golden files, and fix another issue. The issue is that if SMBJoin and MapJoin operators are in the same tree, they trigger some code in SparkReduceSinkMapJoinProc and GenSparkWork that corrupts the graph. In particular, those processors had assumed that you only visit a MapJoin op once from a non-RS path (big-table), but this becomes false if the big-table is a child of SMBJoin, as that itself has multiple non-RS parents. The additional fix is to make sure we walk down from SMBJoinOp only once, along the big-table path. Thus we skip further walking on a small-table path, as no further processing is necessary there anyway. RB is not working for me at the moment, will upload there once it is. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Attachment: HIVE-9149.2.patch Good point - in fact UDFToInteger was not being called because a constant 0 was being used in place of key due to optimization from (where key = 0). I've changed the query slightly and added the explain. Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch, HIVE-9149.2.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250676#comment-14250676 ] Hive QA commented on HIVE-9148: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687811/HIVE-9148.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2114/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2114/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2114/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687811 - PreCommit-HIVE-TRUNK-Build Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250678#comment-14250678 ] Hive QA commented on HIVE-9136: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687827/HIVE-9136.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2115/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2115/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'bin/ext/hwi.sh' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646347. At revision 1646347. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687827 - PreCommit-HIVE-TRUNK-Build Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
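As a rough illustration of what "performance counters around the compiler" could look like, here is a minimal timing sketch. The Runnable below is only a placeholder for the compilation phase being measured (e.g. semantic analysis plus plan generation); it is not an actual Hive API call.

{code}
// Minimal timing sketch; the Runnable stands in for the query-compilation
// step under test and is not a real Hive entry point.
public class CompileTimingSketch {
  static long timeMillis(Runnable work) {
    long start = System.nanoTime();
    work.run();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    long elapsed = timeMillis(() -> {
      // placeholder for the compilation work being profiled
    });
    System.out.println("compile took " + elapsed + " ms");
  }
}
{code}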
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8639: Attachment: HIVE-8639.3-spark.patch Fix some import statements. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slowdown, as each mapper would need to read a very small chunk of a partition that has a single key. Thus, in some scenarios it is beneficial to convert an SMB join to a map join. The task is to research and support the conversion from SMB join to map join for the Spark execution engine. See the MapReduce equivalent in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9150) Unrelated types are compared in GenTezWork#getFollowingWorkIndex()
Ted Yu created HIVE-9150: Summary: Unrelated types are compared in GenTezWork#getFollowingWorkIndex() Key: HIVE-9150 URL: https://issues.apache.org/jira/browse/HIVE-9150 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is the code: {code} if (tezWork.getEdgeProperty(unionWork, baseWork).equals(TezEdgeProperty.EdgeType.CONTAINS)) { {code} getEdgeProperty() returns a TezEdgeProperty, which is then compared via equals() against a TezEdgeProperty$EdgeType enum constant; since the two types are unrelated, the check can never succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
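A self-contained sketch of the mismatch follows. EdgeProperty and EdgeType below are simplified stand-ins, not the real Hive classes, and the getEdgeType() accessor is assumed rather than confirmed as the eventual fix.

{code}
// Stand-in model of the pattern flagged in GenTezWork#getFollowingWorkIndex().
public class EdgeTypeMismatchSketch {
  enum EdgeType { SIMPLE_EDGE, CONTAINS }

  static class EdgeProperty {
    private final EdgeType edgeType;
    EdgeProperty(EdgeType edgeType) { this.edgeType = edgeType; }
    EdgeType getEdgeType() { return edgeType; }
  }

  public static void main(String[] args) {
    EdgeProperty prop = new EdgeProperty(EdgeType.CONTAINS);
    // Reported pattern: comparing the property object to an enum constant.
    // equals() across unrelated types is always false, so this branch is dead.
    System.out.println(prop.equals(EdgeType.CONTAINS));          // false
    // Presumably intended comparison: unwrap the enum first.
    System.out.println(prop.getEdgeType() == EdgeType.CONTAINS); // true
  }
}
{code}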
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Attachment: (was: HIVE-9053.04.patch-013) select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch Here is the way to reproduce with a q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give: OK NULL 1 This is not correct, as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9151) Checking s against null in TezJobMonitor#getNameWithProgress() should be done earlier
Ted Yu created HIVE-9151: Summary: Checking s against null in TezJobMonitor#getNameWithProgress() should be done earlier Key: HIVE-9151 URL: https://issues.apache.org/jira/browse/HIVE-9151 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} int spaceRemaining = COLUMN_1_WIDTH - s.length() - 1; String trimmedVName = s; // if the vertex name is longer than column 1 width, trim it down // Tez Merge File Work will become Tez Merge File.. if (s != null && s.length() > COLUMN_1_WIDTH) { {code} s is dereferenced (s.length()) before the null check, rendering the check ineffective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
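A hedged sketch of the reordering the report implies follows: guard against null before any dereference. The COLUMN_1_WIDTH value and the trimming logic below are simplified stand-ins, not the actual TezJobMonitor implementation.

{code}
// Simplified stand-in for the pattern in TezJobMonitor#getNameWithProgress().
public class NullCheckOrderSketch {
  private static final int COLUMN_1_WIDTH = 16; // hypothetical width

  static String nameWithProgress(String s) {
    // Guard first: return early before touching s.
    if (s == null) {
      return "";
    }
    String trimmedVName = s;
    // If the vertex name is longer than column 1 width, trim it down,
    // e.g. "Tez Merge File Work" becomes "Tez Merge File..".
    if (s.length() > COLUMN_1_WIDTH) {
      trimmedVName = s.substring(0, COLUMN_1_WIDTH - 2) + "..";
    }
    return trimmedVName;
  }

  public static void main(String[] args) {
    System.out.println(nameWithProgress("Tez Merge File Work"));
    System.out.println(nameWithProgress(null));
  }
}
{code}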
[jira] [Issue Comment Deleted] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9136: --- Comment: was deleted (was: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687827/HIVE-9136.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2115/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2115/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'bin/ext/hwi.sh' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646347. At revision 1646347. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687827 - PreCommit-HIVE-TRUNK-Build) Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250696#comment-14250696 ] Brock Noland commented on HIVE-9136: Looks like the patch is named for trunk.. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250703#comment-14250703 ] Chao commented on HIVE-9136: Yes... my mistake. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Attachment: HIVE-9136.1-spark.patch Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
Brock Noland created HIVE-9152: -- Summary: Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
Brock Noland created HIVE-9153: -- Summary: Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
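A hedged sketch of how the comparison run could be set up: the hive.input.format property is the knob the ticket refers to, with CombineHiveInputFormat as the default. This assumes the standard HiveConf API (HiveConf.ConfVars.HIVEINPUTFORMAT mapping to that key); it is a sketch, not the evaluation harness itself.

{code}
// Sketch only: flips hive.input.format to HiveInputFormat for a comparison run.
import org.apache.hadoop.hive.conf.HiveConf;

public class InputFormatComparisonSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Default: combine many small files into fewer, larger splits.
    System.out.println("current: " + conf.getVar(HiveConf.ConfVars.HIVEINPUTFORMAT));
    // Candidate to evaluate: one split per file, as Tez configures it.
    conf.setVar(HiveConf.ConfVars.HIVEINPUTFORMAT,
        "org.apache.hadoop.hive.ql.io.HiveInputFormat");
    System.out.println("evaluating with: "
        + conf.getVar(HiveConf.ConfVars.HIVEINPUTFORMAT));
  }
}
{code}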
[jira] [Assigned] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9152: -- Assignee: Chao Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)