[jira] [Commented] (HIVE-9334) PredicateTransitivePropagate optimizer should run after PredicatePushDown

2015-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272859#comment-14272859
 ] 

Hive QA commented on HIVE-9334:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691521/HIVE-9334.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6747 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2332/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2332/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2332/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691521 - PreCommit-HIVE-TRUNK-Build

 PredicateTransitivePropagate optimizer should run after PredicatePushDown
 -

 Key: HIVE-9334
 URL: https://issues.apache.org/jira/browse/HIVE-9334
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-9334.1.patch, HIVE-9334.patch


 This way PredicateTransitivePropagate will be more effective, as it will have 
 more filters to push to other branches of joins.
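
 For context, a minimal sketch of the kind of rewrite transitive predicate
 propagation enables (table and column names here are hypothetical, not from
 the patch):
 {code}
 -- Given the join condition t1.key = t2.key and the filter t1.key > 10,
 -- the optimizer can infer t2.key > 10 and push that filter into the
 -- scan of t2, reducing the rows flowing into the join.
 SELECT t1.val, t2.val
 FROM t1 JOIN t2 ON (t1.key = t2.key)
 WHERE t1.key > 10;
 {code}
 Running PredicateTransitivePropagate after PredicatePushDown means more such
 filters have already been pushed near the joins, so more of them are
 available to propagate.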



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2015-01-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272855#comment-14272855
 ] 

Lefty Leverenz commented on HIVE-7209:
--

Doc:  *hive.security.metastore.authorization.manager* has been updated in the 
wiki, so I'm removing the TODOC14 label.  (Additional documentation will be 
covered with HIVE-7759, as mentioned two comments back.)

* [Configuration Properties -- hive.security.metastore.authorization.manager | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.metastore.authorization.manager]

 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0

 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch, 
 HIVE-7209.4.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.
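
 As a sketch, such a restriction might be configured in hive-site.xml along
 these lines; the authorizer class name below is an assumption based on this
 issue's discussion, so verify it against the committed patch:
 {code}
 <!-- Use an authorizer that rejects authorization-policy API calls made
      through a remote metastore; only an embedded metastore may change
      the policy. (Class name is an assumption.) -->
 <property>
   <name>hive.security.metastore.authorization.manager</name>
   <value>org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
 </property>
 {code}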





[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2015-01-11 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7209:
-
Labels:   (was: TODOC14)

 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0

 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch, 
 HIVE-7209.4.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.





[jira] [Commented] (HIVE-7397) Set the default threshold for fetch task conversion to 1Gb

2015-01-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272858#comment-14272858
 ] 

Lefty Leverenz commented on HIVE-7397:
--

Doc:  The defaults for *hive.fetch.task.conversion* and 
*hive.fetch.task.conversion.threshold* have been updated in the wiki, so I'm 
removing the TODOC14 label.

* [hive.fetch.task.conversion | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.fetch.task.conversion]
* [hive.fetch.task.conversion.threshold | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.fetch.task.conversion.threshold]

 Set the default threshold for fetch task conversion to 1Gb
 --

 Key: HIVE-7397
 URL: https://issues.apache.org/jira/browse/HIVE-7397
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Gopal V
Assignee: Gopal V
  Labels: Performance
 Fix For: 0.14.0

 Attachments: HIVE-7397.1.patch, HIVE-7397.2.patch, HIVE-7397.3.patch, 
 HIVE-7397.4.patch.txt, HIVE-7397.5.patch, HIVE-7397.6.patch.txt


 Currently, modifying the value of hive.fetch.task.conversion to "more" 
 results in a dangerous setting where small-scale queries work, but large-scale 
 queries crash.
 This occurs because the default threshold of -1 means the optimization is 
 applied even to a petabyte table.
 I am testing a variety of queries with the setting "more" (to make it the 
 default option as suggested by HIVE-887) and will change the default threshold 
 for this feature to a reasonable 1Gb.
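
 The settings involved can be sketched as follows (values illustrative;
 1073741824 bytes = 1 GB, the threshold this issue proposes):
 {code}
 -- Convert simple queries to fetch tasks, but only when the input
 -- data is below the threshold, so large scans still run as jobs.
 SET hive.fetch.task.conversion=more;
 SET hive.fetch.task.conversion.threshold=1073741824;  -- 1 GB
 {code}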





[jira] [Updated] (HIVE-7397) Set the default threshold for fetch task conversion to 1Gb

2015-01-11 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7397:
-
Labels: Performance  (was: Performance TODOC14)

 Set the default threshold for fetch task conversion to 1Gb
 --

 Key: HIVE-7397
 URL: https://issues.apache.org/jira/browse/HIVE-7397
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Gopal V
Assignee: Gopal V
  Labels: Performance
 Fix For: 0.14.0

 Attachments: HIVE-7397.1.patch, HIVE-7397.2.patch, HIVE-7397.3.patch, 
 HIVE-7397.4.patch.txt, HIVE-7397.5.patch, HIVE-7397.6.patch.txt


 Currently, modifying the value of hive.fetch.task.conversion to "more" 
 results in a dangerous setting where small-scale queries work, but large-scale 
 queries crash.
 This occurs because the default threshold of -1 means the optimization is 
 applied even to a petabyte table.
 I am testing a variety of queries with the setting "more" (to make it the 
 default option as suggested by HIVE-887) and will change the default threshold 
 for this feature to a reasonable 1Gb.





[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Open  (was: Patch Available)

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, HIVE-9039.12.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.
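
 The rewrite described above can be sketched as follows (hypothetical tables
 t1 and t2, not taken from the patch):
 {code}
 -- UNION (distinct) ...
 SELECT c1, c2 FROM t1
 UNION
 SELECT c1, c2 FROM t2;

 -- ... is rewritten to UNION ALL followed by GROUP BY on all columns,
 -- which removes the duplicate rows:
 SELECT c1, c2
 FROM (
   SELECT c1, c2 FROM t1
   UNION ALL
   SELECT c1, c2 FROM t2
 ) u
 GROUP BY c1, c2;
 {code}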





[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: HIVE-9039.13.patch

Updated golden files, e.g., union3.

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Patch Available  (was: Open)

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Commented] (HIVE-7674) Update to Spark 1.2 [Spark Branch]

2015-01-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273061#comment-14273061
 ] 

Brock Noland commented on HIVE-7674:


Thank you! Since I committed this I updated the wiki.

 Update to Spark 1.2 [Spark Branch]
 --

 Key: HIVE-7674
 URL: https://issues.apache.org/jira/browse/HIVE-7674
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Blocker
  Labels: TODOC-SPARK
 Fix For: spark-branch

 Attachments: HIVE-7674.1-spark.patch, HIVE-7674.2-spark.patch, 
 HIVE-7674.3-spark.patch


 In HIVE-8160 we added a custom repo to use Spark 1.2. Once 1.2 is released we 
 need to remove this repo.





Re: Review Request 28964: HIVE-8121 Create micro-benchmarks for ParquetSerde and evaluate performance

2015-01-11 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28964/#review67616
---


Nice work Sergio!!

I know that it doesn't fit perfectly into the JMH model, but I think we have to 
write a non-trivial number of records, such as 1000 rows, in order to get much 
benefit. Can we try that?


itests/hive-jmh/pom.xml
https://reviews.apache.org/r/28964/#comment111684

It looks like in this file 1 tab = 4 spaces, whereas in Hive we typically use 
1 tab = 2 spaces.



itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java
https://reviews.apache.org/r/28964/#comment111683

During class initialization, let's create an array of 100 random values for 
each type; then we can iterate through that array on each call to this 
method.

Otherwise columnar formats will show unrealistic compression from storing 
the same values over and over. For example, both Parquet and ORC should be 
able to collapse a column consisting of the integer 1 to a trivial amount of 
data.
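
The suggestion above can be sketched as follows; this is a hypothetical helper, not code from the patch:

```java
import java.util.Random;

// Pre-generates a fixed pool of random values at initialization and
// cycles through it, so consecutive benchmark rows differ and columnar
// formats (ORC, Parquet) cannot compress them down to almost nothing.
public class RandomColumnValues {
    private static final int POOL_SIZE = 100;
    private final int[] intValues = new int[POOL_SIZE];
    private int next = 0;

    public RandomColumnValues(long seed) {
        Random rand = new Random(seed);  // fixed seed keeps runs comparable
        for (int i = 0; i < POOL_SIZE; i++) {
            intValues[i] = rand.nextInt();
        }
    }

    // Called once per generated row instead of Random.nextInt(),
    // avoiding per-row RNG cost while still varying the data.
    public int nextInt() {
        int value = intValues[next];
        next = (next + 1) % POOL_SIZE;
        return value;
    }
}
```

A fixed seed keeps benchmark runs comparable while still defeating run-length collapse in the columnar encoders.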


- Brock Noland


On Jan. 9, 2015, 6:38 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28964/
 ---
 
 (Updated Jan. 9, 2015, 6:38 p.m.)
 
 
 Review request for hive, Brock Noland and cheng xu.
 
 
 Bugs: HIVE-8121
 https://issues.apache.org/jira/browse/HIVE-8121
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This is a new tool used to test ORC & PARQUET file format performance.
 
 
 Diffs
 -
 
   itests/hive-jmh/pom.xml PRE-CREATION 
   
 itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java
  PRE-CREATION 
   itests/pom.xml 0a154d6eb8c119e4e6419777c28b59b9d2108ba0 
 
 Diff: https://reviews.apache.org/r/28964/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergio Pena
 




[jira] [Commented] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273075#comment-14273075
 ] 

Hive QA commented on HIVE-9039:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691565/HIVE-9039.13.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6753 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2333/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2333/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2333/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691565 - PreCommit-HIVE-TRUNK-Build

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Created] (HIVE-9337) Move more hive.spark.* configurations to HiveConf

2015-01-11 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-9337:
---

 Summary: Move more hive.spark.* configurations to HiveConf
 Key: HIVE-9337
 URL: https://issues.apache.org/jira/browse/HIVE-9337
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Szehon Ho


Some hive.spark.* configurations have been added to HiveConf, but some, such as 
hive.spark.log.dir, are not there yet.

Also, some configurations in RpcConfiguration.java might be eligible to be moved.





[jira] [Updated] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]

2015-01-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9335:
---
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Thank you Xuefu! I have committed the patch to spark.

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: spark-branch

 Attachments: HIVE-9335.1-spark.patch, HIVE-9335.2-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat across threads and it's not thread-safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error("Hit error while closing operators - failing tree");
 throw new IllegalStateException("Error while closing operators", e);
 {noformat}
 Should be:
 {noformat}
 String msg = "Error while closing operators: " + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}
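
On the NumberFormat point above: java.text.NumberFormat documents that it is not thread-safe, and one common fix is a per-thread instance via ThreadLocal. A minimal sketch follows; the class name and formatting details are made up, not Hive's actual code:

```java
import java.text.NumberFormat;
import java.util.Locale;

public class TaskIdFormatter {
    // NumberFormat is not thread-safe; give each thread its own instance
    // instead of sharing a single one across threads.
    private static final ThreadLocal<NumberFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            NumberFormat nf = NumberFormat.getInstance(Locale.US);
            nf.setMinimumIntegerDigits(6);  // zero-pad ids, e.g. 000042
            nf.setGroupingUsed(false);      // no thousands separators
            return nf;
        });

    public static String format(long taskId) {
        return FORMAT.get().format(taskId);
    }

    public static void main(String[] args) {
        System.out.println(format(42));  // prints 000042
    }
}
```

Each thread lazily gets its own NumberFormat on first use, so no synchronization is needed on the hot path.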





[jira] [Commented] (HIVE-8181) Upgrade JavaEWAH version to allow for unsorted bitset creation

2015-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273202#comment-14273202
 ] 

Hive QA commented on HIVE-8181:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691583/HIVE-8181.4.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7310 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2334/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2334/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2334/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691583 - PreCommit-HIVE-TRUNK-Build

 Upgrade JavaEWAH version to allow for unsorted bitset creation
 --

 Key: HIVE-8181
 URL: https://issues.apache.org/jira/browse/HIVE-8181
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0, 0.13.1
Reporter: Gopal V
Assignee: Navis
 Attachments: HIVE-8181.1.patch, HIVE-8181.2.patch.txt, 
 HIVE-8181.3.patch.txt, HIVE-8181.4.patch.txt


 JavaEWAH has removed the restriction that bitsets can only be set in order in 
 the latest release. 
 Currently the use of {{ewah_bitmap}} UDAF requires a {{SORT BY}}.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: Can't set bits out of order with 
 EWAHCompressedBitmap
 at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:824)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
 ... 7 more
 Caused by: java.lang.RuntimeException: Can't set bits out of order with 
 EWAHCompressedBitmap
 at 
 {code}





[jira] [Updated] (HIVE-8181) Upgrade JavaEWAH version to allow for unsorted bitset creation

2015-01-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8181:

Attachment: HIVE-8181.4.patch.txt

Updated golden file, but I cannot reproduce the failure of udaf_percentile_approx_23.

 Upgrade JavaEWAH version to allow for unsorted bitset creation
 --

 Key: HIVE-8181
 URL: https://issues.apache.org/jira/browse/HIVE-8181
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0, 0.13.1
Reporter: Gopal V
Assignee: Navis
 Attachments: HIVE-8181.1.patch, HIVE-8181.2.patch.txt, 
 HIVE-8181.3.patch.txt, HIVE-8181.4.patch.txt


 JavaEWAH has removed the restriction that bitsets can only be set in order in 
 the latest release. 
 Currently the use of {{ewah_bitmap}} UDAF requires a {{SORT BY}}.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: Can't set bits out of order with 
 EWAHCompressedBitmap
 at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:824)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
 ... 7 more
 Caused by: java.lang.RuntimeException: Can't set bits out of order with 
 EWAHCompressedBitmap
 at 
 {code}





[jira] [Updated] (HIVE-9336) Fix Hive throws ParseException while handling Grouping-Sets clauses

2015-01-11 Thread zhaohm3 (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaohm3 updated HIVE-9336:
--
Status: Patch Available  (was: Open)

 Fix Hive throws ParseException while handling Grouping-Sets clauses
 ---

 Key: HIVE-9336
 URL: https://issues.apache.org/jira/browse/HIVE-9336
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 0.13.1
Reporter: zhaohm3
 Fix For: 0.14.0


 Currently, when Hive parses GROUPING SETS clauses, if an expression is 
 composed of two or more common subexpressions, then its first element can 
 only be a simple identifier without any qualification; otherwise Hive will 
 throw a ParseException during the parse stage. Therefore, Hive throws a 
 ParseException while parsing the following HQLs:
 drop table test;
 create table test(tc1 int, tc2 int, tc3 int);
 
 explain select test.tc1, test.tc2 from test group by test.tc1, test.tc2 
 grouping sets(test.tc1, (test.tc1, test.tc2));
 explain select tc1+tc2, tc2 from test group by tc1+tc2, tc2 grouping 
 sets(tc2, (tc1 + tc2, tc2));
 
 drop table test;
 The following contents show part of the ParseException stack trace:
 2015-01-07 09:53:34,718 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,719 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,721 INFO [main]: ql.Driver 
 (Driver.java:checkConcurrency(158)) - Concurrency mode is disabled, not 
 creating a lock manager
 2015-01-07 09:53:34,721 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=parse 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: parse.ParseDriver 
 (ParseDriver.java:parse(185)) - Parsing command: explain select test.tc1, 
 test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, 
 (test.tc1, test.tc2))
 2015-01-07 09:53:34,734 ERROR [main]: ql.Driver 
 (SessionState.java:printError(545)) - FAILED: ParseException line 1:105 
 missing ) at ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 org.apache.hadoop.hive.ql.parse.ParseException: line 1:105 missing ) at 
 ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:210)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=compile 
 start=1420595614721 end=1420595614745 duration=24 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=releaseLocks 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=releaseLocks 
 start=1420595614745 end=1420595614746 duration=1 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=releaseLocks 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 

[jira] [Commented] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]

2015-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273094#comment-14273094
 ] 

Hive QA commented on HIVE-9335:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691570/HIVE-9335.2-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7301 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testMultipleTransactionBatchCommits
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/631/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/631/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-631/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691570 - PreCommit-HIVE-SPARK-Build

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9335.1-spark.patch, HIVE-9335.2-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat across threads and it's not thread-safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error("Hit error while closing operators - failing tree");
 throw new IllegalStateException("Error while closing operators", e);
 {noformat}
 Should be:
 {noformat}
 String msg = "Error while closing operators: " + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}
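The review item about sharing a NumberFormat across threads comes up often: java.text.NumberFormat is documented as not thread-safe. A minimal sketch of the usual fix, giving each thread its own instance via ThreadLocal (class and field names here are hypothetical, not Hive's actual code), might be:

```java
import java.text.NumberFormat;

public class TaskIdFormat {
    // NumberFormat is not thread-safe, so give each thread its own
    // instance instead of sharing a single static one across threads.
    private static final ThreadLocal<NumberFormat> TASK_ID_FORMAT =
        ThreadLocal.withInitial(() -> {
            NumberFormat nf = NumberFormat.getInstance();
            nf.setGroupingUsed(false);
            nf.setMinimumIntegerDigits(6);
            return nf;
        });

    public static String format(int taskId) {
        return TASK_ID_FORMAT.get().format(taskId);
    }
}
```

Each thread lazily gets its own formatter, so concurrent tasks can format ids without synchronization.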



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7122) Storage format for create like table

2015-01-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7122:

   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Vasanth kumar RJ for the contribution.

 Storage format for create like table
 

 Key: HIVE-7122
 URL: https://issues.apache.org/jira/browse/HIVE-7122
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Vasanth kumar RJ
Assignee: Vasanth kumar RJ
 Fix For: 0.15.0

 Attachments: HIVE-7122.1.patch, HIVE-7122.patch


 Using create table like, the user can specify the storage format of the new table.
 Example:
 create table table1 like table2 stored as ORC;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9337) Move more hive.spark.* configurations to HiveConf

2015-01-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273175#comment-14273175
 ] 

Szehon Ho commented on HIVE-9337:
-

[~chengxiang li] I wonder if you have any thoughts on this? I am not familiar 
enough with the RpcConfiguration.java configurations to know, but I think it 
would be great to move them, as well as 'hive.spark.log.dir' or any other ones, 
to HiveConf.java for documentation purposes, consistent with other hive.* 
properties, if it's possible.

 Move more hive.spark.* configurations to HiveConf
 -

 Key: HIVE-9337
 URL: https://issues.apache.org/jira/browse/HIVE-9337
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Szehon Ho

 Some hive.spark configurations have been added to HiveConf, but there are 
 some like hive.spark.log.dir that are not there.
 Also some configurations in RpcConfiguration.java might be eligible to be 
 moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9340) Address review of HIVE-9257 (ii)

2015-01-11 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-9340:
---

 Summary: Address review of HIVE-9257 (ii)
 Key: HIVE-9340
 URL: https://issues.apache.org/jira/browse/HIVE-9340
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho


Some minor fixes:

1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test fw.
2.  Get rid of spark-snapshot repository dep in pom
3.  Cleanup ExplainTask to get rid of * in imports.
4.  Reorder the scala/spark dependencies in pom to fit the main order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273209#comment-14273209
 ] 

Szehon Ho commented on HIVE-9257:
-

Thanks, I noticed; I was collecting those, as well as some other items I saw, in 
HIVE-9340.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The hive on spark work has reached a point where we can merge it into the 
 trunk branch.  Note that spark execution engine is optional and no current 
 users should be impacted.
 This JIRA will be used to track the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9340) Address review of HIVE-9257 (ii)

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9340:

Issue Type: Sub-task  (was: Task)
Parent: HIVE-7292

 Address review of HIVE-9257 (ii)
 

 Key: HIVE-9340
 URL: https://issues.apache.org/jira/browse/HIVE-9340
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho

 Some minor fixes:
 1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test 
 fw.
 2.  Get rid of spark-snapshot repository dep in pom
 3.  Cleanup ExplainTask to get rid of * in imports.
 4.  Reorder the scala/spark dependencies in pom to fit the main order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9336) Fix Hive throws ParseException while handling Grouping-Sets clauses

2015-01-11 Thread zhaohm3 (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273152#comment-14273152
 ] 

zhaohm3 commented on HIVE-9336:
---

For more details, visit: https://www.zybuluo.com/Spongcer/note/61369


 Fix Hive throws ParseException while handling Grouping-Sets clauses
 ---

 Key: HIVE-9336
 URL: https://issues.apache.org/jira/browse/HIVE-9336
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 0.13.1
Reporter: zhaohm3
 Fix For: 0.14.0


 Currently, when Hive parses GROUPING SETS clauses, if there are expressions 
 composed of two or more common subexpressions, the first element of those 
 expressions can only be a simple identifier without any qualification; 
 otherwise Hive throws a ParseException during parsing. Therefore, Hive will 
 throw a ParseException while parsing the following HQLs:
 drop table test;
 create table test(tc1 int, tc2 int, tc3 int);
 
 explain select test.tc1, test.tc2 from test group by test.tc1, test.tc2 
 grouping sets(test.tc1, (test.tc1, test.tc2));
 explain select tc1+tc2, tc2 from test group by tc1+tc2, tc2 grouping 
 sets(tc2, (tc1 + tc2, tc2));
 
 drop table test;
 The following shows part of the ParseException stack trace:
 2015-01-07 09:53:34,718 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,719 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,721 INFO [main]: ql.Driver 
 (Driver.java:checkConcurrency(158)) - Concurrency mode is disabled, not 
 creating a lock manager
 2015-01-07 09:53:34,721 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=parse 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: parse.ParseDriver 
 (ParseDriver.java:parse(185)) - Parsing command: explain select test.tc1, 
 test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, 
 (test.tc1, test.tc2))
 2015-01-07 09:53:34,734 ERROR [main]: ql.Driver 
 (SessionState.java:printError(545)) - FAILED: ParseException line 1:105 
 missing ) at ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 org.apache.hadoop.hive.ql.parse.ParseException: line 1:105 missing ) at 
 ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:210)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=compile 
 start=1420595614721 end=1420595614745 duration=24 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=releaseLocks 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=releaseLocks 
 start=1420595614745 end=1420595614746 duration=1 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - 

[jira] [Resolved] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho resolved HIVE-9257.
-
   Resolution: Fixed
Fix Version/s: 0.15.0

Committed to trunk.  Thanks Brock and Xuefu for detailed review!  Thanks also 
to all the contributors to the spark branch for this milestone!

Also modified the build machine's default properties (trunk-mr2) to the new 
properties attached (trunk-mr2-spark-merge), which include configurations to run 
SparkCliDriver tests.

Follow-ups will be taken care of in HIVE-9335, and subsequent JIRA's.

In terms of docs, one property was added to HiveConf 
(hive.spark.client.future.timeout), but there are potentially more (see 
HIVE-9337).  These will have to be added at: 
[https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2]

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The hive on spark work has reached a point where we can merge it into the 
 trunk branch.  Note that spark execution engine is optional and no current 
 users should be impacted.
 This JIRA will be used to track the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]

2015-01-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9335:
---
Attachment: HIVE-9335.2-spark.patch

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9335.1-spark.patch, HIVE-9335.2-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat across threads and it is not thread-safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error("Hit error while closing operators - failing tree");
 throw new IllegalStateException("Error while closing operators", e);
 {noformat}
 Should be:
 {noformat}
  String msg = "Error while closing operators: " + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}
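The "we eat the stack trace" and IllegalStateException items above both concern exception chaining: pass the caught exception along as the cause so the original stack trace survives the rethrow. A small illustrative sketch (hypothetical names, not Hive's actual deserializeJobConf) might be:

```java
import java.util.Base64;

public class ConfDeserializer {
    // Decode a base64 payload, preserving the original exception as the
    // cause when rethrowing instead of discarding its stack trace.
    public static byte[] decode(String base64) {
        try {
            return Base64.getDecoder().decode(base64);
        } catch (IllegalArgumentException e) {
            // Bad:  throw new RuntimeException("decode failed");  // cause lost
            // Good: keep e as the cause so callers see the full trace.
            throw new RuntimeException("Error decoding job conf: " + e.getMessage(), e);
        }
    }
}
```

Callers that log the RuntimeException now get the underlying parse failure in the chained trace, rather than only the wrapper's message.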



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9310) CLI JLine does not flush history back to ~/.hivehistory

2015-01-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273145#comment-14273145
 ] 

Navis commented on HIVE-9310:
-

Ok, I got it. But can we still flush the history in the signal handler 
(https://github.com/apache/hive/blob/trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L325)?

 CLI JLine does not flush history back to ~/.hivehistory
 ---

 Key: HIVE-9310
 URL: https://issues.apache.org/jira/browse/HIVE-9310
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9310.1.patch


 Hive CLI does not seem to be saving history anymore.
 In JLine with the PersistentHistory class, to keep history across sessions, 
 you need to do {{reader.getHistory().flush()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-11 Thread Navis (JIRA)
Navis created HIVE-9341:
---

 Summary: Apply ColumnPrunning for noop PTFs
 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial


Currently, PTF disables the CP (column pruning) optimization, which can impose a 
huge burden. For example,
{noformat}
select p_mfgr, p_name, p_size,
rank() over (partition by p_mfgr order by p_name) as r,
dense_rank() over (partition by p_mfgr order by p_name) as dr,
sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
unbounded preceding and current row) as s1
from noop(on part 
  partition by p_mfgr
  order by p_name
  );

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: part
Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: p_mfgr (type: string), p_name (type: string)
  sort order: ++
  Map-reduce partition columns: p_mfgr (type: string)
  Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: p_partkey (type: int), p_name (type: string), 
p_mfgr (type: string), p_brand (type: string), p_type (type: string), p_size 
(type: int), p_container (type: string), p_retailprice (type: double), 
p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: bigint), 
INPUT__FILE__NAME (type: string), ROW__ID (type: 
struct<transactionid:bigint,bucketid:int,rowid:bigint>)
...
{noformat}

There should be a generic way to discern referenced columns, but until then, we 
know CP can be safely applied to noop functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9334) PredicateTransitivePropagate optimizer should run after PredicatePushDown

2015-01-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273132#comment-14273132
 ] 

Navis commented on HIVE-9334:
-

PredicateTransitivePropagate propagates predicates only from the ON condition of 
JOIN operators. Others should be handled by the generic PPD optimizer. So it is 
the right order to run PredicateTransitivePropagate before PPD. (Yes, these two 
could be merged into one optimizer, but PPD was too unstable in those days.)

I don't know where the not-null predicate comes from, but it's redundant and 
should not be added (it is removed once by ConstantPropagateOptimizer).

 PredicateTransitivePropagate optimizer should run after PredicatePushDown
 -

 Key: HIVE-9334
 URL: https://issues.apache.org/jira/browse/HIVE-9334
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-9334.1.patch, HIVE-9334.patch


 This way PredicateTransitivePropagate will be more effective as it has more 
 filters to push for other branches of joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9336) Fix Hive throws ParseException while handling Grouping-Sets clauses

2015-01-11 Thread zhaohm3 (JIRA)
zhaohm3 created HIVE-9336:
-

 Summary: Fix Hive throws ParseException while handling 
Grouping-Sets clauses
 Key: HIVE-9336
 URL: https://issues.apache.org/jira/browse/HIVE-9336
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 0.13.1
Reporter: zhaohm3
 Fix For: 0.14.0


Currently, when Hive parses GROUPING SETS clauses, if there are expressions 
composed of two or more common subexpressions, the first element of those 
expressions can only be a simple identifier without any qualification; otherwise 
Hive throws a ParseException during parsing. Therefore, Hive will throw a 
ParseException while parsing the following HQLs:

drop table test;
create table test(tc1 int, tc2 int, tc3 int);

explain select test.tc1, test.tc2 from test group by test.tc1, test.tc2 
grouping sets(test.tc1, (test.tc1, test.tc2));
explain select tc1+tc2, tc2 from test group by tc1+tc2, tc2 grouping 
sets(tc2, (tc1 + tc2, tc2));

drop table test;

The following shows part of the ParseException stack trace:

2015-01-07 09:53:34,718 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=Driver.run 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,719 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=TimeToSubmit 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,721 INFO [main]: ql.Driver 
(Driver.java:checkConcurrency(158)) - Concurrency mode is disabled, not 
creating a lock manager
2015-01-07 09:53:34,721 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=compile 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,724 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=parse 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,724 INFO [main]: parse.ParseDriver 
(ParseDriver.java:parse(185)) - Parsing command: explain select test.tc1, 
test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, 
(test.tc1, test.tc2))
2015-01-07 09:53:34,734 ERROR [main]: ql.Driver 
(SessionState.java:printError(545)) - FAILED: ParseException line 1:105 missing 
) at ',' near 'EOF'
line 1:116 extraneous input ')' expecting EOF near 'EOF'
org.apache.hadoop.hive.ql.parse.ParseException: line 1:105 missing ) at ',' 
near 'EOF'
line 1:116 extraneous input ')' expecting EOF near 'EOF'
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:210)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=compile 
start=1420595614721 end=1420595614745 duration=24 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,745 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=releaseLocks 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=releaseLocks 
start=1420595614745 end=1420595614746 duration=1 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=releaseLocks 
from=org.apache.hadoop.hive.ql.Driver
2015-01-07 09:53:34,746 INFO [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=releaseLocks 
start=1420595614746 end=1420595614746 duration=0 
from=org.apache.hadoop.hive.ql.Driver

But Hive will not throw a ParseException while handling the following HQLs:

drop table test;

[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273177#comment-14273177
 ] 

Xuefu Zhang commented on HIVE-9257:
---

Actually my comments on RB were not covered by HIVE-9335, which already has a +1 
pending. We may need a separate JIRA to cover them.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The hive on spark work has reached a point where we can merge it into the 
 trunk branch.  Note that spark execution engine is optional and no current 
 users should be impacted.
 This JIRA will be used to track the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9338) Merge from trunk to spark 1/12/2015 [Spark Branch]

2015-01-11 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-9338:
---

 Summary: Merge from trunk to spark 1/12/2015 [Spark Branch]
 Key: HIVE-9338
 URL: https://issues.apache.org/jira/browse/HIVE-9338
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

2015-01-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273185#comment-14273185
 ] 

Xuefu Zhang commented on HIVE-9339:
---

cc: [~lirui]

 Optimize split grouping for CombineHiveInputFormat [Spark Branch]
 -

 Key: HIVE-9339
 URL: https://issues.apache.org/jira/browse/HIVE-9339
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang

 It seems that split generation, especially in terms of grouping inputs, needs 
 to be improved. For this, we may need cluster information. Because of this, 
 we will first try to solve the problem for Spark.
 As to cluster information, Spark doesn't provide an API for it (SPARK-5080). 
 However, Spark does have a listener API, with which the Spark driver can get 
 notifications about executors going up/down, tasks starting/finishing, etc. 
 With this information, the Spark client should be able to have a view of the 
 current cluster image.
 Spark developers mentioned that the listener can only be created after 
 SparkContext is started, at which time, some executions may have already 
 started and so the listener will miss some information. This can be fixed. 
 File a JIRA with Spark project if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

2015-01-11 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-9339:
-

 Summary: Optimize split grouping for CombineHiveInputFormat [Spark 
Branch]
 Key: HIVE-9339
 URL: https://issues.apache.org/jira/browse/HIVE-9339
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang


It seems that split generation, especially in terms of grouping inputs, needs 
to be improved. For this, we may need cluster information. Because of this, we 
will first try to solve the problem for Spark.

As to cluster information, Spark doesn't provide an API for it (SPARK-5080). 
However, Spark does have a listener API, with which the Spark driver can get 
notifications about executors going up/down, tasks starting/finishing, etc. With 
this information, the Spark client should be able to have a view of the current 
cluster image.

Spark developers mentioned that the listener can only be created after 
SparkContext is started, at which time, some executions may have already 
started and so the listener will miss some information. This can be fixed. File 
a JIRA with Spark project if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

2015-01-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273191#comment-14273191
 ] 

Rui Li commented on HIVE-9339:
--

Using a listener is fine. We currently use listeners to collect metrics as well.

 Optimize split grouping for CombineHiveInputFormat [Spark Branch]
 -

 Key: HIVE-9339
 URL: https://issues.apache.org/jira/browse/HIVE-9339
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang

 It seems that split generation, especially in terms of grouping inputs, needs 
 to be improved. For this, we may need cluster information. Because of this, 
 we will first try to solve the problem for Spark.
 As to cluster information, Spark doesn't provide an API for it (SPARK-5080). 
 However, Spark does have a listener API, with which the Spark driver can get 
 notifications about executors going up/down, tasks starting/finishing, etc. 
 With this information, the Spark client should be able to have a view of the 
 current cluster image.
 Spark developers mentioned that the listener can only be created after 
 SparkContext is started, at which time, some executions may have already 
 started and so the listener will miss some information. This can be fixed. 
 File a JIRA with Spark project if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9340) Address review of HIVE-9257 (ii)

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9340:

Description: 
Some minor fixes:

1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test fw.
2.  Get rid of spark-snapshot repository dep in pom (found by Xuefu)
3.  Cleanup ExplainTask to get rid of * in imports. (found by Xuefu)
4.  Reorder the scala/spark dependencies in pom to fit the alphabetical order.

  was:
Some minor fixes:

1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test fw.
2.  Get rid of spark-snapshot repository dep in pom
3.  Cleanup ExplainTask to get rid of * in imports.
4.  Reorder the scala/spark dependencies in pom to fit the main order.


 Address review of HIVE-9257 (ii)
 

 Key: HIVE-9340
 URL: https://issues.apache.org/jira/browse/HIVE-9340
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho

 Some minor fixes:
 1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test 
 fw.
 2.  Get rid of spark-snapshot repository dep in pom (found by Xuefu)
 3.  Cleanup ExplainTask to get rid of * in imports. (found by Xuefu)
 4.  Reorder the scala/spark dependencies in pom to fit the alphabetical order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]

2015-01-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273104#comment-14273104
 ] 

Xuefu Zhang commented on HIVE-9335:
---

+1

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9335.1-spark.patch, HIVE-9335.2-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat across threads and it is not thread-safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error("Hit error while closing operators - failing tree");
 throw new IllegalStateException("Error while closing operators", e);
 {noformat}
 Should be:
 {noformat}
  String msg = "Error while closing operators: " + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9172) Merging HIVE-5871 into LazySimpleSerDe

2015-01-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273158#comment-14273158
 ] 

Navis commented on HIVE-9172:
-

[~jdere] Thanks for the valuable comments I had missed. I agree on reverting 
MultiDelimitSerde (I also hadn't noticed the base64 encoding problem in the 
lazy serde, which should be configurable as well). On the backward-compatibility 
issue, I agree that the classes in LazySerDe (objects, OIs, utils) might be 
useful for implementing custom SerDes, but basically they are not for public 
use and should not be regarded as such. I also think I've minimized the 
changes, so it should not be that hard to rebase.

 Merging HIVE-5871 into LazySimpleSerDe
 --

 Key: HIVE-9172
 URL: https://issues.apache.org/jira/browse/HIVE-9172
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-9172.1.patch.txt, HIVE-9172.2.patch.txt, 
 HIVE-9172.3.patch.txt


 Merging multi character support for field delimiter to LazySimpleSerDe





[jira] [Updated] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9257:

Labels: TODOC15  (was: )

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The Hive on Spark work has reached a point where we can merge it into the 
 trunk branch.  Note that the Spark execution engine is optional, so no current 
 users should be impacted.
 This JIRA will be used to track the merge.
 This JIRA will be used to track the merge.





Re: Move ancient Hive issues from Hadoop project to Hive

2015-01-11 Thread Brock Noland
+1

On Fri, Jan 9, 2015 at 5:48 PM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Hi all,

 Hive started out as a Hadoop subproject. At that time, Hadoop's JIRA was used
 to track Hive's bugs and features. As I trace the lineage of some very old
 code in Hive, I sometimes end up on those jiras. It would be nice to move
 those issues from Hadoop to Hive so that they are easy to search, with all
 jiras relevant to Hive contained in one project. A representative list:
 http://s.apache.org/Hive-issues-in-Hadoop

 Unless someone objects, I will start moving those issues to Hive some time
 over the next week.

 Thanks,
 Ashutosh



[jira] [Updated] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way

2015-01-11 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9119:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Na.

 ZooKeeperHiveLockManager does not use zookeeper in the proper way
 -

 Key: HIVE-9119
 URL: https://issues.apache.org/jira/browse/HIVE-9119
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 0.13.0, 0.14.0, 0.13.1
Reporter: Na Yang
Assignee: Na Yang
 Fix For: 0.15.0

 Attachments: HIVE-9119.1.patch, HIVE-9119.2.patch, HIVE-9119.3.patch, 
 HIVE-9119.4.patch


 ZooKeeperHiveLockManager does not use zookeeper in the proper way. 
 Currently a new zookeeper client instance is created for each 
 getlock/releaselock query, which sometimes causes the number of open 
 connections between HiveServer2 and ZooKeeper to exceed the maximum number of 
 connections that the zookeeper server allows.
 To use zookeeper as a distributed lock, there is no need to create a new 
 zookeeper instance for every getlock attempt. A single zookeeper instance could 
 be reused and shared by ZooKeeperHiveLockManagers.
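The reuse described above can be sketched as a lazily-initialized shared holder. This is an illustrative sketch only, not the actual ZooKeeperHiveLockManager code; the client type is left generic, where the supplier would be something like {{() -> new ZooKeeper(connectString, sessionTimeout, watcher)}}:

```java
import java.util.function.Supplier;

// Illustrative sketch: instead of creating a new client per
// getlock/releaselock call, create one shared client on first use and
// hand the same instance to every subsequent caller.
class SharedClientHolder<T> {
    private final Supplier<T> factory;  // builds the real client once
    private volatile T client;

    SharedClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T c = client;
        if (c == null) {
            // Double-checked locking: only the first caller pays the
            // construction cost; everyone else reuses the same instance.
            synchronized (this) {
                if (client == null) {
                    client = factory.get();
                }
                c = client;
            }
        }
        return c;
    }
}
```

Every lock request then calls {{get()}} and receives the same client, so the number of open connections no longer grows with the number of getlock/releaselock queries.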





[jira] [Updated] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way

2015-01-11 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9119:
--
Labels: TODOC15  (was: )

 ZooKeeperHiveLockManager does not use zookeeper in the proper way
 -

 Key: HIVE-9119
 URL: https://issues.apache.org/jira/browse/HIVE-9119
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 0.13.0, 0.14.0, 0.13.1
Reporter: Na Yang
Assignee: Na Yang
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: HIVE-9119.1.patch, HIVE-9119.2.patch, HIVE-9119.3.patch, 
 HIVE-9119.4.patch


 ZooKeeperHiveLockManager does not use zookeeper in the proper way. 
 Currently a new zookeeper client instance is created for each 
 getlock/releaselock query, which sometimes causes the number of open 
 connections between HiveServer2 and ZooKeeper to exceed the maximum number of 
 connections that the zookeeper server allows.
 To use zookeeper as a distributed lock, there is no need to create a new 
 zookeeper instance for every getlock attempt. A single zookeeper instance could 
 be reused and shared by ZooKeeperHiveLockManagers.





[jira] [Commented] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273261#comment-14273261
 ] 

Rui Li commented on HIVE-9343:
--

OK I'll take a look

 Fix windowing.q for Spark on trunk
 --

 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
 not merged to spark. Details:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/





[jira] [Created] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode

2015-01-11 Thread Pierre Yin (JIRA)
Pierre Yin created HIVE-9342:


 Summary: add num-executors / executor-cores / executor-memory 
option support for hive on spark in Yarn mode
 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor


When I run Hive on Spark in YARN mode, I want to control some YARN options, 
such as --num-executors, --executor-cores, and --executor-memory.
We can append these options to the argv in SparkClientImpl.
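Appending the options could look roughly like the following; a sketch under the assumption that SparkClientImpl builds the spark-submit argument list as a {{List<String>}} (the helper and class names here are hypothetical, though the config keys and flags are standard Spark ones):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: map configured Spark properties onto the
// corresponding spark-submit command-line flags for YARN mode.
class YarnArgBuilder {
    static List<String> appendYarnOptions(List<String> argv, Map<String, String> conf) {
        addIfSet(argv, conf, "spark.executor.instances", "--num-executors");
        addIfSet(argv, conf, "spark.executor.cores", "--executor-cores");
        addIfSet(argv, conf, "spark.executor.memory", "--executor-memory");
        return argv;
    }

    private static void addIfSet(List<String> argv, Map<String, String> conf,
                                 String key, String flag) {
        // Only pass an option through when the user actually configured it,
        // leaving Spark's own defaults in effect otherwise.
        String value = conf.get(key);
        if (value != null && !value.isEmpty()) {
            argv.add(flag);
            argv.add(value);
        }
    }
}
```

Unset options simply fall through, so existing behavior is unchanged for users who configure nothing.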





[jira] [Updated] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9343:
-
Assignee: Rui Li
  Status: Patch Available  (was: Open)

 Fix windowing.q for Spark on trunk
 --

 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Attachments: HIVE-9343.1.patch


 After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
 not merged to spark. Details:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/





[jira] [Updated] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9343:
-
Attachment: HIVE-9343.1.patch

 Fix windowing.q for Spark on trunk
 --

 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
 Attachments: HIVE-9343.1.patch


 After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
 not merged to spark. Details:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/





[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Patch Available  (was: Open)

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch, HIVE-9039.14.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: HIVE-9039.14.patch

update union_remove_25.q

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch, HIVE-9039.14.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Updated] (HIVE-9338) Merge from trunk to spark 1/12/2015 [Spark Branch]

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9338:

Status: Patch Available  (was: Open)

 Merge from trunk to spark 1/12/2015 [Spark Branch]
 --

 Key: HIVE-9338
 URL: https://issues.apache.org/jira/browse/HIVE-9338
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9338-spark.patch








[jira] [Updated] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9341:

Attachment: HIVE-9341.1.patch.txt

 Apply ColumnPrunning for noop PTFs
 --

 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9341.1.patch.txt


 Currently, PTF disables CP optimization, which can impose a huge burden. For 
 example,
 {noformat}
 select p_mfgr, p_name, p_size,
 rank() over (partition by p_mfgr order by p_name) as r,
 dense_rank() over (partition by p_mfgr order by p_name) as dr,
 sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
 unbounded preceding and current row) as s1
 from noop(on part 
   partition by p_mfgr
   order by p_name
   );
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: part
 Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   key expressions: p_mfgr (type: string), p_name (type: string)
   sort order: ++
   Map-reduce partition columns: p_mfgr (type: string)
   Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: p_partkey (type: int), p_name (type: 
 string), p_mfgr (type: string), p_brand (type: string), p_type (type: 
 string), p_size (type: int), p_container (type: string), p_retailprice (type: 
 double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: 
 bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: 
 struct<transactionid:bigint,bucketid:int,rowid:bigint>)
 ...
 {noformat}
 There should be a generic way to discern referenced columns but before that, 
 we know CP can be safely applied to noop functions.





[jira] [Updated] (HIVE-9344) Fix flaky test optimize_nullscan

2015-01-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9344:
---
Assignee: (was: Brock Noland)

 Fix flaky test optimize_nullscan
 

 Key: HIVE-9344
 URL: https://issues.apache.org/jira/browse/HIVE-9344
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 The optimize_nullscan test is extremely flaky. We need to find a way to fix 
 this test.





[jira] [Assigned] (HIVE-9344) Fix flaky test optimize_nullscan

2015-01-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland reassigned HIVE-9344:
--

Assignee: Brock Noland

 Fix flaky test optimize_nullscan
 

 Key: HIVE-9344
 URL: https://issues.apache.org/jira/browse/HIVE-9344
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland

 The optimize_nullscan test is extremely flaky. We need to find a way to fix 
 this test.





[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273258#comment-14273258
 ] 

Brock Noland commented on HIVE-9257:


I also created HIVE-9344 to fix the optimize_nullscan tests, which are so flaky.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The Hive on Spark work has reached a point where we can merge it into the 
 trunk branch.  Note that the Spark execution engine is optional, so no current 
 users should be impacted.
 This JIRA will be used to track the merge.
 This JIRA will be used to track the merge.





[jira] [Commented] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273284#comment-14273284
 ] 

Hive QA commented on HIVE-9341:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691593/HIVE-9341.1.patch.txt

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 7311 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_streaming
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2335/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2335/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2335/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691593 - PreCommit-HIVE-TRUNK-Build

 Apply ColumnPrunning for noop PTFs
 --

 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9341.1.patch.txt


 Currently, PTF disables CP optimization, which can impose a huge burden. For 
 example,
 {noformat}
 select p_mfgr, p_name, p_size,
 rank() over (partition by p_mfgr order by p_name) as r,
 dense_rank() over (partition by p_mfgr order by p_name) as dr,
 sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
 unbounded preceding and current row) as s1
 from noop(on part 
   partition by p_mfgr
   order by p_name
   );
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: part
 Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   key expressions: p_mfgr (type: string), p_name (type: string)
   sort order: ++
   Map-reduce partition columns: p_mfgr (type: string)
   Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: p_partkey (type: int), p_name (type: 
 string), p_mfgr (type: string), p_brand (type: string), p_type (type: 
 string), p_size (type: int), p_container (type: string), p_retailprice (type: 
 double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: 
 bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: 
 struct<transactionid:bigint,bucketid:int,rowid:bigint>)
 ...
 {noformat}
 There should be a generic way to discern referenced columns but before that, 
 we know CP can be safely applied to noop functions.





[jira] [Updated] (HIVE-9338) Merge from trunk to spark 1/12/2015 [Spark Branch]

2015-01-11 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9338:

Attachment: HIVE-9338-spark.patch

 Merge from trunk to spark 1/12/2015 [Spark Branch]
 --

 Key: HIVE-9338
 URL: https://issues.apache.org/jira/browse/HIVE-9338
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9338-spark.patch








[jira] [Updated] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode

2015-01-11 Thread Pierre Yin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Yin updated HIVE-9342:
-
Attachment: HIVE-9342.1-spark.patch

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode
 --

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch

 Attachments: HIVE-9342.1-spark.patch


 When I run Hive on Spark in YARN mode, I want to control some YARN options, 
 such as --num-executors, --executor-cores, and --executor-memory.
 We can append these options to the argv in SparkClientImpl.





[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273256#comment-14273256
 ] 

Brock Noland commented on HIVE-9257:


The only test which failed and is not flaky was the windowing test. I 
created HIVE-9343 to track the issue.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The Hive on Spark work has reached a point where we can merge it into the 
 trunk branch.  Note that the Spark execution engine is optional, so no current 
 users should be impacted.
 This JIRA will be used to track the merge.
 This JIRA will be used to track the merge.





[jira] [Created] (HIVE-9344) Fix flaky test optimize_nullscan

2015-01-11 Thread Brock Noland (JIRA)
Brock Noland created HIVE-9344:
--

 Summary: Fix flaky test optimize_nullscan
 Key: HIVE-9344
 URL: https://issues.apache.org/jira/browse/HIVE-9344
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland


The optimize_nullscan test is extremely flaky. We need to find a way to fix 
this test.





[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273223#comment-14273223
 ] 

Brock Noland commented on HIVE-9257:


FYI - I am running a build here: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/1/
 to ensure all the tests are passing post-merge. If there are any failing tests 
due to the merge, we should address them ASAP tomorrow.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The Hive on Spark work has reached a point where we can merge it into the 
 trunk branch.  Note that the Spark execution engine is optional, so no current 
 users should be impacted.
 This JIRA will be used to track the merge.
 This JIRA will be used to track the merge.





[jira] [Updated] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9341:

Status: Patch Available  (was: Open)

 Apply ColumnPrunning for noop PTFs
 --

 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9341.1.patch.txt


 Currently, PTF disables CP optimization, which can impose a huge burden. For 
 example,
 {noformat}
 select p_mfgr, p_name, p_size,
 rank() over (partition by p_mfgr order by p_name) as r,
 dense_rank() over (partition by p_mfgr order by p_name) as dr,
 sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
 unbounded preceding and current row) as s1
 from noop(on part 
   partition by p_mfgr
   order by p_name
   );
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: part
 Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   key expressions: p_mfgr (type: string), p_name (type: string)
   sort order: ++
   Map-reduce partition columns: p_mfgr (type: string)
   Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: p_partkey (type: int), p_name (type: 
 string), p_mfgr (type: string), p_brand (type: string), p_type (type: 
 string), p_size (type: int), p_container (type: string), p_retailprice (type: 
 double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: 
 bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: 
 struct<transactionid:bigint,bucketid:int,rowid:bigint>)
 ...
 {noformat}
 There should be a generic way to discern referenced columns but before that, 
 we know CP can be safely applied to noop functions.





Review Request 29800: Apply ColumnPrunning for noop PTFs

2015-01-11 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29800/
---

Review request for hive.


Bugs: HIVE-9341
https://issues.apache.org/jira/browse/HIVE-9341


Repository: hive-git


Description
---

Currently, PTF disables CP optimization, which can impose a huge burden. For 
example,
{noformat}
select p_mfgr, p_name, p_size,
rank() over (partition by p_mfgr order by p_name) as r,
dense_rank() over (partition by p_mfgr order by p_name) as dr,
sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
unbounded preceding and current row) as s1
from noop(on part 
  partition by p_mfgr
  order by p_name
  );

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: part
Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: p_mfgr (type: string), p_name (type: string)
  sort order: ++
  Map-reduce partition columns: p_mfgr (type: string)
  Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: p_partkey (type: int), p_name (type: string), 
p_mfgr (type: string), p_brand (type: string), p_type (type: string), p_size 
(type: int), p_container (type: string), p_retailprice (type: double), 
p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: bigint), 
INPUT__FILE__NAME (type: string), ROW__ID (type: 
struct<transactionid:bigint,bucketid:int,rowid:bigint>)
...
{noformat}

There should be a generic way to discern referenced columns but before that, we 
know CP can be safely applied to noop functions.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
afd1738 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicatePushDown.java ee7328e 

Diff: https://reviews.apache.org/r/29800/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Updated] (HIVE-9039) Support Union Distinct

2015-01-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Open  (was: Patch Available)

 Support Union Distinct
 --

 Key: HIVE-9039
 URL: https://issues.apache.org/jira/browse/HIVE-9039
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
 HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, 
 HIVE-9039.06.patch, HIVE-9039.07.patch, HIVE-9039.08.patch, 
 HIVE-9039.09.patch, HIVE-9039.10.patch, HIVE-9039.11.patch, 
 HIVE-9039.12.patch, HIVE-9039.13.patch, HIVE-9039.14.patch


 Current version (Hive 0.14) does not support union (or union distinct). It 
 only supports union all. In this patch, we try to add this new feature by 
 rewriting union distinct to union all followed by group by.





[jira] [Created] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Brock Noland (JIRA)
Brock Noland created HIVE-9343:
--

 Summary: Fix windowing.q for Spark on trunk
 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
not merged to spark. Details:

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/





[jira] [Updated] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode

2015-01-11 Thread Pierre Yin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Yin updated HIVE-9342:
-
Fix Version/s: spark-branch
   Status: Patch Available  (was: Open)

Please see patch in attachment.

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode
 --

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch


 When I run Hive on Spark in YARN mode, I want to control some YARN options, 
 such as --num-executors, --executor-cores, and --executor-memory.
 We can append these options to the argv in SparkClientImpl.





[jira] [Commented] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273253#comment-14273253
 ] 

Brock Noland commented on HIVE-9343:


[~chengxiang li] or [~lirui] - any chance you could update the output file for 
this test?



 Fix windowing.q for Spark on trunk
 --

 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
 not merged to spark. Details:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/





[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015

2015-01-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273295#comment-14273295
 ] 

Szehon Ho commented on HIVE-9257:
-

Thanks Brock.  FYI, I am planning another merge in the next few days to 
incorporate the review fixes, and also to get rid of the spark-snapshot 
dependency.  It will hopefully be a more manageable patch that can be uploaded 
this time and tested the normal way :)

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The Hive on Spark work has reached a point where we can merge it into the 
 trunk branch.  Note that the Spark execution engine is optional, so no current 
 users should be impacted.
 This JIRA will be used to track the merge.
 This JIRA will be used to track the merge.





[jira] [Commented] (HIVE-9343) Fix windowing.q for Spark on trunk

2015-01-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273296#comment-14273296
 ] 

Szehon Ho commented on HIVE-9343:
-

Thanks guys for taking care of this!  +1 pending tests.

 Fix windowing.q for Spark on trunk
 --

 Key: HIVE-9343
 URL: https://issues.apache.org/jira/browse/HIVE-9343
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Attachments: HIVE-9343.1.patch


 After HIVE-9257 the windowing.q test is failing on trunk since HIVE-9104 was 
 not merged to spark. Details:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-2/lastCompletedBuild/testReport/


