[jira] [Commented] (HIVE-7119) Extended ACL's should be inherited if warehouse perm inheritance enabled
[ https://issues.apache.org/jira/browse/HIVE-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251351#comment-14251351 ] Lefty Leverenz commented on HIVE-7119: -- bq. This is already doc'ed ... So I deleted the release note, which was Document this addition. Extended ACL's should be inherited if warehouse perm inheritance enabled Key: HIVE-7119 URL: https://issues.apache.org/jira/browse/HIVE-7119 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Fix For: 0.14.0 Attachments: HIVE-7119.2.patch, HIVE-7119.3.patch, HIVE-7119.4.patch, HIVE-7119.patch HDFS recently added support for extended ACLs, i.e., permissions for specific users/groups in addition to the general owner/group/other permissions. Hive permission inheritance should inherit those as well, if the user has set them at any point in the warehouse directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
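The inheritance idea behind this issue can be sketched outside HDFS with plain java.nio.file POSIX permissions. This is a minimal illustration under stated assumptions, not Hive's actual code: Hive applies the same copy-from-parent step to HDFS paths (and, with this patch, to extended ACL entries); `PermInheritance` and `createInheriting` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermInheritance {
    // Create a child directory and copy the parent's permission bits onto it,
    // mirroring what hive.warehouse.subdir.inherit.perms does for warehouse subdirs.
    public static Set<PosixFilePermission> createInheriting(Path parent, String name)
            throws IOException {
        Set<PosixFilePermission> parentPerms = Files.getPosixFilePermissions(parent);
        Path child = Files.createDirectory(parent.resolve(name));
        // Without this step the child's permissions come from the process umask,
        // not from the parent -- that is the gap permission inheritance closes.
        Files.setPosixFilePermissions(child, parentPerms);
        return Files.getPosixFilePermissions(child);
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("warehouse");
        Files.setPosixFilePermissions(parent,
                PosixFilePermissions.fromString("rwxr-x---"));
        System.out.println(createInheriting(parent, "tbl"));
    }
}
```

Extended ACL entries follow the same pattern on HDFS: read the parent's ACL and re-apply it to the new child, rather than letting the default take effect.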
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251356#comment-14251356 ] Vikram Dixit K commented on HIVE-9141: -- [~navis] Nice patch. It simplifies the code as well. However, it looks like with this change, followingWork can never be a union work because of the moved code. Do you see any way followingWork can be a union work? If not, I think we can remove that piece of code and the getFollowingWorkIndex method as well. Thoughts? HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to reproduce it in the Hive q-test setting (with the src table):
{code}
set hive.execution.engine=tez;
SELECT key, value FROM (
  SELECT key, value FROM src
  UNION ALL
  SELECT key, key as value FROM (
    SELECT distinct key FROM (
      SELECT key, value FROM (
        SELECT key, value FROM src
        UNION ALL
        SELECT key, value FROM src
      ) t1 group by key, value
    ) t2
  ) t3
) t4 group by key, value;
{code}
will generate
{noformat}
2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork
        at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834)
        at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136)
        at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251358#comment-14251358 ] Thejas M Nair commented on HIVE-7081: - I don't think HIVE-5799 will help to free up this thread. HIVE-6679 (or the upgrade to Thrift 0.9.2) would be needed. HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested with both DBVisualizer and hand-written JDBC code; when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state until the server is stopped. Using jstack to dump the server's threads, I found the thread is blocked in socketRead0():
{quote}
pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xebc24f28 (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
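The hung socketRead0() frame above is the classic symptom of a blocking read with no read timeout. A minimal, self-contained sketch (plain java.net rather than Thrift; a server-side transport would need its timeout set analogously) showing how SO_TIMEOUT turns an eternal block into a catchable SocketTimeoutException:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Try to read one byte from a peer that never writes. With soTimeoutMillis = 0
    // the read blocks forever (the leaked-connection case); with a positive value
    // it fails fast with SocketTimeoutException and the worker thread can exit.
    public static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks: nobody ever writes
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(200)); // prints true after ~200 ms
    }
}
```

With no timeout configured, the only thing that unblocks such a read is the connection being closed, which never happens after a silent network interruption; that matches the ESTABLISHED-forever behavior reported here.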
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251359#comment-14251359 ] Lefty Leverenz commented on HIVE-6892: -- Can we remove the TODOC14 label now? Also, should any other docs have links to Permission Inheritance in Hive? For example, Authorization or Storage Based Authorization: * [Authorization | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization] * [Storage Based Authorization | https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server] Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from the parent (the BSD rule). Permissions are taken from the default umask. Extended ACLs are taken from the parent unless they are set explicitly. *Goals* To reduce the need to set fine-grained file security properties after every operation, users may want the following Hive warehouse files/dirs to auto-inherit security properties from their parent directories: * Directories created by a new database/table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question: whether an exported table inheriting perms from its new parent needs another flag) What may be inherited: * Basic file permissions * Groups (already done by HDFS for new directories) * Extended ACLs (already done by HDFS for new directories) *Behavior* * When the hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all of the above inheritances. In the future, we can add more flags for finer-grained control. * A failure by Hive to inherit will not cause the operation to fail. 
The rule of thumb for when security-property inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of the file, or else a super-user. ** Hence, the user that Hive runs as (either 'hive' or the logged-in user in the case of impersonation) must be a super-user or the owner of the file whose security properties are going to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251381#comment-14251381 ] Hive QA commented on HIVE-8920: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687976/HIVE-8920.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7237 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/571/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/571/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-571/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687976 - PreCommit-HIVE-SPARK-Build SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] - Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work:
{code}
from (select * from table0 union all select * from table1) s
insert overwrite table table3 select s.x, count(1) group by s.x
insert overwrite table table4 select s.y, count(1) group by s.y;
{code}
Currently, the plan for this query, before SplitSparkWorkResolver, looks like below:
{noformat}
M1  M2
 \  / \
  U3   R5
  |
  R4
{noformat}
In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
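The fix implied by the description amounts to checking the runtime type before casting. A toy sketch of that guard, where BaseWork/ReduceWork/UnionWork are simplified stand-ins for Hive's plan classes, not the real ones:

```java
public class WorkDispatch {
    // Simplified stand-ins for Hive's plan work classes.
    abstract static class BaseWork {}
    static class ReduceWork extends BaseWork {}
    static class UnionWork extends BaseWork {}

    // Instead of blindly casting childWork to ReduceWork (the source of the
    // ClassCastException), branch on the actual type and handle each case.
    public static String classify(BaseWork childWork) {
        if (childWork instanceof ReduceWork) {
            return "reduce";
        } else if (childWork instanceof UnionWork) {
            return "union";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(new UnionWork()));  // prints union
        System.out.println(classify(new ReduceWork())); // prints reduce
    }
}
```

In the multi-insert plan above, M2 has both U3 (union) and R5 (reduce) as children, so any code path that assumes "child of a map work is a reduce work" needs exactly this kind of branch.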
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251385#comment-14251385 ] Lefty Leverenz commented on HIVE-9148: -- Does this mean the description of configuration parameter *hive.hwi.war.file* is wrong in HiveConf.java and the wiki?
{code}
HIVEHWIWARFILE("hive.hwi.war.file", "${env:HWI_WAR_FILE}",
    "This sets the path to the HWI war file, relative to ${HIVE_HOME}. "),
{code}
* [Configuration Properties -- hive.hwi.war.file | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hwi.war.file] Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9004) Reset doesn't work for the default empty value entry
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251401#comment-14251401 ] Hive QA commented on HIVE-9004: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687952/HIVE-9004.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2123/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2123/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2123/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687952 - PreCommit-HIVE-TRUNK-Build Reset doesn't work for the default empty value entry Key: HIVE-9004 URL: https://issues.apache.org/jira/browse/HIVE-9004 Project: Hive Issue Type: Bug Components: Configuration Reporter: Cheng Hao Assignee: Cheng Hao Fix For: spark-branch, 0.15.0, 0.14.1 Attachments: HIVE-9004.patch To illustrate, in the Hive CLI:
{noformat}
hive> set hive.table.parameters.default;
hive.table.parameters.default is undefined
hive> set hive.table.parameters.default=key1=value1;
hive> reset;
hive> set hive.table.parameters.default;
hive.table.parameters.default=key1=value1
{noformat}
I think we expect the last output to be {{hive.table.parameters.default is undefined}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
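The expected reset semantics can be sketched as an overlay-over-defaults map. This is a simplified model, not HiveConf itself (`ResettableConf` is a hypothetical class): the point is that reset must discard every session override, including ones whose key has no default value at all.

```java
import java.util.HashMap;
import java.util.Map;

public class ResettableConf {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> overrides = new HashMap<>();

    public ResettableConf(Map<String, String> defaults) {
        this.defaults.putAll(defaults);
    }

    public void set(String key, String value) {
        overrides.put(key, value);
    }

    // Session value wins; otherwise fall back to the default, which may be
    // null, i.e. "undefined", as for hive.table.parameters.default.
    public String get(String key) {
        return overrides.containsKey(key) ? overrides.get(key) : defaults.get(key);
    }

    // The behavior the bug report expects: drop *all* overrides, not just
    // those whose key happens to have a non-empty default to restore.
    public void reset() {
        overrides.clear();
    }

    public static void main(String[] args) {
        ResettableConf conf = new ResettableConf(new HashMap<>());
        conf.set("hive.table.parameters.default", "key1=value1");
        conf.reset();
        System.out.println(conf.get("hive.table.parameters.default")); // prints null
    }
}
```

A reset that only re-applies known defaults leaves keys with empty/undefined defaults stuck at their session values, which is exactly the transcript shown above.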
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251407#comment-14251407 ] Rui Li commented on HIVE-9153: -- I used our cluster B to test this. Results show that CombineHiveInputFormat still performs much better than HiveInputFormat for Spark. The test query is {code}select count(*) from store_sales where ss_sold_date_sk is not null;{code} With CombineHiveInputFormat, Spark spawns 1252 mappers and the query finishes in about 180s, while HiveInputFormat requires 13559 mappers and the query finishes in about 700s. I didn't find out why Tez uses HiveInputFormat by default. But for Tez, HiveInputFormat spawns 332 mappers while CombineHiveInputFormat spawns 1252, so I think Tez has its own way of combining the splits. With 332 mappers Tez finishes the query in about 90s, and with 1252 mappers it takes about 120s. Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits, such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
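The mapper-count gap above (1252 vs 13559) comes from packing many small files into each split. A greedy sketch of that idea under simplified assumptions (the real CombineHiveInputFormat also respects node/rack locality and per-node limits, which are omitted; `SplitCombiner` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitCombiner {
    // Greedily pack consecutive file sizes into splits of at most maxSplitBytes.
    // One split == one map task, so fewer splits means fewer task launches.
    public static List<List<Long>> combine(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);           // close the full split
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] tenSmallFiles = new long[10];
        Arrays.fill(tenSmallFiles, 10 * mb);
        // Ten 10 MB files fit into a single 128 MB split: 1 mapper instead of 10.
        System.out.println(combine(tenSmallFiles, 128 * mb).size()); // prints 1
    }
}
```

Whether fewer, larger tasks win depends on per-task overhead, which is why the same table gives different best answers on Spark and Tez.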
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9116: Attachment: HIVE-9116.1-spark.patch Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi sessions support is enabled in HoS, we should add some unit tests for verification and regression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9116: Status: Patch Available (was: Open) Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi sessions support is enabled in HoS, we should add some unit tests for verification and regression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 29200: HIVE-9116 Add unit test for multi sessions on Spark.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-9116 https://issues.apache.org/jira/browse/HIVE-9116 Repository: hive-git Description --- Test HS2 multi-session support with multi-threaded JDBC connections. Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMultiSessionsHS2WithLocalClusterSpark.java PRE-CREATION Diff: https://reviews.apache.org/r/29200/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
On Dec. 17, 2014, 7:06 p.m., Marcelo Vanzin wrote: +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. In getExecutorCount/getJobInfo/getStageInfo we use JobHandle.get() to wait for the result, so I use SPARK_CLIENT_FUTURE_TIMEOUT here, which means Hive would use this setting as the timeout value when calling JobHandle.get(); that seems more reasonable than the previous name. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s because the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 18, 2014, 9:40 a.m.) Review request for hive and Xuefu Zhang. Changes --- Update the patch and the setting name/description. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s because the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 256d0b0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 1d3a9d8 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
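The JobHandle.get() timeout being discussed follows the standard java.util.concurrent Future pattern. A minimal sketch with CompletableFuture standing in for the remote job handle (not Hive's actual spark-client code; the timeout argument plays the role of the configurable SPARK_CLIENT_FUTURE_TIMEOUT bound):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureTimeoutDemo {
    // Wait at most timeoutMillis for the handle's result. Returns true when the
    // bound is hit -- the TimeoutException case the patch makes configurable
    // (and whose default it raises from 5s to 60s).
    public static boolean timesOut(CompletableFuture<Integer> handle, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        try {
            handle.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A "cluster" that never reports an executor count within the bound.
        CompletableFuture<Integer> pending = new CompletableFuture<>();
        System.out.println(timesOut(pending, 100)); // prints true after ~100 ms

        CompletableFuture<Integer> done = CompletableFuture.completedFuture(8);
        System.out.println(timesOut(done, 100)); // prints false
    }
}
```

Using the same bound for getExecutorCount, getJobInfo, and getStageInfo is what makes a single generic setting name defensible: it is the timeout on any result fetched through the handle.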
[jira] [Updated] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9094: Attachment: HIVE-9094.2-spark.patch Update the setting name and description. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:
{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
        at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:255)
        at junit.framework.TestSuite.run(TestSuite.java:250)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
[jira] [Resolved] (HIVE-9126) Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch
[ https://issues.apache.org/jira/browse/HIVE-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta resolved HIVE-9126. Resolution: Duplicate As suggested by [~thejas], committed HIVE-8827 to branch 14 instead. Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch - Key: HIVE-9126 URL: https://issues.apache.org/jira/browse/HIVE-9126 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.1 Attachments: HIVE-9126.1.patch Check HIVE-8827. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Fix Version/s: 0.14.1 Also committed to 14.1. Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251452#comment-14251452 ] Hive QA commented on HIVE-9158: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687960/HIVE-9158.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2124/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2124/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2124/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687960 - PreCommit-HIVE-TRUNK-Build Multiple LDAP server URLs in hive.server2.authentication.ldap.url - Key: HIVE-9158 URL: https://issues.apache.org/jira/browse/HIVE-9158 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-9158.1.patch, LDAPClient.java Support for multiple LDAP servers for failover in the event that one stops responding or is down for maintenance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
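The failover behavior requested here is essentially "try each configured URL in order until one responds." A sketch with the connection probe abstracted out; a real implementation would attempt an LDAP bind per URL with a short timeout, and `LdapFailover`/`firstAvailable` are hypothetical names, not the patch's actual code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LdapFailover {
    // Return the first URL the probe can reach; throw if every server is down.
    public static String firstAvailable(List<String> urls, Predicate<String> probe) {
        for (String url : urls) {
            if (probe.test(url)) {  // e.g. attempt an LDAP bind with a short timeout
                return url;
            }
        }
        throw new IllegalStateException("no LDAP server reachable: " + urls);
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("ldap://primary:389", "ldap://backup:389");
        // Probe stub: pretend only the backup server answers.
        System.out.println(firstAvailable(urls, u -> u.contains("backup")));
    }
}
```

Keeping the probe as a parameter makes the ordering logic trivially testable without a live directory server, while the production probe carries the actual bind attempt.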
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251495#comment-14251495 ] Kamil Gorlo commented on HIVE-9146: --- I've tested it on HDP 2.2 with Hive 0.14 and in fact everything is working as expected. Thanks. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are the simplest queries I could produce to reproduce the bug. I have two simple tables:
desc kgorlo_comm;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
desc kgorlo_log;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
| tstamp | bigint | |
With data:
select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2 |
| 2 | 1 |
| 1 | 3 |
| 2 | 3 |
| 3 | 5 |
| 4 | 5 |
select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |
And when I run this query (query no. 1):
{quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote}
I get this result (which is correct):
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2 | 1 | 1 |
| 1 | 3 | 1 | NULL |
| 1 | 5 | NULL | NULL |
| 3 | 1 | NULL | 1 |
But when I run the second query (query no. 2):
{quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote}
I get a different (and, in my opinion, wrong) result:
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2 | 1 | 1 |
| 1 | 3 | 1 | 1 |
| 1 | 5 | NULL | NULL |
| 3 | 1 | NULL | NULL |
Query no. 1 and query no. 2 differ in only one place, the second join condition:
bq. com2.dest_id=log.id and com2.id=log.dest_id
vs
bq. com2.id=log.dest_id and com2.dest_id=log.id
which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
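The semantics the reporter expects can be sketched with a toy hash join: a multi-column equi-join key is a set of (probe column, build column) pairs, so a correct planner must produce the same rows no matter which order the AND-ed conjuncts are written in. The sketch below uses the kgorlo_log/kgorlo_comm sample data from the report to show both spellings of the com2 condition selecting the same matches once the key pairs are canonicalized; it illustrates the expected behavior only, not Hive's join implementation.

```python
# Toy left outer hash join over the report's sample data. key_pairs lists
# (probe_col, build_col) index pairs; sorting them canonicalizes conjunct
# order, which is what the reporter expects the planner to guarantee.
log = [(1, 2), (1, 3), (1, 5), (3, 1)]                    # (id, dest_id)
comm = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 5), (4, 5)]   # (id, dest_id)

def left_join(probe, build, key_pairs):
    pairs = sorted(key_pairs)       # conjunct order becomes irrelevant
    table = {}
    for row in build:
        table.setdefault(tuple(row[b] for _, b in pairs), []).append(row)
    out = []
    for row in probe:
        matches = table.get(tuple(row[p] for p, _ in pairs))
        out.extend((row, m) for m in (matches or [None]))
    return out

# "com2.dest_id=log.id and com2.id=log.dest_id" and the swapped spelling:
r1 = left_join(log, comm, [(0, 1), (1, 0)])
r2 = left_join(log, comm, [(1, 0), (0, 1)])
```

Both `r1` and `r2` match the com2.msgs column of the correct result above: log rows (1, 2) and (3, 1) find a reversed partner in kgorlo_comm, while (1, 3) and (1, 5) get NULL.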
[jira] [Commented] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251521#comment-14251521 ] Hive QA commented on HIVE-9116: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687998/HIVE-9116.1-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7238 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/572/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/572/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-572/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687998 - PreCommit-HIVE-SPARK-Build Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8681) CBO: Column names are missing from join expression in Map join with CBO enabled
[ https://issues.apache.org/jira/browse/HIVE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251553#comment-14251553 ] Hive QA commented on HIVE-8681: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687974/HIVE-8681.4.patch {color:red}ERROR:{color} -1 due to 1120 failed/errored test(s), 6716 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join6 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
[jira] [Commented] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251557#comment-14251557 ] Rui Li commented on HIVE-8722: -- I got this exception which also seems related: {noformat} 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 14/12/18 12:25:18 DEBUG rdd.HadoopRDD: SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code. 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader$1.run(URLClassLoader.java:366) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.security.AccessController.doPrivileged(Native Method) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.Class.forName0(Native Method) 2014-12-18 
12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.Class.forName(Class.java:190) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.init(HadoopRDD.scala:381) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.init(HadoopRDD.scala:390) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.clinit(HadoopRDD.scala) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:179) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:197) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:204) 2014-12-18 12:25:18,400 INFO
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251593#comment-14251593 ] Hive QA commented on HIVE-9094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688000/HIVE-9094.2-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/573/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/573/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-573/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12688000 - PreCommit-HIVE-SPARK-Build TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at
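The failure in HIVE-9094 is a bounded wait on a remote call: SetSparkReducerParallelism asks the remote Spark context for memory/core info and the future times out, failing the whole compile. A common pattern for this class of problem is to fall back to a default when the bounded wait expires rather than propagating the TimeoutException. A minimal sketch, with an illustrative timeout and fallback (not Hive's actual settings or fix):

```python
# Sketch of the failure mode: a slow "remote" call awaited with a timeout.
# The caller falls back to a default instead of failing query compilation.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def get_executor_count_slowly():
    time.sleep(0.2)     # simulate a slow remote Spark context lookup
    return 8

def executor_count_or_default(timeout_s, default):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(get_executor_count_slowly)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            return default      # degrade gracefully, don't abort the compile
```

Whether a silent default is acceptable here is a design question — a too-small parallelism guess hurts performance — which is presumably why the patch under review tunes the timeout/retry behavior instead.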
[jira] [Commented] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251596#comment-14251596 ] Kamil Gorlo commented on HIVE-9123: --- I've tried in HDP 2.2 (with Hive 0.14.0.2.2.0.0-1084) and also cannot reproduce. BUT, I've also tried with HDP 2.1 (with Hive 0.13.0.2.1.1.0-237) and also CANNOT reproduce. So it looks like this issue occurs only (?) with CDH 5.2.1 (with Hive 0.13.1-cdh5.2.1). Query with join fails with NPE when using join auto conversion -- Key: HIVE-9123 URL: https://issues.apache.org/jira/browse/HIVE-9123 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: CDH5 with Hive 0.13.1 Reporter: Kamil Gorlo I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | The following query fails in the second stage of execution: bq.
select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL 
[uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251597#comment-14251597 ] Rui Li commented on HIVE-9153: -- Judging from the results, I think fewer mappers can improve overall performance, which is true for both Spark and Tez. The problem is why Spark is 60s slower than Tez with the same # of mappers. One possible reason is that we don't have data locality with CombineHiveInputFormat, which is tracked by HIVE-8722. I also noticed that the parallelism drops during execution (I'll attach a screenshot later). This may be due to the delay scheduling mechanism of Spark, which attempts to schedule tasks with some locality first. Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
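The trade-off being evaluated here — CombineHiveInputFormat launching fewer, bigger tasks versus HiveInputFormat launching one task per split with better locality — reduces to packing small splits into combined splits up to a size cap. A rough sketch of that greedy packing, with an illustrative cap (the real format also groups splits by node and rack, which is exactly where the locality discussed above is kept or lost):

```python
# Greedy sketch of split combining: pack consecutive split sizes into
# combined splits no larger than max_combined, so fewer tasks launch.
# Purely illustrative; CombineHiveInputFormat also groups by node/rack.
def combine_splits(split_sizes, max_combined):
    combined, current, current_size = [], [], 0
    for size in split_sizes:
        if current and current_size + size > max_combined:
            combined.append(current)        # close the current combined split
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined
```

For example, five input splits of sizes [64, 64, 64, 200, 32] with a 128 MB cap become four tasks instead of five; with many small files the reduction is much larger, which is the effect measured in this JIRA.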
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Attachment: screenshot.PNG Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9160) Suspicious comparing logic in LazyPrimitive
[ https://issues.apache.org/jira/browse/HIVE-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251624#comment-14251624 ] Hive QA commented on HIVE-9160: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687981/HIVE-9160.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2126/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2126/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2126/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687981 - PreCommit-HIVE-TRUNK-Build Suspicious comparing logic in LazyPrimitive --- Key: HIVE-9160 URL: https://issues.apache.org/jira/browse/HIVE-9160 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9160.1.patch.txt
{code}
@Override
public boolean equals(Object obj) {
  if (!(obj instanceof LazyPrimitive<?, ?>)) {
    return false;
  }
  if (data == obj) {
    return true;
  }
  if (data == null || obj == null) {
    return false;
  }
  return data.equals(((LazyPrimitive<?, ?>) obj).getWritableObject());
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
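What makes the snippet above suspicious is the middle check: `data == obj` compares the inner writable against the *other wrapper object* by reference, two things that play different roles, so it can only be true by accident; a sound equals compares inner value to inner value. A Python analogue of the like-with-like version (this only mirrors the smell and an obvious fix — the actual HIVE-9160 patch may differ):

```python
# Python analogue of LazyPrimitive equality done like-with-like: compare the
# wrapped "writable" values, never the wrapper against the inner value.
# Illustrative only; not the actual HIVE-9160 patch.
class LazyPrimitive:
    def __init__(self, data):
        self.data = data          # the wrapped "writable" value

    def __eq__(self, other):
        if not isinstance(other, LazyPrimitive):
            return False
        if self.data is None or other.data is None:
            return self.data is other.data    # equal only if both are None
        return self.data == other.data        # inner vs inner, symmetric

    def __hash__(self):
        return hash(self.data)
```

Comparing inner to inner also keeps equality symmetric, which the original's mixed-role identity check does not guarantee.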
[jira] [Commented] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251643#comment-14251643 ] Rui Li commented on HIVE-8722: -- Never mind my last comments. That's because I used hadoop-2.4, which doesn't have that class. Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch] --- Key: HIVE-8722 URL: https://issues.apache.org/jira/browse/HIVE-8722 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang We got the following exception in hive.log: {noformat} 2014-11-03 11:45:49,865 DEBUG rdd.HadoopRDD (Logging.scala:logDebug(84)) - Failed to use InputSplitWithLocations. java.lang.ClassCastException: Cannot cast org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit to org.apache.hadoop.mapred.InputSplitWithLocationInfo at java.lang.Class.cast(Class.java:3094) at org.apache.spark.rdd.HadoopRDD.getPreferredLocations(HadoopRDD.scala:278) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:215) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1303) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1313) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1312) {noformat} My understanding is that the split location info helps Spark to execute tasks more efficiently. This could help other execution engines too. So we should consider enhancing InputSplitShim to implement InputSplitWithLocationInfo if possible.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
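The mechanism behind HIVE-8722 is simple: Spark's HadoopRDD asks each split for its preferred hosts (via InputSplitWithLocationInfo) and the scheduler tries to place the task on one of them; a split class that cannot report locations, like the CombineHiveInputSplit in the exception above, always falls through to "any host". A sketch of that placement decision, with illustrative class and function names (not Spark's or Hive's actual API):

```python
# Sketch of locality-aware task placement: prefer a host that stores the
# split's data, otherwise run anywhere. Names here are illustrative.
class SplitWithLocations:
    def __init__(self, path, hosts):
        self.path = path
        self.hosts = hosts        # hosts that store this split's blocks

    def get_locations(self):
        return self.hosts

def place_task(split, available_hosts):
    # Splits without location info (no get_locations) report no hosts.
    locations = getattr(split, "get_locations", lambda: [])()
    for host in locations:
        if host in available_hosts:
            return host, "NODE_LOCAL"
    return available_hosts[0], "ANY"
```

This is also why the shim change matters for the performance comparison in HIVE-9153: without location info every task is an "ANY" placement and reads its input over the network.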
[jira] [Resolved] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-9123. Resolution: Cannot Reproduce Query with join fails with NPE when using join auto conversion -- Key: HIVE-9123 URL: https://issues.apache.org/jira/browse/HIVE-9123 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: CDH5 with Hive 0.13.1 Reporter: Kamil Gorlo I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | Following query fails in second stage of execution: bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) at
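Since the NPE above originates in MapJoinOperator, a possible triage step (a sketch, not part of the report) is to disable map-join auto conversion and force a common shuffle join. hive.auto.convert.join is a standard Hive setting; the query is the reporter's.

```sql
-- Triage sketch (assumption: the NPE is specific to the map-join
-- conversion path). Forcing a shuffle join may sidestep the failure.
set hive.auto.convert.join=false;

select v.id, v.dest_id
from kgorlo_log v
join (select id, dest_id, count(*) as wiad
      from kgorlo_comm
      group by id, dest_id) com1
  on com1.id = v.id
 and com1.dest_id = v.dest_id;
```

If the query succeeds with conversion disabled, that narrows the problem to the map-join key computation shown in the stack trace.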
[jira] [Resolved] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-9146. Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Ashutosh Chauhan Fixed via HIVE-8298 Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.1 Reporter: Kamil Gorlo Assignee: Ashutosh Chauhan Fix For: 0.14.0 I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 
2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get a different (and, in my opinion, wrong) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 differ in only one place, the second join condition: bq. com2.dest_id=log.id and com2.id=log.dest_id vs bq. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. The explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9146: --- Component/s: Logical Optimizer Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.1 Reporter: Kamil Gorlo Assignee: Ashutosh Chauhan Fix For: 0.14.0 I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 
2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get a different (and, in my opinion, wrong) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 differ in only one place, the second join condition: bq. com2.dest_id=log.id and com2.id=log.dest_id vs bq. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. The explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65489 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 9:40 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 18, 2014, 9:40 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s when the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 256d0b0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 1d3a9d8 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251667#comment-14251667 ] Xuefu Zhang commented on HIVE-9094: --- +1. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at
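The fix discussed in this thread makes the RSC timeout configurable, with a 60s default. A hedged sketch of how a user might raise it for a slow-starting cluster follows; the property name below is an assumption inferred from the patch description, not something confirmed in this thread.

```sql
-- Assumption: the patch exposes the RSC future timeout as a HiveConf
-- property. The name below is a guess; the patch's stated default is 60s.
set hive.spark.client.future.timeout=120s;
```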
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251668#comment-14251668 ] Ashutosh Chauhan commented on HIVE-9141: +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9053 HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
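The reporter's reproducing query above is easier to follow reformatted (same statements as in the report, only whitespace added):

```sql
set hive.execution.engine=tez;

SELECT key, value
FROM (
  SELECT key, value FROM src
  UNION ALL
  SELECT key, key AS value
  FROM (
    SELECT DISTINCT key
    FROM (
      SELECT key, value
      FROM (SELECT key, value FROM src
            UNION ALL
            SELECT key, value FROM src) t1
      GROUP BY key, value
    ) t2
  ) t3
) t4
GROUP BY key, value;
```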
[jira] [Comment Edited] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251668#comment-14251668 ] Ashutosh Chauhan edited comment on HIVE-9141 at 12/18/14 2:07 PM: -- +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9055 was (Author: ashutoshc): +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9053 HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Summary: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] (was: SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Description: The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. was: The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. 
IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 29205: HIVE-8920: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29205/ --- Review request for hive and Chao Sun. Bugs: HIVE-8920 https://issues.apache.org/jira/browse/HIVE-8920 Repository: hive-git Description --- See bug description. Patch in HIVE-9084 is included here. Diffs - itests/src/test/resources/testconfiguration.properties fd732c1 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java b4c2c1f ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5ba6612 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java 67dda02 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 1efbb12 ql/src/test/queries/clientpositive/multi_insert_union_src.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_union_src.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_union_src.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29205/diff/ Testing --- Added a new qtest. Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-9133) CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out
[ https://issues.apache.org/jira/browse/HIVE-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251689#comment-14251689 ] Hive QA commented on HIVE-9133: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687985/HIVE-9133.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2127/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2127/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2127/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687985 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out --- Key: HIVE-9133 URL: https://issues.apache.org/jira/browse/HIVE-9133 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.15.0 Attachments: HIVE-9133.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29205: HIVE-8920: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29205/ --- (Updated Dec. 18, 2014, 2:37 p.m.) Review request for hive and Chao Sun. Bugs: HIVE-8920 https://issues.apache.org/jira/browse/HIVE-8920 Repository: hive-git Description --- See bug description. Patch in HIVE-9084 is included here. Diffs (updated) - itests/src/test/resources/testconfiguration.properties fd732c1 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java b4c2c1f ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5ba6612 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java 67dda02 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 1efbb12 ql/src/test/queries/clientpositive/multi_insert_union_src.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_union_src.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_union_src.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29205/diff/ Testing --- Added a new qtest. Thanks, Xuefu Zhang
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Attachment: HIVE-8920.2-spark.patch Patch #2 corrects some code styling issues. IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch, HIVE-8920.2-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8722: -- Issue Type: Sub-task (was: Improvement) Parent: HIVE-9134 Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch] --- Key: HIVE-8722 URL: https://issues.apache.org/jira/browse/HIVE-8722 Project: Hive Issue Type: Sub-task Reporter: Jimmy Xiang We got the following exception in hive.log: {noformat} 2014-11-03 11:45:49,865 DEBUG rdd.HadoopRDD (Logging.scala:logDebug(84)) - Failed to use InputSplitWithLocations. java.lang.ClassCastException: Cannot cast org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit to org.apache.hadoop.mapred.InputSplitWithLocationInfo at java.lang.Class.cast(Class.java:3094) at org.apache.spark.rdd.HadoopRDD.getPreferredLocations(HadoopRDD.scala:278) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:215) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1303) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1313) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1312) {noformat} My understanding is that the split location info helps Spark to execute tasks more efficiently. This could help other execution engines too. So we should consider enhancing InputSplitShim to implement InputSplitWithLocationInfo if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7122) Storage format for create like table
[ https://issues.apache.org/jira/browse/HIVE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251751#comment-14251751 ] Vasanth kumar RJ commented on HIVE-7122: Hi [~Prabhu Joseph], sorry for the late reply. The CTAS restriction says the target table cannot be partitioned or external. CREATE TABLE ... LIKE, however, allows creating a similar table, including an external one. Storage format for create like table Key: HIVE-7122 URL: https://issues.apache.org/jira/browse/HIVE-7122 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vasanth kumar RJ Assignee: Vasanth kumar RJ Attachments: HIVE-7122.patch Using CREATE TABLE ... LIKE, the user can specify the table storage format. Example: create table table1 like table2 stored as ORC; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251755#comment-14251755 ] Xuefu Zhang commented on HIVE-9153: --- Thanks for the findings, [~lirui]. I heard that the Spark snapshot we are using is 2X slower than the previous version; this might explain the slowness. Also, I think the number of mappers and locality both matter for speed, but the two may conflict with each other. For instance, if we have more executors than mappers, it's desirable to have more map tasks. However, doing so might hurt locality because some mappers might read remotely. On the other hand, if there are more mappers than executors, then fewer mappers will help the speed. Anyway, it would be good to find out how Tez generates splits using HiveInputFormat. Also, we should fix HIVE-8722. Is there a way to disable Spark's delay scheduling to try it out? Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29200: HIVE-9116 Add unit test for multi sessions on Spark.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/#review65494 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 9:14 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/ --- (Updated Dec. 18, 2014, 9:14 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9116 https://issues.apache.org/jira/browse/HIVE-9116 Repository: hive-git Description --- Test HS2 with multiple sessions using multi-threaded JDBC connections. Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMultiSessionsHS2WithLocalClusterSpark.java PRE-CREATION Diff: https://reviews.apache.org/r/29200/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251760#comment-14251760 ] Xuefu Zhang commented on HIVE-9116: --- +1 Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29198: HIVE-9136 - Profile query compiler [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29198/#review65497 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 7:36 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29198/ --- (Updated Dec. 18, 2014, 7:36 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-9136 https://issues.apache.org/jira/browse/HIVE-9136 Repository: hive-git Description --- Please check out the JIRA for a correspondence between Spark and Tez log names. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 1e0a749 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 3f23541 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 215d53f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java a5d73a7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java a9fbf6c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 362072f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 90a2f9e ql/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 4e2b130 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java b6a7ac2 Diff: https://reviews.apache.org/r/29198/diff/ Testing --- Thanks, Chao Sun
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251798#comment-14251798 ] Xuefu Zhang commented on HIVE-9136: --- Patch looks good to me. However, in SparkCompiler, we are only measuring time to generate task tree. We should also gauge the time spent on logical optimization as well as physical optimization. This can be addressed in a followup JIRA though. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9154) Cache pathToPartitionInfo in context aware record reader
[ https://issues.apache.org/jira/browse/HIVE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251820#comment-14251820 ] Xuefu Zhang commented on HIVE-9154: --- +1 Cache pathToPartitionInfo in context aware record reader Key: HIVE-9154 URL: https://issues.apache.org/jira/browse/HIVE-9154 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-9154.1-spark.patch, HIVE-9154.2.patch This is similar to HIVE-9127. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9006) hiveserver thrift api version is still 6
[ https://issues.apache.org/jira/browse/HIVE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251823#comment-14251823 ] Hive QA commented on HIVE-9006: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687986/HIVE-9006.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2128/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2128/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2128/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687986 - PreCommit-HIVE-TRUNK-Build hiveserver thrift api version is still 6 Key: HIVE-9006 URL: https://issues.apache.org/jira/browse/HIVE-9006 Project: Hive Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HIVE-9006.1.patch, HIVE-9006.2.patch Looking at TCLIService.thrift, when opening a session, the protocol version is still v6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251886#comment-14251886 ] Hive QA commented on HIVE-8920: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688043/HIVE-8920.2-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7237 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/574/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/574/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-574/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12688043 - PreCommit-HIVE-SPARK-Build IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch, HIVE-8920.2-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat} M1M2 \ / \ U3 R5 | R4 {noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it's still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, since M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
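Since M1 and M2 run in the same JVM but each need their own context, the general shape of a fix is to key contexts by input rather than keeping one global instance. The sketch below is a hedged illustration of that idea only, not Hive's actual implementation: `IOContext` is stubbed, and `IOContextRegistry` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stubbed stand-in for Hive's IOContext, for illustration only.
class IOContext {
    String currentInputPath;
}

// Hypothetical registry: one IOContext per input (e.g. per MapWork name),
// so cloned MapWorks M1 and M2 in the same JVM don't clobber each other.
class IOContextRegistry {
    private static final Map<String, IOContext> CONTEXTS = new ConcurrentHashMap<>();

    static IOContext get(String inputName) {
        // Lazily create one context per input name; thread-safe via ConcurrentHashMap.
        return CONTEXTS.computeIfAbsent(inputName, k -> new IOContext());
    }
}
```

With this shape, each cloned MapWork looks up its own context by name instead of sharing a single global one.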
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251912#comment-14251912 ] Peter Slawski commented on HIVE-9148: - No, the description is correct as *hive.hwi.war.file* is assumed to be relative to *$HIVE_HOME* in HWIServer.java. *$HWI_WAR_FILE* is set incorrectly in [hwi.sh|https://github.com/apache/hive/blob/b8250ac2f30539f6b23ce80a20a9e338d3d31458/bin/ext/hwi.sh#L29]. So if you didn't override *hive.hwi.war.file* in hive-site.xml, the path to the HWI war file would be wrong. From [hwi/src/java/org/apache/hadoop/hive/hwi/HWIServer.java:77|https://github.com/apache/hive/blob/release-0.14.0/hwi/src/java/org/apache/hadoop/hive/hwi/HWIServer.java#L77] {code:java} String hwiWAR = conf.getVar(HiveConf.ConfVars.HIVEHWIWARFILE); String hivehome = System.getenv().get("HIVE_HOME"); File hwiWARFile = new File(hivehome, hwiWAR); {code} Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
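The behavior Peter describes follows from the `java.io.File(String parent, String child)` constructor, which resolves the child against the parent even when the child already looks absolute (on Unix-like systems). The paths below are hypothetical, chosen only to show why an absolute HWI_WAR_FILE produces a wrong, doubled path:

```java
import java.io.File;

public class HwiPathDemo {
    public static void main(String[] args) {
        // HWIServer effectively does: new File(System.getenv("HIVE_HOME"), hwiWar).
        // A relative hive.hwi.war.file resolves as intended:
        File relative = new File("/opt/hive", "lib/hive-hwi.war"); // hypothetical paths
        System.out.println(relative.getPath()); // /opt/hive/lib/hive-hwi.war

        // But if hwi.sh sets an absolute HWI_WAR_FILE, File still resolves it
        // against the parent, duplicating the prefix:
        File absolute = new File("/opt/hive", "/opt/hive/lib/hive-hwi.war");
        System.out.println(absolute.getPath()); // /opt/hive/opt/hive/lib/hive-hwi.war
    }
}
```

This is why the fix is to keep HWI_WAR_FILE relative to $HIVE_HOME rather than absolute.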
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251932#comment-14251932 ] Brock Noland commented on HIVE-7081: We upgraded to 0.9.2 already: https://github.com/apache/hive/blob/trunk/pom.xml#L141 HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested using both DBVisualizer and hand-written JDBC code: when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state forever until the server is stopped. Using jstack to dump the server's threads, I found a thread stuck in socketRead0(). {quote} pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0xebc24f28 (a java.io.BufferedInputStream) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
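The thread dump above shows a worker blocked in socketRead0 with no read timeout, so it can never notice the client is gone. The fix in Hive belongs in the Thrift server configuration, but the underlying mechanism can be illustrated in isolation: this standalone sketch (not Hive's code) shows how SO_TIMEOUT turns an otherwise indefinite blocking read into a SocketTimeoutException the server can recover from.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Returns true if the blocking read gave up via SO_TIMEOUT rather than
    // hanging in socketRead0 forever.
    static boolean readTimesOut() throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             // Peer that connects but never sends anything (simulates a dead client).
             Socket silentPeer = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(200); // ms; without this, read() blocks indefinitely
            try {
                accepted.getInputStream().read(); // peer never writes
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the worker thread can now clean up the connection
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut());
    }
}
```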
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251947#comment-14251947 ] Thejas M Nair commented on HIVE-7081: - Yes, to be more specific - trunk has a fix, 0.14 release does not have the fix. HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested using both DBVisualizer and hand-written JDBC code: when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state forever until the server is stopped. Using jstack to dump the server's threads, I found a thread stuck in socketRead0(). {quote} pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0xebc24f28 (a java.io.BufferedInputStream) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
Sergio Peña created HIVE-9161: - Summary: Fix ordering differences on UDF functions due to Java8 Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
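The actual fix is in the attached patch; as a standalone illustration of the ordering problem only (not Hive's patch), iterating a HashMap directly yields JDK-dependent output, unsafe for golden-file tests, while copying into a TreeMap gives a deterministic order on every JDK. The UDF names below are arbitrary examples.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StableIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> udfs = new HashMap<>(); // arbitrary example entries
        udfs.put("concat", 1);
        udfs.put("substr", 2);
        udfs.put("upper", 3);

        // HashMap iteration order depends on the JDK's internal hashing scheme
        // and may differ between Java 7 and Java 8 — no order is guaranteed.
        System.out.println(udfs.keySet());

        // Copying into a TreeMap yields a sorted, deterministic order everywhere.
        System.out.println(new TreeMap<>(udfs).keySet()); // [concat, substr, upper]
    }
}
```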
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Patch Available (was: Open) Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Attachment: HIVE-9161.1.patch Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251974#comment-14251974 ] Chao commented on HIVE-9136: Sure, we can do that as follow-up. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9116: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251984#comment-14251984 ] Xuefu Zhang commented on HIVE-9136: --- +1. [~csun], please create the JIRA and link it with this one. Thanks. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9136: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chao. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Fix For: spark-branch Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251986#comment-14251986 ] Jonathan Bender commented on HIVE-7049: --- Seems like we can get away with the following patch (confirm that the fileSchema, i.e. the writer's schema, is actually a union type before trying to find the type that the reader schema expects; if not, just use the schema as is, since it should be promoted to a union by Avro). This worked for me in local testing. ```diff --git a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java index ce933ff..032761c 100644 --- a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java +++ b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java @@ -265,9 +265,12 @@ private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema if(schema.getType().equals(Schema.Type.NULL)) { return null; } +Schema writerSchema = fileSchema; +if (writerSchema != null && writerSchema.getType().equals(Schema.Type.UNION)) { + writerSchema = writerSchema.getTypes().get(tag); +} -return worker(datum, fileSchema == null ? null : fileSchema.getTypes().get(tag), schema, -SchemaToTypeInfo.generateTypeInfo(schema)); +return worker(datum, writerSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema)); } ``` Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) the file schema and record schema are not the same, and 2) the record schema is nullable but the file schema is not. 
The potential code location is in class AvroDeserializer: {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema = [null,string] fileSchema = string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9131) MiniTez optimize_nullscan test is unstable
[ https://issues.apache.org/jira/browse/HIVE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252022#comment-14252022 ] Vikram Dixit K commented on HIVE-9131: -- This test had also failed when I ran it without HIVE-9055, but this stack trace escaped my attention. This will be fixed as part of HIVE-9141. Sorry for the trouble. MiniTez optimize_nullscan test is unstable -- Key: HIVE-9131 URL: https://issues.apache.org/jira/browse/HIVE-9131 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Sometimes fails with: {noformat} 2014-12-16 11:55:04,139 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9131) MiniTez optimize_nullscan test is unstable
[ https://issues.apache.org/jira/browse/HIVE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K resolved HIVE-9131. -- Resolution: Duplicate MiniTez optimize_nullscan test is unstable -- Key: HIVE-9131 URL: https://issues.apache.org/jira/browse/HIVE-9131 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Sometimes fails with: {noformat} 2014-12-16 11:55:04,139 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9055) Tez: union all followed by group by followed by another union all gives error
[ https://issues.apache.org/jira/browse/HIVE-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-9055: - Resolution: Fixed Status: Resolved (was: Patch Available) Tez: union all followed by group by followed by another union all gives error - Key: HIVE-9055 URL: https://issues.apache.org/jira/browse/HIVE-9055 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Vikram Dixit K Attachments: HIVE-9055.1.patch, HIVE-9055.2.patch, HIVE-9055.3.patch Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; select key from ( select key from src union all select key from src ) tab group by key union all select key from src; will give you ERROR 2014-12-09 11:38:48,316 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: IndexOutOfBoundsException Index: -1, Size: 1 java.lang.IndexOutOfBoundsException: Index: -1, Size: 1 at java.util.LinkedList.checkElementIndex(LinkedList.java:553) at java.util.LinkedList.get(LinkedList.java:474) at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:354) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) btw: there is not problem when it is run with MR -- This message was sent by Atlassian JIRA (v6.3.4#6332)
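The `LinkedList.get` call failing with index -1 at GenTezWork.java:354 suggests an unguarded lookup after a miss: Java's `List.indexOf` returns -1 when the element is absent, and `get(-1)` then throws. A minimal sketch of that failure class and its guard (names like `find_work_index` are illustrative, not Hive's actual code):

```python
def find_work_index(children, target):
    """Mimic Java's List.indexOf: return -1 when target is absent."""
    for i, child in enumerate(children):
        if child == target:
            return i
    return -1

def get_following_work_unguarded(children, target):
    idx = find_work_index(children, target)
    # idx == -1 raises IndexOutOfBoundsException in Java's LinkedList.get;
    # in Python it silently returns the LAST element, which is arguably worse.
    return children[idx]

def get_following_work_guarded(children, target):
    idx = find_work_index(children, target)
    if idx < 0:
        return None  # caller must handle the "not a child" case explicitly
    return children[idx]
```

The point of the guard is that a miss becomes an explicit case to handle rather than an index error deep inside the graph walker.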
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9127: -- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Brock. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 
14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16
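The fix described above is to cache the Map/Reduce work objects in the remote Spark context so split generation does not repeat the expensive deserialization on every call. A minimal sketch of that caching idea, with hypothetical names (this is not Hive's actual implementation):

```python
class PlanCache:
    """Memoize an expensive per-key load (e.g. deserializing a work object)."""

    def __init__(self, loader):
        self._loader = loader  # expensive function, e.g. plan deserialization
        self._cache = {}
        self.misses = 0

    def get(self, key):
        # Load at most once per key; later split-generation calls reuse it.
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]
```

Repeated `get` calls for the same key return the same object with a single load, which is the behavior HIVE-7431 had disabled and this issue restores for split generation.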
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252036#comment-14252036 ] Szehon Ho commented on HIVE-6892: - Strange, I thought I added a link from Storage Based Authorization, but I must have forgotten to save it. I'll try to add it and remove the label. Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from parent (the BSD rule). Permissions are taken from default umask. Extended Acl's are taken from parent unless they are set explicitly. *Goals* To reduce need to set fine-grain file security props after every operation, users may want the following Hive warehouse file/dir to auto-inherit security properties from their directory parents: * Directories created by new database/table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question of whether exported table inheriting perm from new parent needs another flag) What may be inherited: * Basic file permission * Groups (already done by HDFS for new directories) * Extended ACL's (already done by HDFS for new directories) *Behavior* * When hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all above inheritances. In the future, we can add more flags for more finer-grained control. * Failure by Hive to inherit will not cause operation to fail. Rule of thumb of when security-prop inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of files, or else a super-user. 
** Hence, the user that Hive runs as (either 'hive', or the logged-in user in the case of impersonation) must be a super-user or the owner of the file whose security properties are to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
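The inheritance rules above can be modeled in a few lines. This is an illustrative sketch of the described behavior only, not Hive's code: when the flag is on and the Hive user is allowed to chmod/chgrp, the child picks up the parent's basic permission bits, group, and extended ACLs; when inheritance is not possible, it is skipped rather than failing the operation.

```python
def inherit_security_props(parent, child, inherit_enabled, can_change):
    """Return the child's security props after (attempted) inheritance.

    parent/child are dicts with hypothetical keys: mode, group, acls.
    """
    if not inherit_enabled:
        return child
    if not can_change:
        # Not the owner or a super-user: skip inheritance, don't fail the op.
        return child
    child = dict(child)
    child["mode"] = parent["mode"]          # basic file permission
    child["group"] = parent["group"]        # group (HDFS already does this
                                            # for new directories)
    child["acls"] = list(parent["acls"])    # extended ACLs
    return child
```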
[jira] [Updated] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9094: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at
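The failure mode here is a bounded wait on the remote Spark context that expires before the executor count arrives. A hedged sketch of one way to make such a lookup resilient, falling back to a default parallelism instead of letting the `TimeoutException` abort compilation (the names and the fallback policy are illustrative, not what the committed patch necessarily does):

```python
import concurrent.futures
import time

def executor_count_with_fallback(fetch, timeout_s, default):
    """Run fetch() with a deadline; return `default` if it doesn't finish."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return default
```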
[jira] [Commented] (HIVE-9004) Reset doesn't work for the default empty value entry
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252047#comment-14252047 ] Szehon Ho commented on HIVE-9004: - Thanks. Looks good to me, +1 Reset doesn't work for the default empty value entry Key: HIVE-9004 URL: https://issues.apache.org/jira/browse/HIVE-9004 Project: Hive Issue Type: Bug Components: Configuration Reporter: Cheng Hao Assignee: Cheng Hao Fix For: spark-branch, 0.15.0, 0.14.1 Attachments: HIVE-9004.patch To illustrate that, in the hive cli:
{noformat}
hive> set hive.table.parameters.default;
hive.table.parameters.default is undefined
hive> set hive.table.parameters.default=key1=value1;
hive> reset;
hive> set hive.table.parameters.default;
hive.table.parameters.default=key1=value1
{noformat}
I think we expect the last output to be: hive.table.parameters.default is undefined -- This message was sent by Atlassian JIRA (v6.3.4#6332)
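The shape of this bug is easy to reproduce outside Hive. An illustrative model (not Hive's code): if `reset` restores only the keys present in a defaults map, and properties whose default is empty/undefined have no entry in that map, a user-set override survives the reset.

```python
# Hypothetical defaults map; note there is no entry for
# hive.table.parameters.default because its default is "undefined".
DEFAULTS = {"hive.execution.engine": "mr"}

def reset_buggy(conf):
    # Only keys with a known default are restored; overridden keys that have
    # no default entry are left behind -- the reported symptom.
    for k, v in DEFAULTS.items():
        conf[k] = v
    return conf

def reset_fixed(conf):
    conf.clear()           # drop every override, including empty-default keys
    conf.update(DEFAULTS)  # then reapply the known defaults
    return conf
```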
[jira] [Created] (HIVE-9162) stats19 test is environment-dependant
Sergey Shelukhin created HIVE-9162: -- Summary: stats19 test is environment-dependant Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Priority: Minor This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-9162: -- Assignee: Sergey Shelukhin stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Fix Version/s: 0.14.1 0.15.0 stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252059#comment-14252059 ] Xuefu Zhang commented on HIVE-9127: --- Spark patch is also committed to Spark branch. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: 
client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]:
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9127: -- Fix Version/s: spark-branch Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch, 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -
[jira] [Commented] (HIVE-9133) CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out
[ https://issues.apache.org/jira/browse/HIVE-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252071#comment-14252071 ] Sergey Shelukhin commented on HIVE-9133: Left some partial comments on RB. Overall comment - is it possible to minimize the use of semanticAnalyzer, and esp. its fields and setters? Even if results in redundant args. If we cannot avoid dependency completely at least we should limit it to some logical methods... CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out --- Key: HIVE-9133 URL: https://issues.apache.org/jira/browse/HIVE-9133 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.15.0 Attachments: HIVE-9133.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252076#comment-14252076 ] Szehon Ho commented on HIVE-8639: - [~brocknoland] yes there are tests that still do. The triggering factor is whether the tests have hive.auto.convert.sortmerge.join.to.mapjoin turned on. For example, all the auto_sortmerge_.* tests have at least one part that runs SMB join before that flag is turned on. [~xuefuz] can you review when you get a chance? Test failures seem unrelated. I looked at join32_lessSize, it seems caused by a TimeoutException in spark client's RPC layer. {noformat} Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client connection. at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:125) {noformat} Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.4-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
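The conversion decision being discussed can be sketched as a size-based policy: keep the SMB join unless hive.auto.convert.sortmerge.join.to.mapjoin is on and everything except the biggest table fits in memory, in which case use a map join. The threshold name and exact policy below are illustrative, not the resolver's actual logic:

```python
def choose_join(table_sizes, to_mapjoin_enabled, small_table_limit):
    """Pick a join strategy from (hypothetical) per-table input sizes in bytes."""
    big = max(table_sizes)
    small_total = sum(table_sizes) - big  # everything that must fit in memory
    if to_mapjoin_enabled and small_total <= small_table_limit:
        return "mapjoin"
    return "smb_join"
```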
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252086#comment-14252086 ] Xuefu Zhang commented on HIVE-8639: --- [~szehon], I'm reviewing at the moment. Thanks. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.4-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9006) hiveserver thrift api version is still 6
[ https://issues.apache.org/jira/browse/HIVE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252092#comment-14252092 ] Szehon Ho commented on HIVE-9006: - Thanks, +1 hiveserver thrift api version is still 6 Key: HIVE-9006 URL: https://issues.apache.org/jira/browse/HIVE-9006 Project: Hive Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HIVE-9006.1.patch, HIVE-9006.2.patch Look at the TCLIService.thrift, when open session, the protocol version info is still v6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Attachment: HIVE-9162.patch Simple q file change. [~jpullokkaran] can you take a look? Comment in q file says set prefix to high value so path doesn't have to be hashed, but value is too low for some environments and it still gets hashed. [~vikram.dixit] ok for 14? stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9162.patch This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-8395) CBO: enable by default
[ https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252137#comment-14252137 ] Sergey Shelukhin commented on HIVE-8395: modified CBO: enable by default -- Key: HIVE-8395 URL: https://issues.apache.org/jira/browse/HIVE-8395 Project: Hive Issue Type: Improvement Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8395-27-28-delta.patch, HIVE-8395-28-29-delta.patch, HIVE-8395.01.patch, HIVE-8395.02.patch, HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.11.patch, HIVE-8395.12.patch, HIVE-8395.12.patch, HIVE-8395.13.patch, HIVE-8395.13.patch, HIVE-8395.14.patch, HIVE-8395.15.patch, HIVE-8395.16.patch, HIVE-8395.17.patch, HIVE-8395.18.patch, HIVE-8395.18.patch, HIVE-8395.19.patch, HIVE-8395.20.patch, HIVE-8395.21.patch, HIVE-8395.22.patch, HIVE-8395.23.patch, HIVE-8395.23.withon.patch, HIVE-8395.24.patch, HIVE-8395.25.patch, HIVE-8395.25.patch, HIVE-8395.26.patch, HIVE-8395.27.patch, HIVE-8395.28.patch, HIVE-8395.29.patch, HIVE-8395.30.patch, HIVE-8395.31.patch, HIVE-8395.32.patch, HIVE-8395.33.patch, HIVE-8395.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28930: HIVE-8639 : Convert SMBJoin to MapJoin [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28930/#review65528 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 2:07 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28930/ --- (Updated Dec. 18, 2014, 2:07 a.m.) Review request for hive. Bugs: HIVE-8639 https://issues.apache.org/jira/browse/HIVE-8639 Repository: hive-git Description --- In MapReduce, for auto-SMB joins, SortedMergeJoinProc is run in the earlier Optimizer layer to convert a join to SMB join, and SortMergeJoinResolver is run in the later PhysicalOptimizer layer to convert it to MapJoin. For Spark, we have an opportunity to make this cleaner by putting both the SMB and MapJoin conversions in the logical layer and deciding there which one to call. This patch introduces a new unified join processor called 'SparkJoinOptimizer' in the logical layer. It calls 'SparkMapJoinOptimizer' and 'SparkSortMergeJoinOptimizer' in a certain order depending on the flags that are set, and whichever one is applicable converts the join. Thus there is no need to write an SMB-to-MapJoin path. 'SparkSortMergeJoinOptimizer' is a new class that wraps the logic of SortedMergeJoinProc, but for Spark. To put both the MapJoin and SMB processors at the same level, I had to make some fixes. 1. One fix is in 'NonBlockingOpDeDupProc', to fix the join context state, as it is now run before the SMB code that relies on it. For this I submitted a trunk patch at HIVE-9060. 2. The second fix is that MapReduce's SMB code did two graph walks: one processor to calculate all 'rejected' joins, and another processor to change the non-rejected ones to SMB joins. That would have meant doing multiple walks, so I refactored the 'rejected'-join logic into the same join-operator visit in SparkSortMergeJoinOptimizer.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java c2e643d ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 680c6fd ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 83625ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 5e432ac ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java b6a7ac2 ql/src/test/results/clientpositive/spark/auto_join32.q.out 28c022e ql/src/test/results/clientpositive/spark/auto_join_stats.q.out bccd246 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 842b4b3 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 2e35c66 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out ee37010 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out b2e928f ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 20ee657 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 0a48d00 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 5008a3f ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 3b081af ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 2a11fb2 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 0d971d2 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out 9d455dc ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 61eb6ae ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 198d50d ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out f59e57f ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_2.q.out b58091c ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_4.q.out 8ee392e 
ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_6.q.out 9c119df ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_7.q.out b9ad92d ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_8.q.out ed4d03f ql/src/test/results/clientpositive/spark/cross_product_check_2.q.out 6fb69a5 ql/src/test/results/clientpositive/spark/parquet_join.q.out 240989a ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 268ae23 ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out df66cc2 ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out f635949 Diff:
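The ordered dispatch described in the review (try one join conversion, fall back to the next, else keep the common join) can be sketched as follows. This is a hypothetical illustration only: the class, method, and parameter names below are invented for the sketch and do not match Hive's actual optimizer API, and the real ordering and conditions live in 'SparkJoinOptimizer'.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class SparkJoinDispatchSketch {

    // Candidate conversions are tried in insertion order; the first one whose
    // predicate accepts the join wins, so no separate SMB-to-MapJoin
    // conversion path is needed afterwards.
    public static String choose(String join, Map<String, Predicate<String>> ordered) {
        for (Map.Entry<String, Predicate<String>> e : ordered.entrySet()) {
            if (e.getValue().test(join)) {
                return e.getKey();
            }
        }
        return "common-join"; // nothing applied: keep the shuffle join
    }

    // Flags and the size threshold are illustrative stand-ins for the real
    // configuration properties consulted by the optimizer.
    public static String optimize(String join, boolean mapJoinEnabled,
                                  boolean smbEnabled, long smallTableSize,
                                  long threshold) {
        Map<String, Predicate<String>> ordered = new LinkedHashMap<>();
        if (mapJoinEnabled) {
            ordered.put("map-join", j -> smallTableSize <= threshold);
        }
        if (smbEnabled) {
            ordered.put("smb-join", j -> true);
        }
        return choose(join, ordered);
    }
}
```

The point of the single ordered walk is that each join operator is visited once and one decision is made there, instead of two separate optimizer passes that must agree with each other.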
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252142#comment-14252142 ] Xuefu Zhang commented on HIVE-8639: --- +1
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8639: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Szehon for this nice piece.
[jira] [Commented] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252155#comment-14252155 ] Laljo John Pullokkaran commented on HIVE-9162: -- +1
[jira] [Commented] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252159#comment-14252159 ] Szehon Ho commented on HIVE-9161: - Looks good to me, +1 pending tests Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
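The underlying issue is that java.util.HashMap iteration order is an implementation detail, and Java 8's hashing changes reordered it relative to Java 7. As a sketch of the usual fix for golden-file tests (assuming nothing about Hive's internals; the class below is illustrative, not part of the patch): sort before emitting so the output is deterministic on any JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class OrderIndependentOutput {

    // HashMap iteration order is unspecified and changed between Java 7 and
    // Java 8; sorting keys before emitting makes test output stable on any
    // JVM, which is the order-independence the q-file fixes are after.
    public static List<String> deterministicKeys(Map<String, String> m) {
        List<String> keys = new ArrayList<>(m.keySet());
        Collections.sort(keys);
        return keys;
    }
}
```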
[jira] [Commented] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252165#comment-14252165 ] Hive QA commented on HIVE-9161: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688078/HIVE-9161.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2129/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2129/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2129/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12688078 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252173#comment-14252173 ] Szehon Ho commented on HIVE-9158: - Seems reasonable, +1 Multiple LDAP server URLs in hive.server2.authentication.ldap.url - Key: HIVE-9158 URL: https://issues.apache.org/jira/browse/HIVE-9158 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-9158.1.patch, LDAPClient.java Support for multiple LDAP servers for failover in the event that one stops responding or is down for maintenance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-9158: Labels: TODOC15 (was: ) Need to doc [Configuration Properties| https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2]
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252190#comment-14252190 ] Naveen Gangam commented on HIVE-9158: - Thanks Szehon. I just updated the Configuration Properties to add this info.
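The failover behavior itself lives in the attached patch; as a rough sketch of the idea, assuming a whitespace-separated list of server URLs (the helper below is hypothetical, not Hive's code): try each server in order and use the first one that accepts a connection. The connectivity check is injected so the selection logic can be exercised without a real LDAP server.

```java
import java.util.function.Predicate;

public class LdapFailoverSketch {

    // Iterate over a whitespace-separated URL list and return the first
    // server that the injected check can reach. Assumption: the list
    // separator is whitespace; the real property's format is defined by the
    // HIVE-9158 patch and its documentation.
    public static String firstReachable(String urls, Predicate<String> canConnect) {
        for (String url : urls.trim().split("\\s+")) {
            if (canConnect.test(url)) {
                return url;
            }
        }
        throw new IllegalStateException("no LDAP server reachable: " + urls);
    }
}
```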
[jira] [Updated] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8131: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252196#comment-14252196 ] Brock Noland commented on HIVE-8131: Thank you for your contribution! I have committed this to trunk!
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Attachment: HIVE-9161.2.patch Fixed varchar_udf1.q for java7
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-8816) Create unit test join of two encrypted tables with different keys
[ https://issues.apache.org/jira/browse/HIVE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252225#comment-14252225 ] Brock Noland commented on HIVE-8816: Thank you! I think we want to do: {noformat} EXPLAIN EXTENDED SELECT * FROM encryptedWith256BitsKeyDB.encryptedTableIn256BitsKey t1 JOIN encryptedWith128BitsKeyDB.encryptedTableIn128BitsKey t2 WHERE t1.key = t2.key; {noformat} then in the same q-file: actually execute the join: {noformat} SELECT * FROM encryptedWith256BitsKeyDB.encryptedTableIn256BitsKey t1 JOIN encryptedWith128BitsKeyDB.encryptedTableIn128BitsKey t2 WHERE t1.key = t2.key; {noformat} Also please add {{--SORT_QUERY_RESULTS}} since different JVM's or execution engines could order results differently. Create unit test join of two encrypted tables with different keys - Key: HIVE-8816 URL: https://issues.apache.org/jira/browse/HIVE-8816 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-8816.1.patch, HIVE-8816.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
Hari Sankar Sivarama Subramaniyan created HIVE-9163: --- Summary: create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan {source} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {source} It fails because it tries to move data to hdfs dir instead of wasb dir: {source} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class ] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hiv e_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c46 8-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c 3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {source} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Description: {quote} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {quote} It fails because it tries to move data to hdfs dir instead of wasb dir: {quote} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class ] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hiv e_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c46 8-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c 3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {quote}
[jira] [Created] (HIVE-9164) Profile query compiler #2 [Spark Branch]
Chao created HIVE-9164: -- Summary: Profile query compiler #2 [Spark Branch] Key: HIVE-9164 URL: https://issues.apache.org/jira/browse/HIVE-9164 Project: Hive Issue Type: Improvement Components: Spark Affects Versions: spark-branch Reporter: Chao In addition to the logs in HIVE-9136, we should also log logical/physical optimization in {{SparkCompiler}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Attachment: HIVE-9163.1.patch create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir --- Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9163.1.patch {quote} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {quote} It fails because it tries to move data to hdfs dir instead of wasb dir: {quote} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. 
Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
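The failure above comes from resolving the temporary-table destination against the default (HDFS) filesystem while the data lives on wasb. A minimal sketch of the scheme-preserving resolution the fix needs: qualify a scheme-less destination path with the source path's scheme and authority. This is an illustration of the idea only, not the actual MoveTask patch; `resolveDestination` is a hypothetical helper.

```java
import java.net.URI;

// Illustrative only: rebase a scheme-less destination onto the source
// filesystem so a wasb:// source is never moved to an hdfs:// destination.
public class SchemePreservingMove {
    static URI resolveDestination(URI source, String destPath) {
        URI dest = URI.create(destPath);
        if (dest.getScheme() != null) {
            return dest;  // destination is already fully qualified
        }
        // inherit scheme and authority from the source path
        return URI.create(source.getScheme() + "://" + source.getAuthority() + destPath);
    }

    public static void main(String[] args) {
        URI src = URI.create("wasb://container@account.blob.core.windows.net/hive/scratch/-ext-10001");
        URI out = resolveDestination(src, "/hive/scratch/_tmp_space.db/part-0");
        System.out.println(out);  // stays on the wasb filesystem
    }
}
```

In Hive itself the equivalent operation is `Path.makeQualified` against the right `FileSystem`; the point here is only that the qualification must use the warehouse/scratch filesystem of the data, not the cluster default.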
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Status: Patch Available (was: Open) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir --- Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9163.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)