[jira] [Commented] (HIVE-8192) Check DDL's writetype in DummyTxnManager

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229612#comment-14229612
 ] 

Hive QA commented on HIVE-8192:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684356/HIVE-8192.5.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6696 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1938/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1938/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1938/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684356 - PreCommit-HIVE-TRUNK-Build

 Check DDL's writetype in DummyTxnManager
 

 Key: HIVE-8192
 URL: https://issues.apache.org/jira/browse/HIVE-8192
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0, 0.13.1
 Environment: hive0.13.1
Reporter: Wan Chang
Priority: Minor
  Labels: patch
 Attachments: HIVE-8192.2.patch, HIVE-8192.3.patch, HIVE-8192.4.patch, 
 HIVE-8192.5.patch


 The patch for HIVE-6734 added some DDL writetypes and checked the DDL 
 writetype in DbTxnManager.java.
 We use DummyTxnManager as the default value of hive.txn.manager in 
 hive-site.xml. We noticed that the CREATE TEMPORARY FUNCTION operation has a 
 DDL_NO_LOCK writetype but requires an EXCLUSIVE lock. If we try to create a 
 temporary function while a SELECT is running against the same database, the 
 console prints 'conflicting lock present for default mode EXCLUSIVE' and the 
 CREATE TEMPORARY FUNCTION operation won't get the lock until the SELECT is 
 done. Maybe it's a good idea to check the DDL's writetype in DummyTxnManager 
 too.
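The conflict described above amounts to a lock-compatibility check. A hypothetical Python sketch (not Hive's actual lock manager code): a SELECT holds a SHARED lock on the database, and the EXCLUSIVE lock requested for CREATE TEMPORARY FUNCTION is incompatible with it, so the DDL blocks.

```python
# Hypothetical model of the conflict: SHARED locks coexist, but an
# EXCLUSIVE request conflicts with any lock already held.

def compatible(held, requested):
    """Only SHARED/SHARED pairs are compatible in this simple model."""
    return held == "SHARED" and requested == "SHARED"

held_locks = ["SHARED"]   # a SELECT in progress on the database
request = "EXCLUSIVE"     # lock requested for CREATE TEMPORARY FUNCTION

blocked = not all(compatible(h, request) for h in held_locks)
print(blocked)  # True: the DDL waits until the SELECT releases its lock
```

Checking the writetype (DDL_NO_LOCK) before requesting the lock, as the patch proposes, would skip this request entirely.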



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8999) hiveserver2 CUSTOM authentication Fails

2014-12-01 Thread Amithsha (JIRA)
Amithsha created HIVE-8999:
--

 Summary: hiveserver2 CUSTOM authentication Fails
 Key: HIVE-8999
 URL: https://issues.apache.org/jira/browse/HIVE-8999
 Project: Hive
  Issue Type: Bug
  Components: Beeline, HiveServer2
Affects Versions: 0.14.0
 Environment: Centos 6.5 Hadoop 2.4.1 Hive 0.14.0
Reporter: Amithsha


Planned to secure HiveServer2 using the CUSTOM authentication method. But when 
Beeline starts and connects to the server IP and port, it hangs in the 
terminal after the username and password are provided.
*Procedure Followed*
*Compiled the Java file below into a jar:
import java.util.Hashtable;
import javax.security.sasl.AuthenticationException;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;

public class SampleAuthenticator implements PasswdAuthenticationProvider {

  Hashtable<String, String> store = null;

  public SampleAuthenticator() {
    store = new Hashtable<String, String>();
    store.put("user1", "passwd1");
    store.put("user2", "passwd2");
  }

  @Override
  public void Authenticate(String user, String password)
      throws AuthenticationException {

    String storedPasswd = store.get(user);

    if (storedPasswd != null && storedPasswd.equals(password))
      return;

    throw new AuthenticationException("SampleAuthenticator: Error validating user");
  }

}
-
*Properties used in hive-site.xml:
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>

<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.PasswdAuthenticationProvider.SampleAuth</value>
</property>
--
*Started Beeline 
beeline> !connect jdbc:hive2://localhost:1/default
scan complete in 13ms
Connecting to jdbc:hive2://localhost:1/default
Enter username for jdbc:hive2://localhost:1/default: user1
Enter password for jdbc:hive2://localhost:1/default: ***
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/apache-hive/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
--
Can anyone help me by providing the correct Java file and procedure to use 
custom authentication?
Thank you
Amithsha.S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-860) Persistent distributed cache

2014-12-01 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-860:
--
Attachment: HIVE-860.4.patch

 Persistent distributed cache
 

 Key: HIVE-860
 URL: https://issues.apache.org/jira/browse/HIVE-860
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Zheng Shao
Assignee: Ferdinand Xu
 Fix For: 0.15.0

 Attachments: HIVE-860.1.patch, HIVE-860.2.patch, HIVE-860.2.patch, 
 HIVE-860.3.patch, HIVE-860.4.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch


 DistributedCache is shared across multiple jobs if the hdfs file name is the 
 same.
 We need to make sure Hive puts the same file into the same location every 
 time and does not overwrite it if the file content is the same.
 We can achieve 2 different results:
 A1. Files added with the same name, timestamp, and md5 in the same session 
 will have a single copy in distributed cache.
 A2. Files added with the same name, timestamp, and md5 will have a single 
 copy in distributed cache.
 A2 has a bigger benefit in sharing but may raise a question on when Hive 
 should clean it up in hdfs.
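The A2 sharing rule can be sketched as follows; `cache_location` is a hypothetical helper (not the patch's code) that derives a stable cache path from the (name, timestamp, md5) triple, so identical files resolve to a single location.

```python
# Hypothetical sketch of the A2 rule: files added with the same
# (name, timestamp, md5) resolve to one distributed-cache location,
# so identical content is never uploaded twice across jobs.
import hashlib

def cache_location(name, timestamp, md5, base="/tmp/hive-cache"):
    # Identical (name, timestamp, md5) triples always yield the same path.
    key = hashlib.sha256(f"{name}|{timestamp}|{md5}".encode()).hexdigest()[:16]
    return f"{base}/{key}/{name}"

a = cache_location("udf.jar", 1417392000, "d41d8cd9")
b = cache_location("udf.jar", 1417392000, "d41d8cd9")  # same triple, same slot
c = cache_location("udf.jar", 1417392001, "d41d8cd9")  # new timestamp, new slot
print(a == b, a == c)  # True False
```

Restricting the lookup to the current session, as in A1, would simply add a session id to the key.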



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9000) LAST_VALUE Window function returns wrong results

2014-12-01 Thread Mark Grover (JIRA)
Mark Grover created HIVE-9000:
-

 Summary: LAST_VALUE Window function returns wrong results
 Key: HIVE-9000
 URL: https://issues.apache.org/jira/browse/HIVE-9000
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.13.1
Reporter: Mark Grover
Priority: Critical
 Fix For: 0.14.1


LAST_VALUE Windowing function has been returning bad results, as far as I can 
tell from day 1.

And, it seems like the tests are also asserting that LAST_VALUE gives the wrong 
result.

Here's the test output:
https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587

The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s) 
{code}

The result is:
{code}
t   s             i      last_value(i)
--------------------------------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}

LAST_VALUE(i) should have returned 65549 in both records; instead it simply 
ends up returning i.

Another way to confirm LAST_VALUE is wrong is to check its result against 
LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, 
should always be at least as far along (in terms of the specified 'order by 
s') as the lead by 1. While this doesn't directly apply to the above query, 
if the result set had more rows, you would clearly see records where lead is 
higher than last_value, which is semantically incorrect.
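The reported behavior is consistent with the default window frame (unbounded preceding to current row), under which last_value(i) degenerates to the current row's i. A Python simulation of the two interpretations (illustrative only, not Hive's implementation):

```python
# Illustrative sketch: with the default frame (UNBOUNDED PRECEDING ..
# CURRENT ROW), last_value is just the current row's value; with a
# full-partition frame it is the partition's final value.
rows = [("oscar allen", 65662), ("oscar carson", 65549)]  # ordered by s

default_frame = [i for (_, i) in rows]        # last_value == current row's i
full_frame = [rows[-1][1] for _ in rows]      # last_value == partition's last i

print(default_frame)  # [65662, 65549] -- what the test output shows
print(full_frame)     # [65549, 65549] -- what the reporter expected
```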



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9000) LAST_VALUE Window function returns wrong results

2014-12-01 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover updated HIVE-9000:
--
Description: 
LAST_VALUE Windowing function has been returning bad results, as far as I can 
tell from day 1.

And, it seems like the tests are also asserting that LAST_VALUE gives the wrong 
result.

Here's the test output:
https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587

The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s) 
{code}

The result is:
{code}
t   s             i      last_value(i)
--------------------------------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}

{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it 
simply ends up returning i.

Another way to confirm LAST_VALUE is wrong is to check its result against 
LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, 
should always be at least as far along (in terms of the specified 'order by 
s') as the lead by 1. While this doesn't directly apply to the above query, 
if the result set had more rows, you would clearly see records where lead is 
higher than last_value, which is semantically incorrect.

  was:
LAST_VALUE Windowing function has been returning bad results, as far as I can 
tell from day 1.

And, it seems like the tests are also asserting that LAST_VALUE gives the wrong 
result.

Here's the test output:
https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587

The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s) 
{code}

The result is:
{code}
t   s             i      last_value(i)
--------------------------------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}

LAST_VALUE(i) should have returned 65549 in both records; instead it simply 
ends up returning i.

Another way to confirm LAST_VALUE is wrong is to check its result against 
LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, 
should always be at least as far along (in terms of the specified 'order by 
s') as the lead by 1. While this doesn't directly apply to the above query, 
if the result set had more rows, you would clearly see records where lead is 
higher than last_value, which is semantically incorrect.


 LAST_VALUE Window function returns wrong results
 

 Key: HIVE-9000
 URL: https://issues.apache.org/jira/browse/HIVE-9000
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.13.1
Reporter: Mark Grover
Priority: Critical
 Fix For: 0.14.1


 LAST_VALUE Windowing function has been returning bad results, as far as I can 
 tell from day 1.
 And, it seems like the tests are also asserting that LAST_VALUE gives the 
 wrong result.
 Here's the test output:
 https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587
 The query is:
 {code}
 select t, s, i, last_value(i) over (partition by t order by s) 
 {code}
 The result is:
 {code}
 t   s             i      last_value(i)
 --------------------------------------
 10  oscar allen   65662  65662
 10  oscar carson  65549  65549
 {code}
 {{LAST_VALUE( i )}} should have returned 65549 in both records; instead it 
 simply ends up returning i.
 Another way to confirm LAST_VALUE is wrong is to check its result against 
 LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last 
 value, should always be at least as far along (in terms of the specified 
 'order by s') as the lead by 1. While this doesn't directly apply to the 
 above query, if the result set had more rows, you would clearly see records 
 where lead is higher than last_value, which is semantically incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8135) Pool zookeeper connections

2014-12-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229993#comment-14229993
 ] 

Brock Noland commented on HIVE-8135:


Curator would be fantastic to use. Do they have a connection pooling recipe or 
a read-write lock?

 Pool zookeeper connections
 --

 Key: HIVE-8135
 URL: https://issues.apache.org/jira/browse/HIVE-8135
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu

 Today we create a ZK connection per client. We should instead have a 
 connection pool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8998) Logging is not configured in spark-submit sub-process

2014-12-01 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8998:
---
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

 Logging is not configured in spark-submit sub-process
 -

 Key: HIVE-8998
 URL: https://issues.apache.org/jira/browse/HIVE-8998
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: spark-branch

 Attachments: HIVE-8998.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 28557: HIVE-8851 Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]

2014-12-01 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28557/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8851
https://issues.apache.org/jira/browse/HIVE-8851


Repository: hive-git


Description
---

Deploy small files with Spark job, instead of loading them from HDFS.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 
cfc1501 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2fe33f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClient.java a456d6c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
faa91e3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
f46c1b4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSession.java 
461f359 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 
c95d868 

Diff: https://reviews.apache.org/r/28557/diff/


Testing
---

Unit tests.


Thanks,

Jimmy Xiang



[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]

2014-12-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8851:
--
Attachment: HIVE-8851.1-spark.patch

Attached patch v1. It is also on RB: https://reviews.apache.org/r/28557/

 Broadcast files for small tables via SparkContext.addFile() and 
 SparkFiles.get() [Spark Branch]
 ---

 Key: HIVE-8851
 URL: https://issues.apache.org/jira/browse/HIVE-8851
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-8851.1-spark.patch


 Currently, files generated by SparkHashTableSinkOperator for small tables 
 are written directly to HDFS with a high replication factor. When a map join 
 happens, the map join operator loads these files into hash tables. Since 
 multiple partitions can be processed on the same worker node, reading the 
 same set of files multiple times is not ideal. The improvement can be made 
 by calling SparkContext.addFile() on these files and using SparkFiles.get() 
 to download them to the worker node just once.
 Please note that SparkFiles.get() is a static method. Code invoking this 
 method needs to be in a static method. This calling method needs to be 
 synchronized because it may get called from different threads.
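The download-once, thread-safe pattern the description calls for can be sketched in Python (hypothetical names; the dictionary lookup stands in for the actual SparkFiles.get() call):

```python
# Hypothetical sketch of the pattern described above: a lock serializes
# callers so each distributed file is fetched only once per worker, and
# later callers reuse the cached local path.
import threading

_lock = threading.Lock()
_downloaded = {}
fetch_count = 0

def get_file(name):
    global fetch_count
    with _lock:                      # synchronized, as the JIRA suggests
        if name not in _downloaded:
            fetch_count += 1         # stands in for the real download
            _downloaded[name] = f"/local/{name}"
        return _downloaded[name]

paths = [get_file("smalltable.hashfile") for _ in range(4)]
print(fetch_count)  # 1: subsequent calls reuse the cached local path
```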



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]

2014-12-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8851:
--
Fix Version/s: spark-branch
   Status: Patch Available  (was: Open)

 Broadcast files for small tables via SparkContext.addFile() and 
 SparkFiles.get() [Spark Branch]
 ---

 Key: HIVE-8851
 URL: https://issues.apache.org/jira/browse/HIVE-8851
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-8851.1-spark.patch


 Currently, files generated by SparkHashTableSinkOperator for small tables 
 are written directly to HDFS with a high replication factor. When a map join 
 happens, the map join operator loads these files into hash tables. Since 
 multiple partitions can be processed on the same worker node, reading the 
 same set of files multiple times is not ideal. The improvement can be made 
 by calling SparkContext.addFile() on these files and using SparkFiles.get() 
 to download them to the worker node just once.
 Please note that SparkFiles.get() is a static method. Code invoking this 
 method needs to be in a static method. This calling method needs to be 
 synchronized because it may get called from different threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID

2014-12-01 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230134#comment-14230134
 ] 

Alan Gates commented on HIVE-8875:
--

Already added.  Search on hive.optimize.sort.dynamic.partition in 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

 hive.optimize.sort.dynamic.partition should be turned off for ACID
 --

 Key: HIVE-8875
 URL: https://issues.apache.org/jira/browse/HIVE-8875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8875.2.patch, HIVE-8875.patch


 Turning this on causes ACID inserts, updates, and deletes to produce 
 non-optimal plans with extra reduce phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230169#comment-14230169
 ] 

Chao commented on HIVE-8970:


I just regenerated all qfile outputs and noticed some result differences for 
join38.q, join_literals.q, join_nullsafe.q and subquery_in.q.
Looks like they are not related to the map-join work, since if I reset my 
branch to the HIVE-8946 commit the results are correct.
Maybe it's caused by the recent merge or the RSC commits.

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8946) Enable Map Join [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230173#comment-14230173
 ] 

Chao commented on HIVE-8946:


Also, for join38.q I ran the query on CLI local mode with 
spark.master=local[4], and the result was correct, but running the unit test 
gave a different result.

 Enable Map Join [Spark Branch]
 --

 Key: HIVE-8946
 URL: https://issues.apache.org/jira/browse/HIVE-8946
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, 
 HIVE-8946.3-spark.patch


 Since all the related issues have been identified and tracked by related 
 JIRAs, in this JIRA we turn on the map join optimization for Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8946) Enable Map Join [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230175#comment-14230175
 ] 

Chao commented on HIVE-8946:


Sorry for the mistake - the above comment should belong to HIVE-8970.

 Enable Map Join [Spark Branch]
 --

 Key: HIVE-8946
 URL: https://issues.apache.org/jira/browse/HIVE-8946
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, 
 HIVE-8946.3-spark.patch


 Since all the related issues have been identified and tracked by related 
 JIRAs, in this JIRA we turn on the map join optimization for Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Attachment: HIVE-8774.10.patch

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when a groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Attachment: (was: HIVE-8774.9.1.patch)

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when a groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Status: Open  (was: Patch Available)

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when a groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Status: Patch Available  (was: Open)

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when a groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]

2014-12-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230227#comment-14230227
 ] 

Marcelo Vanzin commented on HIVE-8991:
--

Hi [~lirui], the patch looks good if it unblocks the unit tests. I have to 
think a bit about whether it would work in a real deployment scenario, since 
IIRC hive-exec shades a lot of dependencies and it might cause problems with 
Spark. But the main one (Guava) should be solved in Spark, so hopefully there 
won't be other cases like that.

 Fix custom_input_output_format [Spark Branch]
 -

 Key: HIVE-8991
 URL: https://issues.apache.org/jira/browse/HIVE-8991
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8991.1-spark.patch


 After HIVE-8836, {{custom_input_output_format}} fails because hive-it-util 
 is missing from the remote driver's class path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27713: CBO: enable groupBy index

2014-12-01 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27713/
---

(Updated Dec. 1, 2014, 6:57 p.m.)


Review request for hive and John Pullokkaran.


Repository: hive-git


Description
---

Right now, even when a groupby index is built, CBO is not able to use it. In 
this patch, we are trying to make it use the groupby index that we build. The 
basic problem is that 
for SEL1-SEL2-GRY-...-SEL3,
the previous version only modified SEL2, which immediately precedes GRY.
Now, with CBO, we have lots of SELs, e.g., SEL1.
So, the solution is to modify all of them.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
9ffa708 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 02216de 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
0f06ec9 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 74614f3 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 d699308 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/27713/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230254#comment-14230254
 ] 

Marcelo Vanzin commented on HIVE-8995:
--

The three threads are from akka; I wonder if the test code is failing to 
properly shut down clients or the library itself (i.e. call 
{{SparkClientFactory.stop()}}).

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 

[jira] [Issue Comment Deleted] (HIVE-8946) Enable Map Join [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8946:
--
Comment: was deleted

(was: Also, for join38.q I ran the query in CLI local mode with 
spark.master=local[4], and the result was correct, but running the unit test gave 
a different result.)

 Enable Map Join [Spark Branch]
 --

 Key: HIVE-8946
 URL: https://issues.apache.org/jira/browse/HIVE-8946
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, 
 HIVE-8946.3-spark.patch


 Since all the related issues have been identified and tracked by related 
 JIRAs, in this JIRA we turn on the map join optimization for Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-8946) Enable Map Join [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8946:
--
Comment: was deleted

(was: Sorry for the mistake - the above comment should belong to HIVE-8970.)

 Enable Map Join [Spark Branch]
 --

 Key: HIVE-8946
 URL: https://issues.apache.org/jira/browse/HIVE-8946
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, 
 HIVE-8946.3-spark.patch


 Since all the related issues have been identified and tracked by related 
 JIRAs, in this JIRA we turn on the map join optimization for Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230262#comment-14230262
 ] 

Hive QA commented on HIVE-8851:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1268/HIVE-8851.1-spark.patch

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 7229 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/468/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/468/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-468/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 1268 - PreCommit-HIVE-SPARK-Build

 Broadcast files for small tables via SparkContext.addFile() and 
 SparkFiles.get() [Spark Branch]
 ---

 Key: HIVE-8851
 URL: https://issues.apache.org/jira/browse/HIVE-8851
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-8851.1-spark.patch


 Currently files generated by SparkHashTableSinkOperator for small tables are 
 written directly on HDFS with a high replication factor. When map join 
 happens, map join operator is going to load these files into hash tables. 
 Since multiple partitions can be processed on the same worker node, reading 
 the same set of files multiple times is not ideal. The improvement can be 
 made by calling SparkContext.addFile() on these files and using 
 SparkFiles.getFile() to download them to the worker node just once.
 Please note that SparkFiles.getFile() is a static method. Code invoking this 
 method needs to be in a static method. This calling method needs to be 
 synchronized because it may get called in different threads.
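The synchronization requirement in the last paragraph can be sketched as follows. This is a minimal illustration, not Hive's actual code: SmallTableFetcher, fetchFromDriver, and the cache path are hypothetical names, and the real fetch would delegate to the static SparkFiles API on the worker.

```java
import java.util.HashMap;
import java.util.Map;

class SmallTableFetcher {
    private static final Map<String, String> localPaths = new HashMap<>();
    static int fetchCount = 0; // observable for the demo below

    // Static because the underlying Spark call is static; synchronized because
    // several map-join threads on one worker may request the same file at once.
    static synchronized String getLocalPath(String fileName) {
        String path = localPaths.get(fileName);
        if (path == null) {
            path = fetchFromDriver(fileName); // performed at most once per file
            localPaths.put(fileName, path);
        }
        return path;
    }

    private static String fetchFromDriver(String fileName) {
        fetchCount++; // stand-in for the actual SparkFiles download
        return "/tmp/spark-files/" + fileName;
    }

    public static void main(String[] args) {
        getLocalPath("smalltable.hashtable");
        getLocalPath("smalltable.hashtable"); // second call hits the cache
        System.out.println(fetchCount); // prints 1
    }
}
```

The point of the synchronized guard is exactly what the description asks for: concurrent map-join threads see one download per file instead of repeated reads.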



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230264#comment-14230264
 ] 

Brock Noland commented on HIVE-8995:


We are not. However, that appears to have JVM-wide impact, and we had dozens of 
instances of the three akka threads. We won't be able to call 
{{SparkClientFactory.stop()}} after each session terminates when running inside 
HS2, since HS2 will have dozens of concurrent sessions and thousands of 
sessions over a few weeks.

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 

[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230273#comment-14230273
 ] 

Marcelo Vanzin commented on HIVE-8995:
--

You don't need to call that method for every session. The pattern here is:

* Call {{SparkClientFactory.initialize()}} once
* Create / use as many clients as you want
* When app shuts down, call {{SparkClientFactory.stop()}}

So this should work nicely for HS2 (call initialize during bring up, call stop 
during shut down).
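The three-step pattern above can be sketched as a toy lifecycle. The names here are illustrative stand-ins (the real class under discussion is SparkClientFactory; ClientFactoryLifecycle and its members are hypothetical): one initialize() at service bring-up, any number of clients, one stop() at shutdown.

```java
class ClientFactoryLifecycle {
    static boolean running = false;
    static int clientsCreated = 0;

    static void initialize() { running = true; }   // once, at HS2 bring-up
    static Object createClient() {                 // once per session
        if (!running) throw new IllegalStateException("call initialize() first");
        clientsCreated++;
        return new Object();
    }
    static void stop() { running = false; }        // once, at HS2 shut-down

    public static void main(String[] args) {
        initialize();
        for (int i = 0; i < 3; i++) createClient(); // many sessions, one factory
        stop();
        System.out.println(clientsCreated); // prints 3
    }
}
```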

I see {{RemoteHiveSparkClient}} calls initialize; that seems wrong, if my 
understanding of that class is correct (that it will be instantiated once for 
each session).

Another option is to make {{initialize}} idempotent; right now it will just 
leak the old akka actor system, which is bad. This should be a trivial change 
(just add a check for {{initialized}}).
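The "check for {{initialized}}" suggestion amounts to the following guard. This is a sketch of the idea only, not Hive's SparkClientFactory; IdempotentFactory and actorSystemsStarted are hypothetical names, and starting the akka actor system is stubbed out as a counter.

```java
class IdempotentFactory {
    private static boolean initialized = false;
    static int actorSystemsStarted = 0; // observable for the demo

    static synchronized void initialize() {
        if (initialized) {
            return; // idempotent: a repeat call is a no-op, nothing leaks
        }
        actorSystemsStarted++; // stand-in for starting the akka actor system
        initialized = true;
    }

    public static void main(String[] args) {
        initialize();
        initialize(); // e.g. called again by a second per-session client
        System.out.println(actorSystemsStarted); // prints 1
    }
}
```

Without the guard, the second call would start (and leak) a second actor system, which is the thread leak being chased in this JIRA.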

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 

[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230272#comment-14230272
 ] 

Chao commented on HIVE-8970:


Also, for join38.q I ran the query in CLI local mode with 
spark.master=local[4], and the result was correct, but running the unit test gave 
a different result.


 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230291#comment-14230291
 ] 

Brock Noland commented on HIVE-8995:


bq. You don't need to call that method for every session

Yes I was just saying that calling this might not be our problem.

bq. I see {{RemoteHiveSparkClient}} calls initialize;

This seems like it's the issue. We should change this and throw an exception if 
it's called twice.

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 

[jira] [Comment Edited] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230291#comment-14230291
 ] 

Brock Noland edited comment on HIVE-8995 at 12/1/14 7:31 PM:
-

bq. You don't need to call that method for every session

Yes I was just saying that calling this might not be our problem.

bq. I see {{RemoteHiveSparkClient}} calls initialize;

This seems like it's the issue. We should change this and throw an exception if 
it's called twice.


was (Author: brocknoland):
bq. You don't need to call that method for every session

Yes I was just saying that calling this might not be our problem.

bq. I see {{RemoteHiveSparkClient}} calls initialize;

This seems like it's the issue. We should change this and throw an exception if 
it's called twice.

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 

[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]

2014-12-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8851:
--
Status: Open  (was: Patch Available)

 Broadcast files for small tables via SparkContext.addFile() and 
 SparkFiles.get() [Spark Branch]
 ---

 Key: HIVE-8851
 URL: https://issues.apache.org/jira/browse/HIVE-8851
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-8851.1-spark.patch


 Currently files generated by SparkHashTableSinkOperator for small tables are 
 written directly on HDFS with a high replication factor. When map join 
 happens, map join operator is going to load these files into hash tables. 
 Since multiple partitions can be processed on the same worker node, reading 
 the same set of files multiple times is not ideal. The improvement can be 
 made by calling SparkContext.addFile() on these files and using 
 SparkFiles.getFile() to download them to the worker node just once.
 Please note that SparkFiles.getFile() is a static method. Code invoking this 
 method needs to be in a static method. This calling method needs to be 
 synchronized because it may get called in different threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


0.15 release

2014-12-01 Thread Brock Noland
Hi,

In 2014 we did two large releases. Thank you very much to the RMs for
pushing those out! I've found that Apache projects gain traction
through releasing often, so I think we should aim to increase the
rate of releases in 2015. (Not that I can complain, since I did not
volunteer to RM any release.)

As such I'd like to volunteer as RM for the 0.15 release.

Cheers,
Brock


[jira] [Updated] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8970:
---
Attachment: HIVE-8970.3-spark.patch

Excluded the following tests from this patch:

{noformat}
join38.q
join_literals.q
join_nullsafe.q
subquery_in.q
ppd_join4.q
ppd_multi_insert.q
{noformat}

Their results differ from MR's. I will create a follow-up JIRA to address 
the issue.

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, 
 HIVE-8970.3-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-860) Persistent distributed cache

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230346#comment-14230346
 ] 

Hive QA commented on HIVE-860:
--



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684389/HIVE-860.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1939/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1939/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1939/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative,
 remoteFile=/home/hiveptest/54.167.108.186-hiveptest-2/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=54.167.108.186, 
getInstance()=2]: 'Address 54.167.108.186 maps to 
ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the 
address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
hive.log
            0   0%    0.00kB/s    0:00:00
     41320448   1%   39.41MB/s    0:01:14
     85950464   2%   41.00MB/s    0:01:10
    130056192   4%   41.37MB/s    0:01:08
    174522368   5%   41.63MB/s    0:01:07
    218693632   7%   42.31MB/s    0:01:05
    262471680   8%   42.11MB/s    0:01:04
    306970624  10%   42.16MB/s    0:01:03
    350945280  11%   42.01MB/s    0:01:02
    384303104  12%   38.22MB/s    0:01:08
    395575296  12%   30.30MB/s    0:01:25
    425820160  13%   27.08MB/s    0:01:34
rsync: write failed on 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log:
 No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6]
rsync: connection unexpectedly closed (107 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[generator=3.0.6]
Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but 
this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
receiving incremental file list
./
hive.log
            0   0%    0.00kB/s    0:00:00
rsync: write failed on 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log:
 No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6]
rsync: connection unexpectedly closed (107 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[generator=3.0.6]
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684389 - PreCommit-HIVE-TRUNK-Build

 Persistent distributed cache
 

Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file

2014-12-01 Thread Ryan Blue

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review63424
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/28372/#comment105680

What does this change do?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java
https://reviews.apache.org/r/28372/#comment105672

I don't think ColInfoFromParquetFile is a great name for this class. What 
you've implemented here is a schema converter, like HiveSchemaConverter but 
from MessageType to TypeInfo rather than the other way around. I think this 
should either go in HiveSchemaConverter or a new class, like 
ParquetToHiveSchemaConverter.

This update would also help clarify the methods exposed. I think that the 
primary method this should expose is:
  StructType convert(GroupType parquetSchema)
  
File reading should be done elsewhere if it is necessary.

Also, I think this should return a StructType instead of a Pair with 
Strings. That would be a lot cleaner and would avoid the custom type string 
building immediately followed by parsing those type strings.
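The converter shape described above (walk a Parquet group, emit a Hive struct type) can be sketched with stand-in classes. Note this is illustrative only: `Field`, the primitive mapping, and the `struct<...>` string building below stand in for the real `parquet.schema.GroupType`/`PrimitiveType` walk and Hive's `TypeInfo` machinery, and are not the actual APIs.

```java
import java.util.List;
import java.util.Map;

public class SchemaConverterSketch {

    // Simplified stand-in for a Parquet field: a primitive leaf or a group.
    static class Field {
        final String name;
        final String primitive;      // e.g. "int32"; null for groups
        final List<Field> children;  // non-null only for groups

        Field(String name, String primitive) {
            this.name = name; this.primitive = primitive; this.children = null;
        }
        Field(String name, List<Field> children) {
            this.name = name; this.primitive = null; this.children = children;
        }
    }

    // Parquet primitive -> Hive type name (small illustrative subset).
    static final Map<String, String> PRIMITIVES = Map.of(
        "int32", "int", "int64", "bigint", "binary", "string",
        "double", "double", "boolean", "boolean", "float", "float");

    // Groups become struct<name:type,...>, mirroring convert(GroupType).
    static String convert(Field f) {
        if (f.primitive != null) {
            return PRIMITIVES.get(f.primitive);
        }
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < f.children.size(); i++) {
            Field c = f.children.get(i);
            if (i > 0) sb.append(',');
            sb.append(c.name).append(':').append(convert(c));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Field schema = new Field("root", List.of(
            new Field("id", "int64"),
            new Field("address", List.of(
                new Field("city", "binary"),
                new Field("zip", "int32")))));
        System.out.println(convert(schema));
        // struct<id:bigint,address:struct<city:string,zip:int>>
    }
}
```

Returning a structured type from the converter, as suggested, would let callers skip the type-string round trip entirely; the string form is shown only because it is easy to inspect.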



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105675

Could you please undo the import moves? I like to avoid non-functional 
changes.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105676

Is it possible to avoid setting a file property? There are lots of cases 
where a file in the dataset would be removed, so this is a brittle method of 
configuring the table. Ideally, we would check for the LIST_COLUMN_TYPES 
property and, if we don't find it, convert the schema of the first file that we 
find.
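The fallback suggested above (prefer the declared LIST_COLUMN_TYPES table property, and only derive a schema from data when it is absent) might look roughly like this; the property-key literal and the `Supplier` fallback are illustrative stand-ins, not the actual serde code:

```java
import java.util.Properties;
import java.util.function.Supplier;

public class ColumnTypesLookup {

    // Hive serdes read declared column types from this table property
    // (serdeConstants.LIST_COLUMN_TYPES); the literal here is illustrative.
    static final String LIST_COLUMN_TYPES = "columns.types";

    // Prefer the declared types; only fall back to a schema derived from data
    // (e.g. the first Parquet file found) when the property is absent or empty.
    static String columnTypes(Properties tbl, Supplier<String> fromFirstFile) {
        String declared = tbl.getProperty(LIST_COLUMN_TYPES);
        return (declared != null && !declared.isEmpty())
            ? declared
            : fromFirstFile.get();
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        Supplier<String> fallback = () -> "bigint,string";  // pretend file-derived schema
        System.out.println(columnTypes(tbl, fallback));     // falls back to file schema
        tbl.setProperty(LIST_COLUMN_TYPES, "int,string");
        System.out.println(columnTypes(tbl, fallback));     // declared types win
    }
}
```

Keeping the file lookup behind a `Supplier` (or equivalent) also avoids touching the filesystem at all when the table already declares its types.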



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105678

This non-functional change is fine with me because it fixes the style in a 
method you're editing. But if you add a newline after this conditional, then 
you should also add one after line 149.


- Ryan Blue


On Nov. 26, 2014, 5:08 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28372/
 ---
 
 (Updated Nov. 26, 2014, 5:08 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-8950
 https://issues.apache.org/jira/browse/HIVE-8950
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-8950: Add support in ParquetHiveSerde to create table schema from a 
 parquet file
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
 fafd78e63e9b41c9fdb0e017b567dc719d151784 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
 4effe736fcf9d3715f03eed9885c299a7aa040dd 
   
 ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q
  PRE-CREATION 
   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q 
 PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q
  PRE-CREATION 
   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q 
 PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q
  PRE-CREATION 
   
 ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out
  PRE-CREATION 
   
 ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out
  PRE-CREATION 
   
 ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out
  PRE-CREATION 
   
 ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out
  PRE-CREATION 
   
 ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out 
 PRE-CREATION 
   
 

[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230375#comment-14230375
 ] 

Chao commented on HIVE-8981:


[~szehon] I just ran the unit test on this one and the result looks correct to me. 
Any idea how to reproduce this issue? Thanks.

 Not a directory error in mapjoin_hook.q [Spark Branch]
 --

 Key: HIVE-8981
 URL: https://issues.apache.org/jira/browse/HIVE-8981
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
 Environment: Using remote-spark context with 
 spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

 Hits the following exception:
 {noformat}
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost 
 task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - Caused by: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 

Re: Review Request 28510: HIVE-8974

2014-12-01 Thread Julian Hyde

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28510/#review63433
---

Ship it!


Ship It!

- Julian Hyde


On Nov. 27, 2014, 2:37 p.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28510/
 ---
 
 (Updated Nov. 27, 2014, 2:37 p.m.)
 
 
 Review request for hive, John Pullokkaran and Julian Hyde.
 
 
 Bugs: HIVE-8974
 https://issues.apache.org/jira/browse/HIVE-8974
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Upgrade to Calcite 1.0.0-SNAPSHOT
 
 
 Diffs
 -
 
   pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveDefaultRelMetadataProvider.java
  e9e052ffe8759fa9c49377c58d41450feee0b126 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveOptiqUtil.java 
 80f657e9b1e7e9e965e6814ae76de78316367135 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveTypeSystemImpl.java 
 1bc5a2cfca071ea02a446ae517481f927193f23c 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/OptiqSemanticException.java
  d2b08fa64b868942b7636df171ed89f0081f7253 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 
 080d27fa873f071fb2e0f7932ad26819b79d0477 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/TraitsUtil.java 
 4b44a28ca77540fd643fc03b89dcb4b2155d081a 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCost.java 
 72fe5d6f26d0fd9a34c8e89be3040cce4593fd4a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCostUtil.java 
 7436f12f662542c41e71a7fee37179e35e4e2553 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveVolcanoPlanner.java
  5deb801649f47e0629b3583ef57c62d4a4699f78 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveAggregateRel.java
  fc198958735e12cb3503a0b4c486d8328a10a2fa 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveFilterRel.java
  8b850463ac1c3270163725f876404449ef8dc5f9 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveJoinRel.java
  3d6aa848cd4c83ec8eb22f7df449911d67a53b9b 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java
  f8755d0175c10e5b5461649773bf44abe998b44e 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java
  7b434ea58451bef6a6566eb241933843ee855606 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveRel.java
  4738c4ac2d33cd15d2db7fe4b8336e1f59dd5212 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveSortRel.java
  f85363d50c1c3eb9cef39072106057669454d4da 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveTableScanRel.java
  bd66459def099df6432f344a9d8439deef09daa6 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveUnionRel.java
  d34fe9540e239c13f6bd23894056305c0c402e0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HiveMergeProjectRule.java
  d6581e64fc8ea183666ea6c91397378456461088 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePartitionPrunerRule.java
  ee19a6cbab0597242214e915745631f76214f70f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePushFilterPastJoinRule.java
  1c483eabcc1aa43cc80d7b71e21a4ae4d30a7e12 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/PartitionPruner.java
  bdc8373877c1684855d256c9d45743f383fc7615 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/FilterSelectivityEstimator.java
  28bf2ad506656b78894467c30364d751b180676e 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdDistinctRowCount.java
  4be57b110c1a45819467d55e8a69e5529989c8f6 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdRowCount.java
  8c7f643940b74dd7743635c3eaa046d52d41346f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdSelectivity.java
  49d2ee5a67b72fbf6134ce71de1d7260069cd16f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdUniqueKeys.java
  c3c8bdd2466b0f46d49437fcf8d49dbb689cfcda 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java
  58320c73aafbfeec025f52ee813b3cfd06fa0821 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java
  a217d70e48da0835fed3565ba510bcc9e86c0fa1 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ExprNodeConverter.java
  65c6322d68ef234fbf55a4a36b4cb47e69c30cac 
   
 

[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230427#comment-14230427
 ] 

Xuefu Zhang commented on HIVE-8970:
---

[~csun], just to clarify: w/o your patch here, do these tests give correct 
results? Do they give correct results when master=local[4]? Basically, I'm 
unclear whether the current golden files are correct.

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, 
 HIVE-8970.3-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.
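A minimal sketch of the guard described above, assuming a plain `Map` stands in for `HiveConf`; the flag name `hive.auto.convert.join` is the real one, while the size-threshold check is an illustrative placeholder for the optimizer's other conditions:

```java
import java.util.Map;

public class MapJoinGuard {

    // Only consider map-join conversion when hive.auto.convert.join is true
    // AND the other conditions (sketched here as a size check) hold.
    static boolean shouldConvertToMapJoin(Map<String, String> conf,
                                          long smallTableBytes, long thresholdBytes) {
        boolean autoConvert = Boolean.parseBoolean(
            conf.getOrDefault("hive.auto.convert.join", "false"));
        return autoConvert && smallTableBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        // Flag unset: never convert, regardless of table size.
        System.out.println(shouldConvertToMapJoin(Map.of(), 1_000, 10_000));
        // Flag set and small table fits: convert.
        System.out.println(shouldConvertToMapJoin(
            Map.of("hive.auto.convert.join", "true"), 1_000, 10_000));
    }
}
```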





[jira] [Assigned] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-8995:
-

Assignee: Rui Li

[~ruili], could you take a look at this? Thanks.

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_join_desc_2.q,sort_merge_join_desc_3.q,sort_merge_join_desc_4.q,sort_merge_join_desc_5.q,sort_merge_join_desc_6.q,sort_merge_join_desc_7.q,sort_merge_join_desc_8.q
  
 

[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230465#comment-14230465
 ] 

Szehon Ho commented on HIVE-8981:
-

That's interesting, maybe it went away with some of the recent check-ins... I 
guess we'll keep an eye out if it happens again.

 Not a directory error in mapjoin_hook.q [Spark Branch]
 --

 Key: HIVE-8981
 URL: https://issues.apache.org/jira/browse/HIVE-8981
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
 Environment: Using remote-spark context with 
 spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

 Hits the following exception:
 {noformat}
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost 
 task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - Caused by: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 

[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230468#comment-14230468
 ] 

Xuefu Zhang commented on HIVE-8991:
---

[~vanzin], this doesn't block anything, and so let's do it in the right way. In 
the meantime, does it make sense for you to take this JIRA while you're doing 
the research? Thanks.

 Fix custom_input_output_format [Spark Branch]
 -

 Key: HIVE-8991
 URL: https://issues.apache.org/jira/browse/HIVE-8991
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8991.1-spark.patch


 After HIVE-8836, {{custom_input_output_format}} fails because of missing 
 hive-it-util in remote driver's class path.





[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230470#comment-14230470
 ] 

Chao commented on HIVE-8970:


Yes, I believe so. When I enabled mapjoin, I compared the unit test results 
against the previous results in the Spark branch, which had previously been 
compared against the MR results. 

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, 
 HIVE-8970.3-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.





[jira] [Updated] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8957:
--
Status: Open  (was: Patch Available)

 Remote spark context needs to clean up itself in case of connection timeout 
 [Spark Branch]
 --

 Key: HIVE-8957
 URL: https://issues.apache.org/jira/browse/HIVE-8957
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-8957.1-spark.patch


 In the current SparkClient implementation (class SparkClientImpl), the 
 constructor does some initialization and in the end waits for the remote 
 driver to connect. In case of a timeout, it just throws an exception without 
 cleaning up after itself. The cleanup is necessary to release system resources.
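The cleanup pattern the description asks for (release already-acquired resources before rethrowing when the wait for the remote driver times out) can be sketched as follows; the class and member names are illustrative stand-ins, not the actual SparkClientImpl:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: the constructor waits for the remote side with a timeout; on
// timeout it tears down what it already created instead of leaking it.
public class ClientSketch implements AutoCloseable {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public ClientSketch(Future<Void> driverConnected, long timeoutMs) throws Exception {
        try {
            driverConnected.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            close();  // don't leak the executor (or sockets, child processes, ...)
            throw new Exception("Timed out waiting for remote driver", e);
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        try {
            new ClientSketch(new CompletableFuture<>(), 50);  // never connects
        } catch (Exception e) {
            System.out.println("constructor cleaned up and rethrew: " + e.getMessage());
        }
    }
}
```

The key point is that a constructor which acquires resources must release them on every failure path, since no object reference ever escapes for a caller to close.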





[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230473#comment-14230473
 ] 

Xuefu Zhang commented on HIVE-8957:
---

[~vanzin], would you mind owning the JIRA for now until you figure out a 
solution? 

 Remote spark context needs to clean up itself in case of connection timeout 
 [Spark Branch]
 --

 Key: HIVE-8957
 URL: https://issues.apache.org/jira/browse/HIVE-8957
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-8957.1-spark.patch


 In the current SparkClient implementation (class SparkClientImpl), the 
 constructor does some initialization and in the end waits for the remote 
 driver to connect. In case of timeout, it just throws an exception without 
 cleaning itself. The cleanup is necessary to release system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]

2014-12-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230478#comment-14230478
 ] 

Marcelo Vanzin commented on HIVE-8957:
--

If you don't mind the bug remaining unattended for several days, sure. I have 
my hands full with all sorts of other things at the moment.

 Remote spark context needs to clean up itself in case of connection timeout 
 [Spark Branch]
 --

 Key: HIVE-8957
 URL: https://issues.apache.org/jira/browse/HIVE-8957
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-8957.1-spark.patch


 In the current SparkClient implementation (class SparkClientImpl), the 
 constructor does some initialization and in the end waits for the remote 
 driver to connect. In case of timeout, it just throws an exception without 
 cleaning itself. The cleanup is necessary to release system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230486#comment-14230486
 ] 

Xuefu Zhang commented on HIVE-8981:
---

Yeah, the test seems to have passed in the latest test run: 
https://issues.apache.org/jira/browse/HIVE-8998?focusedCommentId=14229321page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14229321

Closing this for now.

 Not a directory error in mapjoin_hook.q [Spark Branch]
 --

 Key: HIVE-8981
 URL: https://issues.apache.org/jira/browse/HIVE-8981
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
 Environment: Using remote-spark context with 
 spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

 Hits the following exception:
 {noformat}
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost 
 task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - Caused by: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while 

[jira] [Resolved] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-8981.
---
Resolution: Cannot Reproduce

 Not a directory error in mapjoin_hook.q [Spark Branch]
 --

 Key: HIVE-8981
 URL: https://issues.apache.org/jira/browse/HIVE-8981
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
 Environment: Using remote-spark context with 
 spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

 Hits the following exception:
 {noformat}
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost 
 task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - Caused by: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(364)) - at 
 org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:100)
 2014-11-26 15:17:11,729 

[jira] [Commented] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230484#comment-14230484
 ] 

Hive QA commented on HIVE-8774:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684451/HIVE-8774.10.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6697 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1940/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1940/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1940/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684451 - PreCommit-HIVE-TRUNK-Build

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when groupby index is build, CBO is not able to use it. In 
 this patch, we are trying to make it use groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230489#comment-14230489
 ] 

Hive QA commented on HIVE-8970:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684466/HIVE-8970.3-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7223 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/469/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/469/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-469/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684466 - PreCommit-HIVE-SPARK-Build

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, 
 HIVE-8970.3-spark.patch


 Right now, in Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28283: HIVE-8900:Create encryption testing framework

2014-12-01 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28283/#review63437
---



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/28283/#comment105686

Do we need to set this value? From what I know, AES/CTR/NoPadding is the 
only cipher mode that HDFS supports.



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/28283/#comment105687

I think this method 'initEncryptionRelatedConfIfNeeded()' can be called 
inside the block at line 370,
as it is only called when clusterType is encrypted. Also, we may rename the 
method to a shorter name, since the IfNeeded suffix would no longer apply.



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/28283/#comment105688

What if we move this line inside initEncryptionConf()? It is part of 
encryption initialization.



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/28283/#comment105689

- May we rename this method so that it starts with the 'init' verb? This is 
just a good practice I've learned in order
  to read code much better. Also, IfNeeded() is the correct syntax.
- We could also get rid of the IfNeeded() word (making the name shorter) if 
we add the validation when this method
  is called instead of inside the method. It is just an opinion.



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/28283/#comment105690

Just to comment that AES-256 can be used only if JCE is installed in your 
environment. Otherwise, any encryption
  with this key will fail. Keys can be created, but when you try to encrypt 
something, it fails. We should put a 
  comment here so that other developers know this.
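The availability check the comment asks for can be done at runtime. A small sketch, assuming the concern is the JCE Unlimited Strength policy: without it, `Cipher.getMaxAllowedKeyLength("AES")` reports 128, so AES-256 key material cannot actually be used for encryption.

```java
import javax.crypto.Cipher;
import java.security.NoSuchAlgorithmException;

public class JcePolicyCheck {
    // Maximum AES key length permitted by the installed JCE policy, in bits;
    // 128 without the Unlimited Strength policy files, Integer.MAX_VALUE with.
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            return -1;  // AES itself missing: treat as unavailable
        }
    }

    static boolean aes256Available() {
        return maxAesKeyBits() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("AES-256 usable: " + aes256Available());
    }
}
```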



ql/src/test/templates/TestEncrytedHDFSCliDriver.vm
https://reviews.apache.org/r/28283/#comment105692

Why do we need this new class instead of TestCliDriver.vm?



shims/0.20/src/main/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
https://reviews.apache.org/r/28283/#comment105696

I think we should leave the 'hadoop.encryption.is.not.supported' key name 
on unsupported hadoop versions. This was left only as a comment for developers. 
Nobody will use this configuration key anyways.



shims/0.20/src/main/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
https://reviews.apache.org/r/28283/#comment105695

Do we need these two configuration values in the configuration environment? 
These are used only for test purposes on QTestUtil. The user won't use these 
fields on hive-site.xml ever. Or not yet.



shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
https://reviews.apache.org/r/28283/#comment105693

I think we should leave the 'hadoop.encryption.is.not.supported' key name 
on unsupported hadoop versions. This was left only as a comment for developers. 
Nobody will use this configuration key anyways.



shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
https://reviews.apache.org/r/28283/#comment105694

Do we need these two configuration values in the configuration environment? 
These are used only for test purposes on QTestUtil. The user won't use these 
fields on hive-site.xml ever. Or not yet.



shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
https://reviews.apache.org/r/28283/#comment105697

Let's import the necessary modules only. I think the IDE did this 
replacement.



shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
https://reviews.apache.org/r/28283/#comment105698

Why was this block removed? I see the keyProvider variable is initialized 
inside getMiniDfs() method (testing). But what will happen with production code?


- Sergio Pena


On Nov. 28, 2014, 1:45 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28283/
 ---
 
 (Updated Nov. 28, 2014, 1:45 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The patch includes:
 1. enable security properties for hive security cluster
 
 
 Diffs
 -
 
   .gitignore c5decaf 
   data/scripts/q_test_cleanup_for_encryption.sql PRE-CREATION 
   data/scripts/q_test_init_for_encryption.sql PRE-CREATION 
   itests/qtest/pom.xml 376f4a9 
   itests/src/test/resources/testconfiguration.properties 3ae001d 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 31d5c29 
   ql/src/test/queries/clientpositive/create_encrypted_table.q PRE-CREATION 
   ql/src/test/templates/TestEncrytedHDFSCliDriver.vm PRE-CREATION 
   

[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230502#comment-14230502
 ] 

Xuefu Zhang commented on HIVE-8957:
---

That's all right. I think I can bug you on this when you have cycles.

 Remote spark context needs to clean up itself in case of connection timeout 
 [Spark Branch]
 --

 Key: HIVE-8957
 URL: https://issues.apache.org/jira/browse/HIVE-8957
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-8957.1-spark.patch


 In the current SparkClient implementation (class SparkClientImpl), the 
 constructor does some initialization and in the end waits for the remote 
 driver to connect. In case of timeout, it just throws an exception without 
 cleaning itself. The cleanup is necessary to release system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230548#comment-14230548
 ] 

Chao commented on HIVE-8982:


I ran mapjoin_mapjoin and auto_join31 each 10 times on the latest spark branch, 
but couldn't reproduce the issue. Is this still occurring on Jenkins?

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 

[jira] [Assigned] (HIVE-8992) Fix two bucket related test failures, infer_bucket_sort_convert_join.q and parquet_join.q [Spark Branch]

2014-12-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-8992:
-

Assignee: Jimmy Xiang

 Fix two bucket related test failures, infer_bucket_sort_convert_join.q and 
 parquet_join.q [Spark Branch]
 

 Key: HIVE-8992
 URL: https://issues.apache.org/jira/browse/HIVE-8992
 Project: Hive
  Issue Type: Sub-task
  Components: spark-branch
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

 Failures shown in HIVE-8836. They seemed related to wrong reducer numbers in 
 terms of bucket join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8374) schematool fails on Postgres versions < 9.2

2014-12-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8374:
---
Fix Version/s: 0.14.1

 schematool fails on Postgres versions < 9.2
 ---

 Key: HIVE-8374
 URL: https://issues.apache.org/jira/browse/HIVE-8374
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
 Fix For: 0.15.0, 0.14.1

 Attachments: HIVE-8374.1.patch, HIVE-8374.2.patch, HIVE-8374.3.patch, 
 HIVE-8374.patch


 The upgrade script for HIVE-5700 creates a UDF with language 'plpgsql',
 which is available by default only for Postgres 9.2+.
 For older Postgres versions, the language must be explicitly created,
 otherwise schematool fails with the error:
 {code}
 Error: ERROR: language plpgsql does not exist
   Hint: Use CREATE LANGUAGE to load the language into the database. 
 (state=42704,code=0)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8374) schematool fails on Postgres versions < 9.2

2014-12-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230555#comment-14230555
 ] 

Sergey Shelukhin commented on HIVE-8374:


backported to 14

 schematool fails on Postgres versions < 9.2
 ---

 Key: HIVE-8374
 URL: https://issues.apache.org/jira/browse/HIVE-8374
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
 Fix For: 0.15.0, 0.14.1

 Attachments: HIVE-8374.1.patch, HIVE-8374.2.patch, HIVE-8374.3.patch, 
 HIVE-8374.patch


 The upgrade script for HIVE-5700 creates a UDF with language 'plpgsql',
 which is available by default only for Postgres 9.2+.
 For older Postgres versions, the language must be explicitly created,
 otherwise schematool fails with the error:
 {code}
 Error: ERROR: language plpgsql does not exist
   Hint: Use CREATE LANGUAGE to load the language into the database. 
 (state=42704,code=0)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 0.15 release

2014-12-01 Thread Thejas Nair
+1 .
Regarding the next version being 0.15 - I have some thoughts on the
versioning of hive. I will start a different thread on that.


On Mon, Dec 1, 2014 at 11:43 AM, Brock Noland br...@cloudera.com wrote:
 Hi,

 In 2014 we did two large releases. Thank you very much to the RM's for
 pushing those out! I've found that Apache projects gain traction
 through releasing often, thus I think we should aim to increase the
 rate of releases in 2015. (Not that I can complain, since I did not
 volunteer to RM any release.)

 As such I'd like to volunteer as RM for the 0.15 release.

 Cheers,
 Brock



[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup

2014-12-01 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8886:
---
Status: In Progress  (was: Patch Available)

 Some Vectorized String CONCAT expressions result in runtime error 
 Vectorization: Unsuported vector output type: StringGroup
 ---

 Key: HIVE-8886
 URL: https://issues.apache.org/jira/browse/HIVE-8886
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.1
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.1

 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch


 {noformat}
 SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS 
 INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field`
 FROM vectortab2korc 
 GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 
 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING))
 LIMIT 50;
 {noformat}
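The quarter arithmetic in the query above, CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING), can be checked outside Hive. A minimal Python sketch of the same computation (the function name and sample values are mine, not part of the query or patch):

```python
def quarter_label(month, year):
    # Mirrors CONCAT('Quarter ', CAST((MONTH(dt) - 1) / 3 + 1 AS INT), '-', YEAR(dt)).
    # Hive's CAST(... AS INT) truncates toward zero; for month in 1..12 integer
    # floor division gives the identical result.
    quarter = (month - 1) // 3 + 1
    return "Quarter %d-%d" % (quarter, year)

print(quarter_label(2, 2014))   # Quarter 1-2014
print(quarter_label(12, 2014))  # Quarter 4-2014
```

The GROUP BY and LIMIT in the query do not change this per-row expression; the vectorization bug is about the output column type of the nested CONCATs, not the arithmetic itself.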





[jira] [Created] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-9001:
---

 Summary: Ship with log4j.properties file that has a reliable time 
based rolling policy
 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


The Hive log gets locked by the Hive process and cannot be rolled on Windows.
To reproduce: install Hive on Windows, start Hive, and try to rename the Hive log while Hive is running.
When log4j tries to rename it, it throws the same error because the file is 
locked by the process.

The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should 
be integrated to Hive (Internal as well as trunk) for a reliable rollover.
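For illustration, a time-based rolling configuration of the kind a shipped log4j.properties could contain. The appender name, paths, and layout pattern below are my own sketch, not the contents of the attached patch:

```properties
# Illustrative sketch only -- not the actual HIVE-9001 patch.
# Daily time-based rollover for the Hive log; on Windows a rename-based
# rollover fails while the process still holds the log file open.
log4j.rootLogger=INFO,DRFA
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]: %c{2} - %m%n
```

Note that stock DailyRollingFileAppender is exactly the appender whose rollover bug 29726 addresses, which is why the JIRA asks for those changes to be integrated rather than for a new configuration alone.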





[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup

2014-12-01 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8886:
---
Attachment: HIVE-8886.02.patch

 Some Vectorized String CONCAT expressions result in runtime error 
 Vectorization: Unsuported vector output type: StringGroup
 ---

 Key: HIVE-8886
 URL: https://issues.apache.org/jira/browse/HIVE-8886
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.1
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.1

 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch


 {noformat}
 SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS 
 INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field`
 FROM vectortab2korc 
 GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 
 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING))
 LIMIT 50;
 {noformat}





[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230567#comment-14230567
 ] 

Szehon Ho commented on HIVE-8982:
-

Yea.  I still see some random failures in mapjoin tests like:

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/464/testReport/junit/org.apache.hadoop.hive.cli/TestSparkCliDriver/testCliDriver_mapjoin_hook/

Usually when I get those, I see this exception. I didn't dig too deeply into the 
latest random-failure logs to confirm it again, though.
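The quoted trace below bottoms out in ArrayList.get(1) on a size-1 list inside MapJoinEagerRowContainer.first(). A toy reconstruction of that failure mode, in Python for brevity (the class and method names here are hypothetical, not Hive's actual code):

```python
# Toy sketch (not Hive code): a row container whose read cursor is left
# one past its single buffered row, so the next read is out of range --
# the Python analogue of Java's "IndexOutOfBoundsException: Index: 1, Size: 1".
class RowContainer:
    def __init__(self, rows):
        self.rows = rows
        self.index = 0

    def next_row(self):
        row = self.rows[self.index]  # raises IndexError when index == len(rows)
        self.index += 1
        return row

c = RowContainer(["row0"])
c.next_row()          # consumes the only row; cursor now at 1
try:
    c.next_row()      # second read: index 1, size 1
except IndexError:
    print("Index: 1, Size: 1")
```

A race or missed cursor reset during closeOp()/flushToFile() would produce exactly this shape of failure intermittently, which matches the "random" nature of the test flakes.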

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 

[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230568#comment-14230568
 ] 

Xuefu Zhang commented on HIVE-8982:
---

They don't seem to be happening anymore. Feel free to close this.

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 

[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9001:

Attachment: HIVE-9001.1.patch

cc-ing [~sushanth] for reviewing this change.

 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it throws the same error because the file is 
 locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive (Internal as well as trunk) for a reliable 
 rollover.





[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup

2014-12-01 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-8886:
---
Status: Patch Available  (was: In Progress)

 Some Vectorized String CONCAT expressions result in runtime error 
 Vectorization: Unsuported vector output type: StringGroup
 ---

 Key: HIVE-8886
 URL: https://issues.apache.org/jira/browse/HIVE-8886
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.1
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.1

 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch


 {noformat}
 SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS 
 INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field`
 FROM vectortab2korc 
 GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 
 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING))
 LIMIT 50;
 {noformat}





[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9001:

Status: Patch Available  (was: Open)

 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it throws the same error because the file is 
 locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive for a reliable rollover.





[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9001:

Description: 
The Hive log gets locked by the Hive process and cannot be rolled on Windows.
To reproduce: install Hive on Windows, start Hive, and try to rename the Hive log while Hive is running.
When log4j tries to rename it, it throws the same error because the file is 
locked by the process.

The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should 
be integrated to Hive for a reliable rollover.

  was:
The Hive log gets locked by the Hive process and cannot be rolled on Windows.
To reproduce: install Hive on Windows, start Hive, and try to rename the Hive log while Hive is running.
When log4j tries to rename it, it throws the same error because the file is 
locked by the process.

The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should 
be integrated to Hive (Internal as well as trunk) for a reliable rollover.


 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it throws the same error because the file is 
 locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive for a reliable rollover.





[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230572#comment-14230572
 ] 

Chao commented on HIVE-8982:


OK, closing for now.

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 

[jira] [Assigned] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao reassigned HIVE-8982:
--

Assignee: Chao

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Chao

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at 
 org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 
 org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at 

[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9001:

Attachment: (was: HIVE-9001.1.patch)

 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it throws the same error because the file is 
 locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive for a reliable rollover.





[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9001:

Attachment: HIVE-9001.1.patch

 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it throws the same error because the file is 
 locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive for a reliable rollover.





[jira] [Resolved] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao resolved HIVE-8982.

Resolution: Cannot Reproduce

 IndexOutOfBounds exception in mapjoin [Spark Branch]
 

 Key: HIVE-8982
 URL: https://issues.apache.org/jira/browse/HIVE-8982
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Chao

 There are sometimes random failures in spark mapjoin during unit tests like:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
   at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
   at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
   at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
   at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
   ... 20 more

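The `ArrayList.get` frames in the trace show a size-1 list being read at index 1 inside `MapJoinEagerRowContainer.first()`. That failure mode can be reproduced in isolation; below is a minimal sketch using a hypothetical container class (not the actual Hive implementation), assuming the usual cursor-past-end bug:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row container whose read cursor can point
// past the last element, mirroring ArrayList.get(1) on a size-1 list.
class TinyRowContainer {
    private final List<String> rows = new ArrayList<>();
    private int index = 0;

    void add(String row) { rows.add(row); }

    // Unsafe: throws IndexOutOfBoundsException once index >= rows.size().
    String readUnsafe() { return rows.get(index++); }

    // Guarded variant: signals end-of-rows with null instead of throwing.
    String readSafe() { return index < rows.size() ? rows.get(index++) : null; }
}
```

The guarded variant is one general shape of a fix for cursor-past-end bugs like this; whether it matches the eventual Hive fix is a separate question.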
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230583#comment-14230583 ]

Szehon Ho commented on HIVE-8982:
-

I dug a little and found the exception again here as part of run 464. See 
[http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-464/failed/TestSparkCliDriver-groupby_complex_types.q-auto_join9.q-groupby_map_ppr.q-and-12-more/spark.log].
I think it's still unresolved.


[jira] [Reopened] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao reopened HIVE-8982:



[jira] [Resolved] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao resolved HIVE-8982.

Resolution: Cannot Reproduce


[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230588#comment-14230588 ]

Szehon Ho commented on HIVE-8982:
-

Sorry, this is the "Not a directory" exception that was closed in the other JIRA.


[GitHub] hive pull request: How to calculate the Kendall coefficient of cor...

2014-12-01 Thread MarcinKosinski
GitHub user MarcinKosinski opened a pull request:

https://github.com/apache/hive/pull/24

How to calculate the Kendall coefficient of correlation for a pair of numeric columns in a group?

In this [wiki 
page](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) 
there is a function `corr()` that calculates the Pearson coefficient of 
correlation. My question is: is there any function in Hive that can 
calculate the Kendall coefficient of correlation for a pair of numeric 
columns in a group?
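The wiki page cited above lists only `corr()` (Pearson) among the correlation aggregates. The Kendall statistic itself is straightforward to compute outside Hive; a minimal standalone sketch follows (naive O(n²) tau-a with no tie correction, hypothetical class name, not a Hive UDAF):

```java
public class KendallTau {
    // Naive Kendall tau-a: (concordant - discordant) / totalPairs,
    // counting every unordered pair (i, j); ties count toward neither.
    public static double tau(double[] x, double[] y) {
        int n = x.length;
        long concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = (x[i] - x[j]) * (y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        }
        return (double) (concordant - discordant) / (n * (n - 1L) / 2);
    }

    public static void main(String[] args) {
        // Perfectly concordant rankings give tau = 1.0
        System.out.println(tau(new double[]{1, 2, 3, 4},
                               new double[]{10, 20, 30, 40}));
    }
}
```

A proper in-Hive version would presumably wrap this pairwise count in a custom aggregate function, but the naive form is enough for spot-checking results exported from a query.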

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/hive HIVE-8065

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #24


commit 1628cb08e0bf1c6b168a9aa7b6f978a943cdc105
Author: Brock Noland br...@apache.org
Date:   2014-11-05T23:38:20Z

Creating branch for HIVE-8065

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1637006 
13f79535-47bb-0310-9956-ffa450edef68

commit a9a413d6f4bd7273caf3d26bd4dd2b0d9672d56d
Author: Brock Noland br...@apache.org
Date:   2014-11-14T00:04:08Z

HIVE-8749 - Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT (Sergio Pena 
via Brock)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1639558 
13f79535-47bb-0310-9956-ffa450edef68

commit b45941d8b64e3b2553034cc6ae212a31084a694d
Author: Brock Noland br...@apache.org
Date:   2014-11-17T22:36:47Z

HIVE-8750 - Commit initial encryption work (Sergio Pena via Brock)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640247 
13f79535-47bb-0310-9956-ffa450edef68

commit 184cf1ef21d7f9e8ce6b9d39044708d6daf1ffab
Author: Brock Noland br...@apache.org
Date:   2014-11-18T22:51:55Z

HIVE-8904 - Hive should support multiple Key provider modes (Ferdinand Xu 
via Brock)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640446 
13f79535-47bb-0310-9956-ffa450edef68

commit 61c468250512d7242aa343d59f2a81e3174ea112
Author: Brock Noland br...@apache.org
Date:   2014-11-20T06:10:44Z

HIVE-8919 - Fix FileUtils.copy() method to call distcp only for HDFS files 
(not local files) (Sergio Pena via Brock)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640684 
13f79535-47bb-0310-9956-ffa450edef68

commit 018b67cadc0dbad64df05819d92b87f2dc5bdaf8
Author: Brock Noland br...@apache.org
Date:   2014-11-21T21:57:23Z

HIVE-8945 - Allow user to read encrypted read-only tables only if the 
scratch directory is encrypted (Sergio Pena via Brock)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1641007 
13f79535-47bb-0310-9956-ffa450edef68




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Reopened] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Chao (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao reopened HIVE-8981:


 Not a directory error in mapjoin_hook.q [Spark Branch]
 --

 Key: HIVE-8981
 URL: https://issues.apache.org/jira/browse/HIVE-8981
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
 Environment: Using remote-spark context with 
 spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

 Hits the following exception:
 {noformat}
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
 2014-11-26 15:17:11,728 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.Task.run(Task.scala:56)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:100)
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 
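The underlying "Not a directory" condition surfaces while `HashTableLoader.load` reads the small-table dump path. The general guard involved can be sketched in isolation; the helper name, message, and paths below are hypothetical, not Hive's actual code:

```java
import java.io.File;
import java.io.IOException;

public class SmallTableCheck {
    // Hypothetical guard: verify the dump path really is a directory before
    // listing it, so a "Not a directory" condition fails fast with a clear
    // message instead of deep inside table-container loading.
    static File[] listSmallTableFiles(String path) throws IOException {
        File dir = new File(path);
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + path);
        }
        return dir.listFiles();
    }

    public static void main(String[] args) throws IOException {
        // The system temp dir is a directory, so this lists without throwing.
        System.out.println(
            listSmallTableFiles(System.getProperty("java.io.tmpdir")).length >= 0);
    }
}
```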

[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]

2014-12-01 Thread Chao (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230600#comment-14230600 ]

Chao commented on HIVE-8981:


[~szehon] Is this issue also happening randomly? What is the failing test? Any 
suggestion on how to reproduce it?

 (SparkClientImpl.java:run(364)) - Caused by: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 2014-11-26 15:17:11,729 INFO  [stderr-redir-1]: client.SparkClientImpl 
 

[jira] [Updated] (HIVE-6421) abs() should preserve precision/scale of decimal input

2014-12-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-6421:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review, Ashutosh.

 abs() should preserve precision/scale of decimal input
 --

 Key: HIVE-6421
 URL: https://issues.apache.org/jira/browse/HIVE-6421
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.15.0

 Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch


 {noformat}
 hive> describe dec1;
 OK
 c1    decimal(10,2)    None
 hive> explain select c1, abs(c1) from dec1;
  ...
 Select Operator
   expressions: c1 (type: decimal(10,2)), abs(c1) (type: 
 decimal(38,18))
 {noformat}
 Given that abs() is a GenericUDF it should be possible for the return type 
 precision/scale to match the input precision/scale.
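 As a sketch of the underlying arithmetic fact (using Python's decimal module
 purely for illustration, not Hive's type system): abs() only drops the sign,
 so the result always fits the input's precision/scale, which is why the
 return type can match the input type instead of defaulting to decimal(38,18).

```python
from decimal import Decimal

def abs_preserves_type(value: Decimal) -> Decimal:
    # abs() can only drop the sign; it never adds integer or fraction
    # digits, so a result typed decimal(p,s) like its input is always
    # wide enough -- the rationale behind HIVE-6421.
    return abs(value)

v = Decimal("-12345678.91")      # fits decimal(10,2)
print(abs_preserves_type(v))     # 12345678.91, still fits decimal(10,2)
```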



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 0.15 release

2014-12-01 Thread Thejas Nair
Brock,
When you say more frequent releases, what schedule do you have in mind?
I think an (approximately) quarterly release cycle would be good.
We branched for Hive 0.14 on Sept 25, which means we have been adding
new features not in 0.14 for more than 2 months.
How about branching for the 0.15 equivalent in another month or two?
Sometime in Jan?



On Mon, Dec 1, 2014 at 2:19 PM, Thejas Nair the...@hortonworks.com wrote:
 +1 .
 Regarding the next version being 0.15 - I have some thoughts on the
 versioning of hive. I will start a different thread on that.


 On Mon, Dec 1, 2014 at 11:43 AM, Brock Noland br...@cloudera.com wrote:
 Hi,

 In 2014 we did two large releases. Thank you very much to the RMs for
 pushing those out! I've found that Apache projects gain traction
 through releasing often, so I think we should aim to increase the
 rate of releases in 2015. (Not that I can complain, since I did not
 volunteer to RM any release.)

 As such I'd like to volunteer as RM for the 0.15 release.

 Cheers,
 Brock



Re: Review Request 27713: CBO: enable groupBy index

2014-12-01 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27713/#review63459
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/27713/#comment105714

I don't think you can allow a function wrapping the index key, since we don't 
know if the UDF is going to mutate the values (Non Null -> Null, Null -> Non Null).

Example:
select a, count(b) from (select a, (case when a is null then 1 else a end) as b 
from r1) r2 group by a;
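
A tiny Python sketch (illustrative only, not Hive code) of why the rewrite is
unsafe: a UDF like the case expression above can turn a NULL into a non-NULL,
so count() over the wrapped column differs from count() over the index key.

```python
# Rows of column a, including a NULL (represented as None)
rows = [None, 1, 2]

def b(a):
    # Hypothetical UDF wrapping the index key: maps NULL -> non-NULL
    return 1 if a is None else a

count_a = sum(1 for a in rows if a is not None)     # count() skips NULLs
count_b = sum(1 for a in rows if b(a) is not None)  # the UDF "revived" the NULL

print(count_a, count_b)  # 2 3 -- a group-by index on a cannot answer count(b)
```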


- John Pullokkaran


On Dec. 1, 2014, 6:57 p.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/27713/
 ---
 
 (Updated Dec. 1, 2014, 6:57 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Right now, even when a group-by index is built, CBO is not able to use it. In 
 this patch, we try to make CBO use the group-by index that we build. The 
 basic problem is that 
 for SEL1-SEL2-GRY-...-SEL3,
 the previous version only modified SEL2, which immediately precedes GRY.
 Now, with CBO, we have lots of SELs, e.g., SEL1.
 So, the solution is to modify all of them.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
 9ffa708 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
  02216de 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  0f06ec9 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
  74614f3 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  d699308 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/27713/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




Re: Review Request 28510: HIVE-8974

2014-12-01 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28510/#review63461
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java
https://reviews.apache.org/r/28510/#comment105717

Why can't we reuse HiveAggregateRel?


- John Pullokkaran


On Nov. 27, 2014, 2:37 p.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28510/
 ---
 
 (Updated Nov. 27, 2014, 2:37 p.m.)
 
 
 Review request for hive, John Pullokkaran and Julian Hyde.
 
 
 Bugs: HIVE-8974
 https://issues.apache.org/jira/browse/HIVE-8974
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Upgrade to Calcite 1.0.0-SNAPSHOT
 
 
 Diffs
 -
 
   pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveDefaultRelMetadataProvider.java
  e9e052ffe8759fa9c49377c58d41450feee0b126 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveOptiqUtil.java 
 80f657e9b1e7e9e965e6814ae76de78316367135 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveTypeSystemImpl.java 
 1bc5a2cfca071ea02a446ae517481f927193f23c 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/OptiqSemanticException.java
  d2b08fa64b868942b7636df171ed89f0081f7253 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 
 080d27fa873f071fb2e0f7932ad26819b79d0477 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/TraitsUtil.java 
 4b44a28ca77540fd643fc03b89dcb4b2155d081a 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCost.java 
 72fe5d6f26d0fd9a34c8e89be3040cce4593fd4a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCostUtil.java 
 7436f12f662542c41e71a7fee37179e35e4e2553 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveVolcanoPlanner.java
  5deb801649f47e0629b3583ef57c62d4a4699f78 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveAggregateRel.java
  fc198958735e12cb3503a0b4c486d8328a10a2fa 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveFilterRel.java
  8b850463ac1c3270163725f876404449ef8dc5f9 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveJoinRel.java
  3d6aa848cd4c83ec8eb22f7df449911d67a53b9b 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java
  f8755d0175c10e5b5461649773bf44abe998b44e 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java
  7b434ea58451bef6a6566eb241933843ee855606 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveRel.java
  4738c4ac2d33cd15d2db7fe4b8336e1f59dd5212 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveSortRel.java
  f85363d50c1c3eb9cef39072106057669454d4da 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveTableScanRel.java
  bd66459def099df6432f344a9d8439deef09daa6 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveUnionRel.java
  d34fe9540e239c13f6bd23894056305c0c402e0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HiveMergeProjectRule.java
  d6581e64fc8ea183666ea6c91397378456461088 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePartitionPrunerRule.java
  ee19a6cbab0597242214e915745631f76214f70f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePushFilterPastJoinRule.java
  1c483eabcc1aa43cc80d7b71e21a4ae4d30a7e12 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/PartitionPruner.java
  bdc8373877c1684855d256c9d45743f383fc7615 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/FilterSelectivityEstimator.java
  28bf2ad506656b78894467c30364d751b180676e 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdDistinctRowCount.java
  4be57b110c1a45819467d55e8a69e5529989c8f6 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdRowCount.java
  8c7f643940b74dd7743635c3eaa046d52d41346f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdSelectivity.java
  49d2ee5a67b72fbf6134ce71de1d7260069cd16f 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdUniqueKeys.java
  c3c8bdd2466b0f46d49437fcf8d49dbb689cfcda 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java
  58320c73aafbfeec025f52ee813b3cfd06fa0821 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java
  a217d70e48da0835fed3565ba510bcc9e86c0fa1 
   

[jira] [Updated] (HIVE-8948) TestStreaming is flaky

2014-12-01 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8948:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Eugene for the review.

 TestStreaming is flaky
 --

 Key: HIVE-8948
 URL: https://issues.apache.org/jira/browse/HIVE-8948
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8948.patch


 TestStreaming seems to fail in one test or another about 1 in 50 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)

2014-12-01 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8974:
-
Description: 
CLEAR LIBRARY CACHE

Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure 
and renamed a lot of classes. CALCITE-296 has the details, including a 
description of the before:after mapping.

This task is to upgrade to the version of Calcite that has the renamed 
packages. There is a 1.0.0-SNAPSHOT in Apache nexus.

Calcite functionality has not changed significantly, so it should be 
straightforward to rename. This task should be completed ASAP, before Calcite 
moves on.

  was:
Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure 
and renamed a lot of classes. CALCITE-296 has the details, including a 
description of the before:after mapping.

This task is to upgrade to the version of Calcite that has the renamed 
packages. There is a 1.0.0-SNAPSHOT in Apache nexus.

Calcite functionality has not changed significantly, so it should be 
straightforward to rename. This task should be completed ASAP, before Calcite 
moves on.


 Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
 

 Key: HIVE-8974
 URL: https://issues.apache.org/jira/browse/HIVE-8974
 Project: Hive
  Issue Type: Task
Affects Versions: 0.15.0
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.15.0

 Attachments: HIVE-8974.01.patch, HIVE-8974.patch


 CLEAR LIBRARY CACHE
 Calcite recently (after 0.9.2, before 1.0.0) re-organized its package 
 structure and renamed a lot of classes. CALCITE-296 has the details, 
 including a description of the before:after mapping.
 This task is to upgrade to the version of Calcite that has the renamed 
 packages. There is a 1.0.0-SNAPSHOT in Apache nexus.
 Calcite functionality has not changed significantly, so it should be 
 straightforward to rename. This task should be completed ASAP, before Calcite 
 moves on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)

2014-12-01 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230718#comment-14230718
 ] 

Laljo John Pullokkaran commented on HIVE-8974:
--

The failure may be due to QA not clearing the local mvn repo.
I have updated your bug description (which should prompt the QA run to clear the cache).

 Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
 

 Key: HIVE-8974
 URL: https://issues.apache.org/jira/browse/HIVE-8974
 Project: Hive
  Issue Type: Task
Affects Versions: 0.15.0
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.15.0

 Attachments: HIVE-8974.01.patch, HIVE-8974.patch


 CLEAR LIBRARY CACHE
 Calcite recently (after 0.9.2, before 1.0.0) re-organized its package 
 structure and renamed a lot of classes. CALCITE-296 has the details, 
 including a description of the before:after mapping.
 This task is to upgrade to the version of Calcite that has the renamed 
 packages. There is a 1.0.0-SNAPSHOT in Apache nexus.
 Calcite functionality has not changed significantly, so it should be 
 straightforward to rename. This task should be completed ASAP, before Calcite 
 moves on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8947) HIVE-8876 also affects Postgres 9.2

2014-12-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230726#comment-14230726
 ] 

Sergey Shelukhin commented on HIVE-8947:


[~vikram.dixit] ok for 14.1?

 HIVE-8876 also affects Postgres < 9.2
 -

 Key: HIVE-8947
 URL: https://issues.apache.org/jira/browse/HIVE-8947
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.15.0

 Attachments: HIVE-8947.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9002) union all does not generate correct result for order by and limit

2014-12-01 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-9002:
-

 Summary: union all does not generate correct result for order by 
and limit
 Key: HIVE-9002
 URL: https://issues.apache.org/jira/browse/HIVE-9002
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


Right now if we have
select col from A
union all
select col from B [Operator]

it is treated as

(select col from A)
union all
(select col from B [Operator])

Although it is correct for where, group by (having), and join operators, it is 
not correct for order by and limit operators. They should be

(select col from A
union all
select col from B) [order by, limit]
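
A small Python sketch (purely illustrative, using lists to stand in for the
branch results) of the two interpretations described above:

```python
A = [3, 1]
B = [2, 4]

# Current (incorrect) reading: ORDER BY binds only to the second branch
per_branch = A + sorted(B)    # [3, 1, 2, 4]

# SQL-standard reading: ORDER BY applies to the whole union
whole_union = sorted(A + B)   # [1, 2, 3, 4]

print(per_branch, whole_union)
```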




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9002) union all does not generate correct result for order by and limit

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9002:
--
Description: 
Right now if we have
select col from A
union all
select col from B [Operator]

it is treated as

(select col from A)
union all
(select col from B [Operator])

Although it is correct for where, group by (having) join operators, it is not 
correct for order by and limit operators. They should be

(select col from A
union all
select col from B) [order by, limit]

For order by, we can refer to MySQL, Oracle, DB2

mysql

http://dev.mysql.com/doc/refman/5.1/en/union.html

oracle

https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html

ibm

http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm


  was:
Right now if we have
select col from A
union all
select col from B [Operator]

it is treated as

(select col from A)
union all
(select col from B [Operator])

Although it is correct for where, group by (having) join operators, it is not 
correct for order by and limit operators. They should be

(select col from A
union all
select col from B) [order by, limit]



 union all does not generate correct result for order by and limit
 -

 Key: HIVE-9002
 URL: https://issues.apache.org/jira/browse/HIVE-9002
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 Right now if we have
 select col from A
 union all
 select col from B [Operator]
 it is treated as
 (select col from A)
 union all
 (select col from B [Operator])
 Although it is correct for where, group by (having) join operators, it is not 
 correct for order by and limit operators. They should be
 (select col from A
 union all
 select col from B) [order by, limit]
 For order by, we can refer to MySQL, Oracle, DB2
 mysql
 http://dev.mysql.com/doc/refman/5.1/en/union.html
 oracle
 https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html
 ibm
 http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9002) union all does not generate correct result for order by and limit

2014-12-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230734#comment-14230734
 ] 

Pengcheng Xiong commented on HIVE-9002:
---

Three candidate ways to fix it:
(1) fix it within HiveParser.g
(2) fix it in QB by rewriting
(3) partially revert the patch of 
https://issues.apache.org/jira/browse/HIVE-6189 and use subqueries for union all

[~jpullokkaran], could you please take a look? 

 union all does not generate correct result for order by and limit
 -

 Key: HIVE-9002
 URL: https://issues.apache.org/jira/browse/HIVE-9002
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 Right now if we have
 select col from A
 union all
 select col from B [Operator]
 it is treated as
 (select col from A)
 union all
 (select col from B [Operator])
 Although it is correct for where, group by (having) join operators, it is not 
 correct for order by and limit operators. They should be
 (select col from A
 union all
 select col from B) [order by, limit]
 For order by, we can refer to MySQL, Oracle, DB2
 mysql
 http://dev.mysql.com/doc/refman/5.1/en/union.html
 oracle
 https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html
 ibm
 http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9003) Vectorized IF expr broken for the scalar and scalar case

2014-12-01 Thread Matt McCline (JIRA)
Matt McCline created HIVE-9003:
--

 Summary: Vectorized IF expr broken for the scalar and scalar case
 Key: HIVE-9003
 URL: https://issues.apache.org/jira/browse/HIVE-9003
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 0.14.1


SELECT IF (bool_col, 'first', 'second') FROM ...

is broken for Vectorization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-860) Persistent distributed cache

2014-12-01 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-860:
--
Attachment: HIVE-860.4.patch

Reattaching, since the previous run hit the error "No space left on device (28)".

 Persistent distributed cache
 

 Key: HIVE-860
 URL: https://issues.apache.org/jira/browse/HIVE-860
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Zheng Shao
Assignee: Ferdinand Xu
 Fix For: 0.15.0

 Attachments: HIVE-860.1.patch, HIVE-860.2.patch, HIVE-860.2.patch, 
 HIVE-860.3.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch


 DistributedCache is shared across multiple jobs, if the hdfs file name is the 
 same.
 We need to make sure Hive put the same file into the same location every time 
 and do not overwrite if the file content is the same.
 We can achieve 2 different results:
 A1. Files added with the same name, timestamp, and md5 in the same session 
 will have a single copy in distributed cache.
 A2. Files added with the same name, timestamp, and md5 will have a single 
 copy in distributed cache.
 A2 has a bigger benefit in sharing but may raise a question on when Hive 
 should clean it up in hdfs.
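
A small Python sketch (hypothetical, not Hive's implementation) of the idea:
identify each file by (name, timestamp, md5), so identical files share one
distributed-cache copy instead of being re-uploaded per job.

```python
def dedupe_cache_entries(files):
    # Identify a file by (name, timestamp, md5); identical files map to
    # the same key and therefore to a single distributed-cache copy.
    seen = {}
    for f in files:
        seen.setdefault((f["name"], f["timestamp"], f["md5"]), f)
    return list(seen.values())

files = [
    {"name": "udf.jar", "timestamp": 100, "md5": "abc"},
    {"name": "udf.jar", "timestamp": 100, "md5": "abc"},  # same content, reused
    {"name": "udf.jar", "timestamp": 200, "md5": "def"},  # changed, new copy
]
print(len(dedupe_cache_entries(files)))  # 2
```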



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8135) Pool zookeeper connections

2014-12-01 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230762#comment-14230762
 ] 

Ferdinand Xu commented on HIVE-8135:


I think so: http://curator.apache.org/curator-recipes/index.html. And I will 
look into the details.

 Pool zookeeper connections
 --

 Key: HIVE-8135
 URL: https://issues.apache.org/jira/browse/HIVE-8135
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu

 Today we create a ZK connection per client. We should instead have a 
 connection pool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230774#comment-14230774
 ] 

Hive QA commented on HIVE-9001:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684492/HIVE-9001.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6695 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1941/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1941/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1941/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684492 - PreCommit-HIVE-TRUNK-Build

 Ship with log4j.properties file that has a reliable time based rolling policy
 -

 Key: HIVE-9001
 URL: https://issues.apache.org/jira/browse/HIVE-9001
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9001.1.patch


 The Hive log gets locked by the Hive process and cannot be rolled on Windows.
 To reproduce: install Hive on Windows, start Hive, and try to rename the Hive 
 log while Hive is running. 
 When log4j tries to rename it, it will throw the same error because the file 
 is locked by the process.
 The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
 should be integrated to Hive for a reliable rollover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28500: HIVE-8943 : Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-12-01 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
---

(Updated Dec. 2, 2014, 1:34 a.m.)


Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.


Changes
---

Fixed the algorithm and cleaned up after discussion with Xuefu. The original 
code too aggressively incorporated connected mapjoins into its size 
calculation; the new code only looks at the big table's connected mapjoins.


Bugs: HIVE-8943
https://issues.apache.org/jira/browse/HIVE-8943


Repository: hive-git


Description
---

SparkMapJoinOptimizer by default combines nested mapjoins into one work due to 
the removal of the RS for the big table. So we need to enhance the mapjoin 
check to calculate whether all the MapJoins in that work (spark stage) will 
fit into memory; otherwise they might overwhelm memory for that particular 
spark executor.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 819eef1 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 
0c339a5 
  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION 
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/28500/diff/


Testing
---

Added two unit tests:

1.  auto_join_stats, which sets a memory limit and checks that the algorithm 
does not put more than one mapjoin in one BaseWork.
2.  auto_join_stats2, which runs the same query without a memory limit and 
checks that the algorithm puts all mapjoins in one BaseWork because it can.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8943:

Attachment: HIVE-8943-4.spark.patch

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to 
 removal of RS for big-table.  So we need to enhance the check to calculate if 
 all the MapJoins in that work (spark-stage) will fit into the memory, 
 otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8943:

Attachment: HIVE-8943-4.spark.branch

Fixed the algorithm and cleaned up after discussion with Xuefu. The original 
code too aggressively incorporated connected mapjoins into its size 
calculation; the new code only looks at the big table's connected mapjoins.

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to 
 removal of RS for big-table.  So we need to enhance the check to calculate if 
 all the MapJoins in that work (spark-stage) will fit into the memory, 
 otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-12-01 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8943:

Attachment: (was: HIVE-8943-4.spark.branch)

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work, due to the 
 removal of the RS for the big table.  So we need to enhance the check to calculate 
 whether all the MapJoins in that work (spark-stage) will fit into memory; 
 otherwise they might overwhelm the memory of that particular Spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]

2014-12-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230811#comment-14230811
 ] 

Rui Li commented on HIVE-8991:
--

Hi [~vanzin], just as [~xuefuz] said, this JIRA is only meant to fix the test 
{{custom_input_output_format.q}} after we enable unit tests with remote spark 
context. Please feel free to take it if you think of a better solution. Thanks!

 Fix custom_input_output_format [Spark Branch]
 -

 Key: HIVE-8991
 URL: https://issues.apache.org/jira/browse/HIVE-8991
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8991.1-spark.patch


 After HIVE-8836, {{custom_input_output_format}} fails because of missing 
 hive-it-util in remote driver's class path.
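A missing-jar-on-the-driver problem like this is typically addressed by putting the jar on the remote driver's classpath. The fragment below is only an illustration of that idea, not the actual fix in HIVE-8991: `spark.driver.extraClassPath` is a standard Spark property, but the jar path shown is hypothetical.

```shell
# Illustrative only: make the test utility jar visible to the remote driver.
# The jar location is a placeholder; HIVE-8991 may fix this differently.
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/hive-it-util.jar \
  --class org.example.Main app.jar
```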



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests

2014-12-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230812#comment-14230812
 ] 

Rui Li commented on HIVE-8995:
--

OK I'll have a look.

 Find thread leak in RSC Tests
 -

 Key: HIVE-8995
 URL: https://issues.apache.org/jira/browse/HIVE-8995
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li

 I was regenerating output as part of the merge:
 {noformat}
 mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
 -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q
  
 auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q
  
 bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q
  
 join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q
  
 join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q
  
 mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q
  
 ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
  
 skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_join_desc_2.q,sort_merge_join_desc_3.q,sort_merge_join_desc_4.q,sort_merge_join_desc_5.q,sort_merge_join_desc_6.q,sort_merge_join_desc_7.q,sort_merge_join_desc_8.q
  
 

[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Status: Open  (was: Patch Available)

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, 
 HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch


 Right now, even when the groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Attachment: HIVE-8774.11.patch

Address [~jpullokkaran]'s comments: remove support for constants and functions 
inside the parameters of count.

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.11.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, 
 HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, 
 HIVE-8774.9.patch


 Right now, even when the groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8774) CBO: enable groupBy index

2014-12-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8774:
--
Status: Patch Available  (was: Open)

 CBO: enable groupBy index
 -

 Key: HIVE-8774
 URL: https://issues.apache.org/jira/browse/HIVE-8774
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, 
 HIVE-8774.11.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, 
 HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, 
 HIVE-8774.9.patch


 Right now, even when the groupby index is built, CBO is not able to use it. In 
 this patch, we are trying to make it use the groupby index that we build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

