Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

(Updated Jan. 22, 2015, 9:23 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

The Spark driver may need to load classes from added jars in two places. 
First, while executing GetJobStatusJob, it needs to deserialize SparkWork. 
Second, while HiveInputFormat gets splits, it needs to deserialize MapWork.
The RemoteDriver executes AddJarJob directly in the netty RPC thread, because 
it is a SyncJobRequest, and executes GetJobStatusJob (which wraps the Spark 
job) in its thread pool. HiveInputFormat may get splits in the akka thread 
pool, as Spark sends messages through akka between SparkContext and 
DAGScheduler. So we may need to reset the classloader of both threads to 
enable dynamically added jars in the RSC.
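
To illustrate what "resetting a thread's classloader" means here, the following is a minimal sketch and not the code from the patch. The class name, the method name, and the jar path are hypothetical. It wraps the current thread's context classloader in a URLClassLoader that also sees the added jars, then installs it on the current thread.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContextClassLoaderSketch {

  /**
   * Wraps the current thread's context classloader in a URLClassLoader that
   * also sees the given jar paths, installs it on the current thread, and
   * returns the newly installed loader.
   */
  static URLClassLoader addJarsToCurrentThread(List<String> jarPaths) {
    List<URL> urls = new ArrayList<>();
    for (String path : jarPaths) {
      try {
        urls.add(new File(path).toURI().toURL());
      } catch (MalformedURLException e) {
        throw new IllegalArgumentException("Bad jar path: " + path, e);
      }
    }
    // The old loader becomes the parent, so existing classes stay visible.
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]), parent);
    Thread.currentThread().setContextClassLoader(loader);
    return loader;
  }

  public static void main(String[] args) {
    // Hypothetical jar path, used only for illustration.
    URLClassLoader loader =
        addJarsToCurrentThread(Arrays.asList("/tmp/example-udf.jar"));
    System.out.println(Arrays.toString(loader.getURLs()));
  }
}
```

Only the calling thread is affected, which is why each thread that deserializes the work objects would need this treatment.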


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
---


I'm wondering what's the story for Hive CLI. Hive CLI can add jars from local 
file system. Would this work for Hive on Spark?


ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114004

Callers of getBaseWork() will add the jars to the classpath. Why is this 
necessary? Who are the callers? Are there any side effects?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
https://reviews.apache.org/r/30107/#comment114005

So, this is the code that adds the jars to the classpath of the remote 
driver?

I'm wondering why these jars are necessary in order to deserialize 
SparkWork.


- Xuefu Zhang






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li


 On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
  I'm wondering what's the story for Hive CLI. Hive CLI can add jars from 
  local file system. Would this work for Hive on Spark?

The Hive CLI adds jars to its classpath dynamically, the same way this patch 
does for the RemoteDriver: it updates the thread context classloader to 
include the added jar paths. For Hive on Spark, the Hive CLI stays the same. 
The issue is that the RemoteDriver does not add these jars to its classpath, 
so the ClassNotFoundException occurs when the RemoteDriver side needs the 
related classes.


 On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 367
  https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line367
 
  Callers of getBaseWork() will add the jars to the classpath. Why is this 
  necessary? Who are the callers? Are there any side effects?

We need to do this because getBaseWork() generates the MapWork/ReduceWork, 
which contains Hive operators, and the UDTFOperator, which references a class 
from the added jar, needs to be loaded. To load the added jar dynamically, we 
need to reset the thread context classloader. As mentioned in the previous 
change summary, unlike the Hive CLI, two threads on the RemoteDriver side may 
need to load the added jar. For the akka thread, there is no proper cut-in 
point for adding jars to the classpath.
The side effect is that many Hive CLI threads may have to check and update 
their classloader unnecessarily.
Another possible solution is to update the system classloader of the 
RemoteDriver dynamically, which has to be done in a rather hacky way, such as:

// Only works where the system classloader is a URLClassLoader.
URLClassLoader sysloader = (URLClassLoader) ClassLoader.getSystemClassLoader();
Class<URLClassLoader> sysclass = URLClassLoader.class;

try {
  // addURL is protected, so it has to be invoked through reflection.
  Method method = sysclass.getDeclaredMethod("addURL", URL.class);
  method.setAccessible(true);
  method.invoke(sysloader, new Object[] {u});
} catch (Throwable t) {
  t.printStackTrace();
  throw new IOException("Error, could not add URL to system classloader");
}

Which one do you prefer?


 On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java,
   line 220
  https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
 
  So, this is the code that adds the jars to the classpath of the remote 
  driver?
  
  I'm wondering why these jars are necessary in order to deserialize 
  SparkWork.

Same as the previous comment: SparkWork contains MapWork/ReduceWork, which 
contains the operator tree, and UTFFOperator needs to load a class from the 
added jar.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
---






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69336
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114014

#3 This is executed in an akka thread: it gets the extra jar paths from 
JobConf and adds them to the current thread's classloader.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
https://reviews.apache.org/r/30107/#comment114013

#2 This job is executed in the RemoteDriver thread pool: it gets the extra 
jar paths from JobContext, adds them to the current thread's classloader, and 
sets them in JobConf.



spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java
https://reviews.apache.org/r/30107/#comment114012

#1 This adds the extra jar path to JobContext; this job is executed in the 
netty connection thread.
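
The three numbered steps above hand the jar paths from one thread to the next through shared state. The following is a rough sketch of that hand-off under stated assumptions, not the patch itself: a plain list stands in for JobContext, java.util.Properties stands in for JobConf, and the property key `example.added.jars` is hypothetical (the key used by the actual patch is not shown in this thread).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CopyOnWriteArrayList;

final class AddedJarFlowSketch {

  // Hypothetical property key, chosen only for this illustration.
  static final String ADDED_JARS_KEY = "example.added.jars";

  // Stand-in for the job context shared inside the remote driver.
  static final List<String> contextJars = new CopyOnWriteArrayList<>();

  // Step #1 (netty connection thread): record the added jar path.
  static void recordJar(String path) {
    contextJars.add(path);
  }

  // Step #2 (driver thread pool): copy the recorded paths into the job conf.
  static void publishJars(Properties jobConf) {
    jobConf.setProperty(ADDED_JARS_KEY, String.join(",", contextJars));
  }

  // Step #3 (akka thread): read the paths back from the job conf.
  static List<String> readJars(Properties jobConf) {
    String value = jobConf.getProperty(ADDED_JARS_KEY, "");
    if (value.isEmpty()) {
      return new ArrayList<>();
    }
    return new ArrayList<>(Arrays.asList(value.split(",")));
  }

  public static void main(String[] args) {
    recordJar("/tmp/example-udf.jar"); // hypothetical path
    Properties jobConf = new Properties();
    publishJars(jobConf);
    System.out.println(readJars(jobConf));
  }
}
```

In steps #2 and #3 each thread would additionally add the paths it reads to its own context classloader.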


- chengxiang li






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li


 On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java,
   line 220
  https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
 
  So, this is the code that adds the jars to the classpath of the remote 
  driver?
  
  I'm wondering why these jars are necessary in order to deserialize 
  SparkWork.
 
 chengxiang li wrote:
 Same as the previous comment: SparkWork contains MapWork/ReduceWork, which 
 contains the operator tree, and UTFFOperator needs to load a class from the 
 added jar.
 
 Xuefu Zhang wrote:
 Sorry, but which operator? UTFFOperator? I couldn't find it in the Hive 
 source.

Sorry, I meant UDTFOperator. As you can see from the error log in the JIRA, 
the extra class from the added jar is referenced by UDTFOperator:

org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
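
As a small illustration of why the trace above ends in a class lookup failure: a deserializer that stores class names has to resolve them through some classloader, and that fails when the loader cannot see the jar. This sketch uses plain `Class.forName` rather than Kryo, and the class and method names are hypothetical.

```java
final class ClassLookupSketch {

  /** Returns true if the named class can be resolved through the given loader. */
  static boolean canLoad(String className, ClassLoader loader) {
    try {
      // 'false' means: resolve the class but do not run its static initializers.
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    System.out.println(canLoad("java.lang.String", loader));
    // Resolves only if the jar that contains the UDF is visible to 'loader'.
    System.out.println(
        canLoad("de.bankmark.bigbench.queries.q10.SentimentUDF", loader));
  }
}
```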


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
---






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang


 On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java,
   line 220
  https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
 
  So, this is the code that adds the jars to the classpath of the remote 
  driver?
  
  I'm wondering why these jars are necessary in order to deserialize 
  SparkWork.
 
 chengxiang li wrote:
 Same as the previous comment: SparkWork contains MapWork/ReduceWork, which 
 contains the operator tree, and UTFFOperator needs to load a class from the 
 added jar.

Sorry, but which operator? UTFFOperator? I couldn't find it in the Hive source.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
---






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang


 On Jan. 23, 2015, 3:02 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 371
  https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line371
 
  #3 This is executed in an akka thread: it gets the extra jar paths from 
  JobConf and adds them to the current thread's classloader.

Which thread is referred to as the akka thread?


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69336
---






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li


 On Jan. 23, 2015, 3:02 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 371
  https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line371
 
  #3 This is executed in an akka thread: it gets the extra jar paths from 
  JobConf and adds them to the current thread's classloader.
 
 Xuefu Zhang wrote:
 Which thread is referred to as the akka thread?

Inside the Spark driver, SparkContext submits the Spark job to DAGScheduler 
through an akka message instead of invoking it directly, and akka holds a 
thread pool to handle messages.
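
The reason a separate pool matters can be shown with a small sketch. It uses a java.util.concurrent pool as a stand-in for akka's thread pool, which is an assumption made for illustration only, and the class and method names are hypothetical. A worker thread that already exists keeps the context classloader it was created with, so updating the loader in the submitting thread does not reach it.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolThreadLoaderSketch {

  /**
   * Returns true if a pool thread created before the caller swapped its own
   * context classloader does not see the caller's new loader.
   */
  static boolean poolThreadKeepsOldLoader() {
    ClassLoader original = Thread.currentThread().getContextClassLoader();
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      // Run one task so that the single worker thread is created now.
      Runnable warmUp = () -> { };
      pool.submit(warmUp).get();

      // Swap the caller's context classloader, as an "add jar" step would.
      ClassLoader updated = new URLClassLoader(new URL[0], original);
      Thread.currentThread().setContextClassLoader(updated);

      // Ask the existing worker thread which loader it sees.
      Callable<ClassLoader> probe =
          () -> Thread.currentThread().getContextClassLoader();
      ClassLoader seenByWorker = pool.submit(probe).get();
      return seenByWorker != updated;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      Thread.currentThread().setContextClassLoader(original);
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(poolThreadKeepsOldLoader());
  }
}
```

This is why the jar paths have to travel with the job configuration, so that code running in the pool thread can update its own classloader.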


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69336
---






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69341
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114022

Should we also check addedJars.isEmpty() to be consistent with other places?


- Xuefu Zhang






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69344
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114027

Could we add a check, something like:

if (hive.execution.engine==spark) {
  try {
  ...
}

The code as it is might make other people frown.
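
One possible shape for such a guard is sketched below, with assumptions flagged: a Map stands in for the job configuration, `addedJars` is treated as a comma-separated string, and the class and method names are hypothetical. It combines the engine check with the addedJars.isEmpty() check raised earlier in this review.

```java
import java.util.HashMap;
import java.util.Map;

final class EngineGuardSketch {

  // Hive property name; the Map is only a stand-in for the job configuration.
  static final String ENGINE_KEY = "hive.execution.engine";

  /** True only when Spark is the configured engine and there are jars to add. */
  static boolean shouldAddJars(Map<String, String> conf, String addedJars) {
    boolean onSpark = "spark".equals(conf.get(ENGINE_KEY));
    boolean hasJars = addedJars != null && !addedJars.isEmpty();
    return onSpark && hasJars;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(ENGINE_KEY, "spark");
    // Hypothetical jar path, used only for illustration.
    System.out.println(shouldAddJars(conf, "/tmp/example-udf.jar"));
  }
}
```

With a guard like this, the other execution engines skip the classloader update entirely.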


- Xuefu Zhang






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69357
---

Ship it!


Ship It!

- Xuefu Zhang






Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

(Updated Jan. 23, 2015, 6:44 a.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 6340d1c
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9d9f4e6
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java a4a166a
  ql/src/test/queries/clientpositive/lateral_view_explode2.q PRE-CREATION
  ql/src/test/results/clientpositive/lateral_view_explode2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out PRE-CREATION
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

(Updated Jan. 23, 2015, 6:37 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Added more comments and fixed the issues Xuefu mentioned earlier.


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 6340d1c
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9d9f4e6
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java a4a166a
  ql/src/test/queries/clientpositive/lateral_view_explode2.q PRE-CREATION
  ql/src/test/results/clientpositive/lateral_view_explode2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out PRE-CREATION
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-21 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

(Updated Jan. 22, 2015, 3:53 a.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-21 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

(Updated Jan. 22, 2015, 3:54 a.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li



Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]

2015-01-20 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
---

The RemoteDriver does not contain the added jars in its classpath, so it 
fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, 
when a jar is added through the Hive CLI, Hive adds the jar to the CLI 
classpath (through the thread context classloader) and to the distributed 
cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver 
component, so we should add the added jars to its classpath as well.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 044f189
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 851e937
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/


Testing
---


Thanks,

chengxiang li