[jira] [Comment Edited] (SPARK-1911) Warn users that jars should be built with Java 6 for PySpark to work on YARN

2014-05-29 Thread Tathagata Das (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013142#comment-14013142 ]

Tathagata Das edited comment on SPARK-1911 at 5/30/14 12:31 AM:


As far as I can tell, this is because Java 7 uses the Zip64 format when creating 
JARs with more than 2^16 (65,536) files, and Python (at least 2.x) cannot read 
Zip64 archives. So it fails whenever the Spark assembly JAR contains more than 
65k files, which in turn depends on whether it was built with YARN and/or Hive 
enabled.

Java 6 uses the traditional zip format to create JARs, even when they contain 
more than 65k files, so Python always seems to work with Java 6 JARs.

Caveat: I can't claim 100% certainty about this interpretation, because there is 
so little documentation about it on the net.
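The hypothesis above can be probed empirically. As a rough sketch (my own, not part of the issue): a Zip64 archive carries a "Zip64 end of central directory locator" record near the end of the file, so scanning the tail for its signature gives a heuristic check. `uses_zip64` is a hypothetical helper name; only Python's standard library is assumed.

```python
import zipfile

ZIP64_EOCD_LOCATOR = b"PK\x06\x07"  # Zip64 end-of-central-directory locator signature

def uses_zip64(path):
    """Heuristically check whether a zip/JAR uses Zip64 by scanning the
    tail of the file for the Zip64 EOCD locator record."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # The EOCD record, its optional comment (max 65,535 bytes), and the
        # Zip64 locator all live at the very end of the archive.
        tail_len = min(size, 65535 + 22 + 20)
        f.seek(size - tail_len)
        tail = f.read()
    return ZIP64_EOCD_LOCATOR in tail

# A small archive written in the classic format should report False;
# a Java 7 jar with >65k entries would be expected to report True.
with zipfile.ZipFile("small.zip", "w") as zf:
    zf.writestr("hello.txt", "hello")
print(uses_zip64("small.zip"))  # False for an ordinary small zip
```

This only detects the locator signature in the file tail, so it can in principle false-positive if a comment happens to contain those bytes, but it is good enough for a quick check on an assembly JAR.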


was (Author: tdas):
As far as I can tell, this is because Java 7 uses the Zip64 format when creating 
JARs with more than 2^16 (65,536) files, and Python (at least 2.x) cannot read 
Zip64 archives. So it fails whenever the Spark assembly JAR contains more than 
65k files, which in turn depends on whether it was built with YARN and/or Hive 
enabled.

 Warn users that jars should be built with Java 6 for PySpark to work on YARN
 

 Key: SPARK-1911
 URL: https://issues.apache.org/jira/browse/SPARK-1911
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Reporter: Andrew Or
 Fix For: 1.0.0


 Python sometimes fails to read JARs created by Java 7. PySpark on YARN requires 
 Python to read the Spark assembly JAR, so the assembly should be compiled with 
 Java 6 for PySpark to work on YARN.
 Currently we warn users only in make-distribution.sh, but most users build the 
 jars directly. We should emphasize this in the docs, especially for PySpark and 
 YARN, because the issue is not trivial to debug.
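Since the failure only appears when the assembly crosses the 65,535-entry limit, one way to debug it is simply to count entries. A minimal sketch (my own illustration; `entry_count` and the demo paths are hypothetical, and a real check would point at the actual Spark assembly JAR):

```python
import zipfile

def entry_count(jar_path):
    """Return the number of entries in a zip/JAR. Archives with more than
    65,535 entries need Zip64 extensions, which Python 2 cannot read."""
    with zipfile.ZipFile(jar_path) as zf:
        return len(zf.namelist())

# Demo on a tiny archive; for a real check, point this at the Spark
# assembly JAR (its path is build- and deployment-specific).
with zipfile.ZipFile("demo.jar", "w") as zf:
    for i in range(3):
        zf.writestr("file%d.txt" % i, "x")
print(entry_count("demo.jar"))  # 3
```

If the count exceeds 65,535 and the JAR was built with Java 7, that matches the failure mode described in this issue.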



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1911) Warn users that jars should be built with Java 6 for PySpark to work on YARN

2014-05-24 Thread Tathagata Das (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008260#comment-14008260 ]

Tathagata Das edited comment on SPARK-1911 at 5/25/14 1:43 AM:
---

https://github.com/apache/spark/pull/859/ adds a warning in 
make-distribution.sh if one tries to compile Spark with Java 7.

Note that the actual problem of making Java 7 compiled JARs work with Python 
still needs to be solved.


was (Author: tdas):
https://github.com/apache/spark/pull/859/files
