[ 
https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013142#comment-14013142
 ] 

Tathagata Das commented on SPARK-1911:
--------------------------------------

As far as I can tell, it is because Java 7 uses Zip64 encoding when creating JARs 
with more than 2^16 files, and Python (at least 2.x) cannot read Zip64. So it 
fails whenever the Spark assembly JAR has more than 65k files, which in turn 
depends on whether it was built with YARN and/or Hive enabled.

> Warn users that jars should be built with Java 6 for PySpark to work on YARN
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-1911
>                 URL: https://issues.apache.org/jira/browse/SPARK-1911
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation
>            Reporter: Andrew Or
>             Fix For: 1.0.0
>
>
> Python sometimes fails to read jars created by Java 7. Reading the jar is 
> necessary for PySpark to work on YARN, so the Spark assembly JAR should be 
> compiled with Java 6 for PySpark to work on YARN.
> Currently we warn users only in make-distribution.sh, but most users build 
> the jars directly. We should emphasize it in the docs especially for PySpark 
> and YARN because this issue is not trivial to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)