[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972397#comment-13972397 ]

Sean Owen commented on SPARK-1520:
----------------------------------

Madness. One wild guess is that the breeze .jar files have something in META-INF that, when merged together into the assembly jar, conflicts with other META-INF items. In particular I'm thinking of MANIFEST.MF entries. It's worth diffing those if you can from before and after. However this would still require that Java 7 and 6 behave differently with respect to the entries, to explain your findings. It's possible.

Your last comment however suggests it's something strange with the byte code that gets output for a few classes. Java 7 is stricter about byte code. For example:
https://weblogs.java.net/blog/fabriziogiudici/archive/2012/05/07/understanding-subtle-new-behaviours-jdk-7
However I would think these would manifest as quite different errors.

What about running with -verbose:class to print classloading messages? It might point directly to what's failing to load, if that's it.

Of course you can always build with Java 6 since that's supposed to be all that's supported/required now (see my other JIRA about making Jenkins do this), although I agree that it would be nice to get to the bottom of this, as there is no obvious reason this shouldn't work.

Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
-----------------------------------------------------------------------------

Key: SPARK-1520
URL: https://issues.apache.org/jira/browse/SPARK-1520
Project: Spark
Issue Type: Bug
Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
Fix For: 1.0.0

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped, and the classpath set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-

I've found that if I just unpack and re-pack the jar, it sometimes works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
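For anyone who wants to repeat the "major version 50" check from the description, here is a minimal sketch in Python. It is not part of the Spark build; the function name `class_major_versions` and its `limit` parameter are made up for illustration. It reads the first 8 bytes of each .class entry in a jar (magic number, minor version, major version) and reports the major version, where 50 corresponds to Java 6 and 51 to Java 7.

```python
import struct
import zipfile


def class_major_versions(jar_path_or_file, limit=None):
    """Return {entry name: class-file major version} for .class entries.

    Hypothetical helper for illustration only. `limit` stops after that
    many classes, which is handy for very large assembly jars.
    """
    versions = {}
    with zipfile.ZipFile(jar_path_or_file) as jar:
        for info in jar.infolist():
            if not info.filename.endswith(".class"):
                continue
            with jar.open(info) as entry:
                header = entry.read(8)
            if len(header) < 8:
                continue  # too short to be a class file
            # Class-file header: u4 magic, u2 minor_version, u2 major_version
            magic, _minor, major = struct.unpack(">IHH", header)
            if magic != 0xCAFEBABE:
                continue  # not a real class file
            versions[info.filename] = major
            if limit is not None and len(versions) >= limit:
                break
    return versions
```

Calling `set(class_major_versions("spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar").values())` should then show which major versions are present in the assembly. If every value is 50, the byte code itself is not what keeps JRE 6 from loading the classes.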
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972523#comment-13972523 ]

Sean Owen commented on SPARK-1520:
----------------------------------

Regarding large numbers of files: are there INDEX.LIST files used anywhere in the jars? If one gets munged or truncated while building the assembly jar, that might cause all kinds of havoc. It could be omitted.
http://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#Index_File_Specification

Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
-----------------------------------------------------------------------------

Key: SPARK-1520
URL: https://issues.apache.org/jira/browse/SPARK-1520
Project: Spark
Issue Type: Bug
Components: MLlib, Spark Core
Reporter: Patrick Wendell
Priority: Blocker
Fix For: 1.0.0

This is a real doozie - when compiling a Spark assembly with JDK7, the produced jar does not work well with JRE6. I confirmed the byte code being produced is JDK 6 compatible (major version 50). What happens is that, silently, the JRE will not load any class files from the assembled jar.
{code}
$ sbt/sbt assembly/assembly
$ /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
{code}

I also noticed that if the jar is unzipped, and the classpath set to the current directory, it just works. Finally, if the assembly jar is compiled with JDK6, it also works. The error is seen with any class, not just the UIWorkloadGenerator. Also, this error doesn't exist in branch 0.9, only in master.

*Isolation*
-I ran a git bisection and this appeared after the MLLib sparse vector patch was merged:-
https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
SPARK-1212
-I narrowed this down specifically to the inclusion of the breeze library. Just adding breeze to an older (unaffected) build triggered the issue.-

I've found that if I just unpack and re-pack the jar (using `jar` from java 6 or 7) it always works:
{code}
$ cd assembly/target/scala-2.10/
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
$ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
$ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
$ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
{code}

I also noticed something of note. The Breeze package contains single directories that have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible we are hitting some weird bugs/corner cases with compatibility of the internal storage format of the jar itself.
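The META-INF questions raised in this thread (conflicting MANIFEST.MF entries, a stray index file) can be checked without unpacking the jar. Below is a minimal sketch in Python; the helper names `meta_inf_summary` and `diff_manifests` are hypothetical and not part of any Spark tooling. It assumes the index file, if present, sits at `META-INF/INDEX.LIST`, the name used in the JAR specification.

```python
import difflib
import zipfile


def meta_inf_summary(jar_path_or_file):
    """Summarize a jar's META-INF contents (illustrative helper)."""
    with zipfile.ZipFile(jar_path_or_file) as jar:
        names = jar.namelist()
        meta = sorted(n for n in names if n.upper().startswith("META-INF/"))
        manifest = ""
        if "META-INF/MANIFEST.MF" in names:
            raw = jar.read("META-INF/MANIFEST.MF")
            manifest = raw.decode("utf-8", "replace")
    return {
        "entry_count": len(names),
        "meta_inf_entries": meta,
        "has_index_list": "META-INF/INDEX.LIST" in meta,
        "manifest": manifest,
    }


def diff_manifests(before_jar, after_jar):
    """Unified diff of the MANIFEST.MF text of two jars, as a list of lines."""
    before = meta_inf_summary(before_jar)["manifest"].splitlines()
    after = meta_inf_summary(after_jar)["manifest"].splitlines()
    return list(difflib.unified_diff(before, after, "before", "after",
                                     lineterm=""))
```

Running `diff_manifests` on an assembly built before and after breeze was added would show whether the merged manifest changed at all, and `has_index_list` answers the index-file question directly.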
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973291#comment-13973291 ]

Xiangrui Meng commented on SPARK-1520:
--------------------------------------

I'm using the Java 6 JDK located at /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home on a Mac. It can create a jar with more than 65536 files. I also found this JIRA:
https://bugs.openjdk.java.net/browse/JDK-4828461 (Support Zip files with more than 64k entries)
which was fixed in version 6. Note that this is for openjdk. I'm going to check the headers of assembly jars created by java 6 and 7.
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973306#comment-13973306 ]

Xiangrui Meng commented on SPARK-1520:
--------------------------------------

When I try to use jar-1.6 to unpack the assembly jar created by java 7:

~~~
java.util.zip.ZipException: invalid CEN header (bad signature)
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:128)
    at java.util.zip.ZipFile.<init>(ZipFile.java:89)
    at sun.tools.jar.Main.list(Main.java:977)
    at sun.tools.jar.Main.run(Main.java:222)
    at sun.tools.jar.Main.main(Main.java:1147)
~~~

7z shows:

~~~
Path = spark-assembly-1.6.jar
Type = zip
Physical Size = 119682511

Path = spark-assembly-1.7.jar
Type = zip
64-bit = +
Physical Size = 119682587
~~~

I think the limit on the number of files was already raised in Java 6 (at least in the latest update), but Java 7 uses the zip64 format for more than 64k files, and Java 6 cannot read that format.
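The zip64 diagnosis can be checked without 7z. The following is a minimal sketch in Python, with hypothetical helper names `uses_zip64` and `entry_count`. It relies on the ZIP format layout: a zip64 archive carries a 20-byte "zip64 end of central directory locator" (signature `PK\x06\x07`) immediately before the ordinary end-of-central-directory record (signature `PK\x05\x06`). The sketch takes the last `PK\x05\x06` in the file tail as the real record, which is an assumption that could be fooled by an archive comment containing those bytes.

```python
import os
import zipfile

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end of central directory locator
ZIP64_LOCATOR_SIZE = 20            # fixed size of the locator in bytes
MAX_TAIL = 22 + 65535 + ZIP64_LOCATOR_SIZE  # EOCD + max comment + locator


def uses_zip64(f):
    """Return True if the binary file object `f` ends with zip64 records."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    tail_len = min(size, MAX_TAIL)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    start = eocd - ZIP64_LOCATOR_SIZE
    return start >= 0 and tail[start:start + 4] == ZIP64_LOCATOR_SIG


def entry_count(f):
    """Number of entries in the archive, read via the zipfile module."""
    f.seek(0)
    with zipfile.ZipFile(f) as z:
        return len(z.infolist())
```

For example, `with open("spark-assembly-1.7.jar", "rb") as f: print(uses_zip64(f), entry_count(f))` should report whether the Java 7 assembly is a zip64 archive and whether it holds more than 65535 entries, the largest count the classic 16-bit field can represent.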
[jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973326#comment-13973326 ]

Xiangrui Meng commented on SPARK-1520:
--------------------------------------

The quick fix may be removing fastutil. In RDD#countApproxDistinct, we use HyperLogLog from com.clearspring.analytics:stream, which depends on fastutil. If this is the only place that introduces the fastutil dependency, we should implement HyperLogLog ourselves and remove fastutil completely from Spark's dependencies.
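To give a sense of how small a dependency-free HyperLogLog can be, here is a minimal sketch in Python. It is not the stream-lib implementation and not what Spark uses; the class name, the default precision `p=12`, and the choice of SHA-1 as the hash are all assumptions made for illustration. It keeps 2^p registers, uses the top p bits of a 64-bit hash to pick a register, stores the position of the first 1-bit in the remaining bits, and falls back to linear counting for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (not a production implementation)."""

    def __init__(self, p=12):
        self.p = p                    # precision: number of index bits
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        width = 64 - self.p
        idx = h >> width                            # top p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # 1-based first 1-bit position
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on empty registers
            return self.m * math.log(self.m / zeros)
        return raw
```

With `p=12` the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Adding the same item twice does not change the estimate, since a register only ever keeps its maximum rank.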