Re: about spark assembly jar
Hm, are you suggesting that the Spark distribution be a bag of 100 JARs? That doesn't seem reasonable. It doesn't remove version conflicts; it just pushes them to run-time, which isn't good. The assembly is also necessary because that's where shading happens. In development, you want to run against exactly what will be used in a real Spark distro.

On Tue, Sep 2, 2014 at 9:39 AM, scwf <wangf...@huawei.com> wrote:
> hi, all
> I suggest that Spark not use the assembly jar as its default run-time dependency (spark-submit/spark-class depend on the assembly jar); a library of all third-party dependency jars, as Hadoop/Hive/HBase use, seems more reasonable:
> 1. The assembly jar packages all third-party jars into one big jar, so we need to rebuild it whenever we want to update the version of some component (such as Hadoop).
> 2. In our practice with Spark, we sometimes hit jar compatibility issues, and such issues are hard to diagnose with an assembly jar.
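To make the shading point concrete, here is one way to see what shading does to the assembly; the relocated package prefix below is only a hypothetical example of the pattern, not a claim about which packages Spark shaded at the time:

    # Shading rewrites the package names of selected bundled dependencies
    # inside the assembly so they cannot collide with an application's own
    # copies of the same libraries:
    jar tf spark-assembly-*.jar | grep "org/spark-project" | head

A plain directory of unmodified third-party jars cannot provide that isolation.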
Re: about spark assembly jar
Yes; I am not sure exactly what happens when the assembly jar is built. My understanding is that it just packages all the dependency jars into one big one.

On 2014/9/2 16:45, Sean Owen wrote:
> Hm, are you suggesting that the Spark distribution be a bag of 100 JARs? [...]
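That understanding is roughly right. A minimal sketch of how the assembly was built at the time, assuming a checkout of the Spark source (the output path follows the 1.x layout):

    # With sbt, from the Spark source root:
    sbt/sbt assembly
    # Or with Maven; the jar lands under assembly/target/scala-2.10/:
    mvn -DskipTests clean package

The assembly step merges the classes and resources of Spark and all of its transitive dependencies into a single jar, relocating (shading) some packages along the way, which is why upgrading any one component means rebuilding the whole jar.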
Re: about spark assembly jar
Sorry, the quick reply didn't cc the dev list.

Sean, sometimes I have to use the spark-shell to confirm some behavior change. In that case, I have to re-assemble the whole project. Is there another way around this, without using the big jar in development? For the original question, I have no comments.

--
Ye Xianjin

On Tuesday, September 2, 2014 at 4:58 PM, Sean Owen wrote:
> No, usually you unit-test your changes during development. That doesn't require the assembly. Eventually you may wish to test some change against the complete assembly, but that's a different question; I thought you were suggesting that the assembly JAR should never be created.
>
> On Tue, Sep 2, 2014 at 9:53 AM, Ye Xianjin <advance...@gmail.com> wrote:
>> Hi Sean,
>> In development, do I really need to re-assemble the whole project even if I only change a line or two of code in one component? I used to do that but found it time-consuming.
>>
>> On Tuesday, September 2, 2014 at 4:45 PM, Sean Owen wrote:
>>> Hm, are you suggesting that the Spark distribution be a bag of 100 JARs? [...]
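A sketch of the unit-test loop Sean describes, using the sbt invocation Spark documented at the time; the suite name is just an example:

    # Run a single suite instead of rebuilding the assembly; sbt recompiles
    # only the modules whose sources changed:
    sbt/sbt "test-only org.apache.spark.rdd.RDDSuite"

The assembly is then only needed when you want to exercise bin/spark-shell or bin/spark-submit end to end.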
Re: about spark assembly jar
Hi Sean Owen,
Here are some problems I hit when using the assembly jar.

1. I put spark-assembly-*.jar into the lib directory of my application, and it throws a compile error:

    Error:scalac: Error: class scala.reflect.BeanInfo not found.
    scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
      at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
      at scala.tools.nsc.symtab.Definitions$definitions$.getClass(Definitions.scala:608)
      at scala.tools.nsc.backend.jvm.GenJVM$BytecodeGenerator.<init>(GenJVM.scala:127)
      at scala.tools.nsc.backend.jvm.GenJVM$JvmPhase.run(GenJVM.scala:85)
      at scala.tools.nsc.Global$Run.compileSources(Global.scala:953)
      at scala.tools.nsc.Global$Run.compile(Global.scala:1041)
      at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
      at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:102)
      at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
      at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
      at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
      at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
      at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
      at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
      at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:58)
      at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:21)
      at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)

2. I tested my branch, which updates the Hive version to org.apache.hive 0.13.1. It runs successfully when using a bag of third-party jars as the dependency, but throws an error when using the assembly jar; the assembly jar seems to lead to a conflict:

    ERROR DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
      at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
      at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:59)
      at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
      at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
      at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
      at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
      at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
      at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)

On 2014/9/2 16:45, Sean Owen wrote:
> Hm, are you suggesting that the Spark distribution be a bag of 100 JARs? [...]
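A hedged sketch of one way to avoid problem 1, assuming the application only needs Spark on the compile classpath and can take its run-time classes from spark-submit (com.example.MyApp and the application jar name are placeholders):

    # Don't copy spark-assembly-*.jar into the application's lib/ directory;
    # the assembly bundles the Scala library and many third-party classes
    # that can confuse the application's own scalac run.
    rm lib/spark-assembly-*.jar
    # Declare org.apache.spark:spark-core as a compile-time ("provided")
    # dependency in the application's own build instead, then submit:
    ./bin/spark-submit --class com.example.MyApp --master local[4] target/myapp-0.1.jar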
Re: about spark assembly jar
This doesn't help for every dependency, but Spark provides an option to build the assembly jar without Hadoop and its dependencies. We make use of this in CDH packaging.

-Sandy

On Tue, Sep 2, 2014 at 2:12 AM, scwf <wangf...@huawei.com> wrote:
> Hi Sean Owen, here are some problems I hit when using the assembly jar [...]
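A sketch of the kind of build Sandy describes; whether the hadoop-provided Maven profile existed under that name in the Spark version at hand is an assumption on my part (it is present in later Spark releases):

    # Build an assembly that leaves Hadoop and its dependencies out,
    # expecting the cluster's own Hadoop installation to supply them:
    mvn -Phadoop-provided -DskipTests clean package

Keeping Hadoop's jars out of the assembly means a Hadoop upgrade no longer forces a Spark rebuild, which addresses part of scwf's first complaint.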
Re: about spark assembly jar
Having an SSD helps tremendously with assembly time. Without that, you can do the following in order for Spark to pick up the compiled classes before the assembly at runtime:

    export SPARK_PREPEND_CLASSES=true

On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> This doesn't help for every dependency, but Spark provides an option to build the assembly jar without Hadoop and its dependencies. [...]
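A minimal sketch of the workflow this enables (the exact command sequence is an assumption about typical usage; the mechanism is that the launch scripts prepend each module's compiled classes to the classpath ahead of the assembly jar):

    # Build the assembly once:
    sbt/sbt assembly
    # From then on, recompile only what changed...
    sbt/sbt compile
    # ...and run with the fresh classes taking precedence over the assembly:
    export SPARK_PREPEND_CLASSES=true
    ./bin/spark-shell

This also answers Ye Xianjin's question above: spark-shell can pick up a one-line change without a full re-assembly.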
Re: about spark assembly jar
Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :) Maybe we should add a developer-notes page to document all this useful black magic.

On Tue, Sep 2, 2014 at 10:54 AM, Reynold Xin <r...@databricks.com> wrote:
> Having an SSD helps tremendously with assembly time. [...]
Re: about spark assembly jar
SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be easier to find): https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools

On September 2, 2014 at 11:53:49 AM, Cheng Lian (lian.cs@gmail.com) wrote:
> Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :) [...]
Re: about spark assembly jar
Cool, didn't notice that, thanks Josh!

On Tue, Sep 2, 2014 at 11:55 AM, Josh Rosen <rosenvi...@gmail.com> wrote:
> SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be easier to find): https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools [...]
Re: about spark assembly jar
Yea, SSD + SPARK_PREPEND_CLASSES is great for iterative development!

Then why does it work with a bag of third-party jars but throw an error with the assembly jar? Does anyone have an idea?

On 2014/9/3 2:57, Cheng Lian wrote:
> Cool, didn't notice that, thanks Josh! [...]
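One way to start diagnosing such a conflict, sketched as generic jar inspection (the assembly path follows the 1.x build layout, and hive-exec-0.13.1.jar is just the obvious comparison point; doubleTypeInfo is a field of Hive's TypeInfoFactory):

    # See which copy of the class actually made it into the assembly; a
    # NoSuchFieldError usually means the loaded class came from a different
    # version than the calling code was compiled against:
    jar tf assembly/target/scala-2.10/spark-assembly-*.jar | grep TypeInfoFactory
    # Compare against the standalone Hive jar:
    unzip -l hive-exec-0.13.1.jar | grep TypeInfoFactory

With a directory of separate jars, you can remove or reorder individual jars to isolate a conflict; once everything is merged into one assembly, the losing copy of a duplicated class is discarded silently at merge time, which is exactly what makes these failures hard to diagnose.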