about spark assembly jar

2014-09-02 Thread scwf

hi, all
  I suggest that Spark not use the assembly jar as the default run-time
dependency (spark-submit/spark-class currently depend on the assembly jar); a
library directory holding all the third-party dependency jars, as
Hadoop/Hive/HBase do, seems more reasonable.

  1. The assembly jar packages all third-party jars into one big jar, so we
have to rebuild it whenever we want to update the version of a single
component (such as Hadoop).
  2. In our practice with Spark we sometimes hit jar compatibility issues, and
they are hard to diagnose through an assembly jar.
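(As a rough illustration of the difference, assuming the current launcher
scripts: today spark-class/compute-classpath put one big assembly jar on the
classpath, while the suggestion is to point them at a directory of individual
jars. The file names below are examples only.)

    # today: a single assembly jar on the classpath
    CLASSPATH="$SPARK_HOME/conf:$SPARK_HOME/lib/spark-assembly-1.1.0-hadoop2.4.0.jar"
    # proposed: a lib directory containing the spark jars plus each third-party jar
    CLASSPATH="$SPARK_HOME/conf:$SPARK_HOME/lib/*"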










Re: about spark assembly jar

2014-09-02 Thread Sean Owen
Hm, are you suggesting that the Spark distribution be a bag of 100
JARs? It doesn't quite seem reasonable. It does not remove version
conflicts, just pushes them to run-time, which isn't good. The
assembly is also necessary because that's where shading happens. In
development, you want to run against exactly what will be used in a
real Spark distro.




Re: about spark assembly jar

2014-09-02 Thread scwf

Yes, I am not sure exactly what happens when the assembly jar is built; my
understanding is that it just packages all the dependency jars into one big one.
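(For what it's worth, the assembly step does a bit more than concatenating
class files: resources that collide across jars have to be merged; for example,
Akka's reference.conf is concatenated rather than letting one copy win,
assuming the merge strategy the Spark build used at the time. A quick way to
inspect the merged result, with an illustrative jar path:)

    unzip -p lib/spark-assembly-1.1.0-hadoop2.4.0.jar reference.conf | head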




Re: about spark assembly jar

2014-09-02 Thread Ye Xianjin
Sorry, the quick reply didn't cc the dev list.

Sean, sometimes I have to use spark-shell to confirm a behavior change. In that
case I have to reassemble the whole project. Is there another way around this,
without using the big jar in development? For the original question, I have no
comments.

-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, September 2, 2014 at 4:58 PM, Sean Owen wrote:

 No, usually you unit-test your changes during development. That
 doesn't require the assembly. Eventually you may wish to test some
 change against the complete assembly.
 
 But that's a different question; I thought you were suggesting that
 the assembly JAR should never be created.
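  (As a sketch of that workflow: run only the tests of the module you touched
  instead of rebuilding the assembly. The module and suite names below are just
  examples, and the sbt launcher script was sbt/sbt in source trees of that era.)

      sbt/sbt "project core" "test-only org.apache.spark.rdd.RDDSuite"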
 
  On Tue, Sep 2, 2014 at 9:53 AM, Ye Xianjin advance...@gmail.com wrote:
  Hi, Sean:
  In development, do I really need to reassemble the whole project even if I
  only change a line or two of code in one component?
  I used to do that but found it time-consuming.
  
  --
  Ye Xianjin
  Sent with Sparrow
  
  
 
 
 




Re: about spark assembly jar

2014-09-02 Thread scwf

Hi Sean Owen,
here are some problems I hit when using the assembly jar.
1. I put spark-assembly-*.jar into the lib directory of my application, and it
throws a compile error:

Error:scalac: Error: class scala.reflect.BeanInfo not found.
scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
    at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
    at scala.tools.nsc.symtab.Definitions$definitions$.getClass(Definitions.scala:608)
    at scala.tools.nsc.backend.jvm.GenJVM$BytecodeGenerator.init(GenJVM.scala:127)
    at scala.tools.nsc.backend.jvm.GenJVM$JvmPhase.run(GenJVM.scala:85)
    at scala.tools.nsc.Global$Run.compileSources(Global.scala:953)
    at scala.tools.nsc.Global$Run.compile(Global.scala:1041)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
    at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:102)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
    at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
    at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
    at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
    at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:58)
    at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:21)
    at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
2. I tested my branch, which updates the Hive version to org.apache.hive 0.13.1.
  It runs successfully when using a bag of third-party jars as the dependency,
but throws an error when using the assembly jar; it seems the assembly jar
leads to a conflict:
  ERROR DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
    at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
    at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.init(ArrayWritableObjectInspector.java:59)
    at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
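(A quick way to investigate the second error is to check which copies of the
Hive classes actually ended up in the assembly; sketch only, the jar path
depends on your build:)

    jar tf assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.4.0.jar \
      | grep -E 'hive.*(TypeInfoFactory|ArrayWritableObjectInspector)'

If the parquet serde classes in the jar were compiled against a different Hive
version than the TypeInfoFactory that is actually on the classpath, a
NoSuchFieldError like the one above is a common symptom.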












Re: about spark assembly jar

2014-09-02 Thread Sandy Ryza
This doesn't help for every dependency, but Spark provides an option to
build the assembly jar without Hadoop and its dependencies.  We make use of
this in CDH packaging.

-Sandy
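
(For reference, a sketch of what that looks like from the Maven build, assuming
a Spark version that exposes the hadoop-provided profile; the profile or flag
may differ in older branches:)

    # build the assembly without bundling Hadoop and its dependencies
    mvn -Phadoop-provided -DskipTests package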



Re: about spark assembly jar

2014-09-02 Thread Reynold Xin
Having an SSD helps tremendously with assembly time.

Without that, you can do the following in order for Spark to pick up the
compiled classes before assembly at runtime.

export SPARK_PREPEND_CLASSES=true
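
A typical loop then looks roughly like this (sketch; an assembly still has to
exist from an earlier full build, and the script names are the ones in the
source tree of that era):

    export SPARK_PREPEND_CLASSES=true
    sbt/sbt compile      # recompile only the changed classes, no re-assembly
    ./bin/spark-shell    # freshly compiled classes are picked up ahead of the old assembly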



Re: about spark assembly jar

2014-09-02 Thread Cheng Lian
Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :)

Maybe we should add a developer notes page to document all this useful
black magic.



Re: about spark assembly jar

2014-09-02 Thread Josh Rosen
SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be 
easier to find): 
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools



Re: about spark assembly jar

2014-09-02 Thread Cheng Lian
Cool, didn't notice that, thanks Josh!



Re: about spark assembly jar

2014-09-02 Thread scwf

Yea, SSD + SPARK_PREPEND_CLASSES is great for iterative development!

Then why is it OK with a bag of third-party jars but throws an error with the
assembly jar? Does anyone have an idea?
