about spark assembly jar

2014-09-02 Thread scwf
hi all, I suggest Spark not use the assembly jar as the default run-time dependency (spark-submit/spark-class depend on the assembly jar); a lib directory of all third-party dependency jars, as hadoop/hive/hbase have, seems more reasonable. 1. The assembly jar packages all third-party jars into one big jar, so we need to rebuild this jar if …
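
To illustrate the proposal, here is a minimal sketch of launching from a lib directory of individual jars instead of one assembly jar (the paths, variable names, and main-class invocation are illustrative assumptions, not taken from the thread):

    # Build the runtime classpath from every jar under lib/ rather than
    # from a single spark-assembly-*.jar:
    SPARK_HOME=/opt/spark
    CLASSPATH="$SPARK_HOME/conf"
    for jar in "$SPARK_HOME"/lib/*.jar; do
      CLASSPATH="$CLASSPATH:$jar"
    done
    exec java -cp "$CLASSPATH" org.apache.spark.deploy.SparkSubmit "$@"

Replacing a dependency would then mean swapping one jar in lib/ rather than rebuilding the assembly.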

Re: about spark assembly jar

2014-09-02 Thread Sean Owen
Hm, are you suggesting that the Spark distribution be a bag of 100 JARs? It doesn't quite seem reasonable. It does not remove version conflicts, it just pushes them to run time, which isn't good. The assembly is also necessary because that's where shading happens. In development, you want to run …
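
To make the run-time-conflict point concrete (jar names and versions below are invented for the example): with two versions of a library on the classpath, the JVM simply loads whichever comes first, and code compiled against the other version fails only when a missing method is actually called.

    # guava-11 comes first on the classpath, so its classes win; a method
    # that exists only in guava-14 then fails at call time with
    # NoSuchMethodError instead of failing at build time:
    java -cp 'lib/guava-11.0.2.jar:lib/guava-14.0.1.jar:myapp.jar' com.example.Main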

Re: about spark assembly jar

2014-09-02 Thread scwf
yes, I am not sure what happens when building the assembly jar; in my understanding it just packages all the dependency jars into a big one.
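
One way to see what goes into it is to list the assembly's contents (the path below is illustrative and depends on the build profile and Scala version):

    # Expect class files from many upstream projects, e.g. org/apache/spark/,
    # org/apache/hadoop/, scala/, com/google/, all flattened into one jar:
    unzip -l assembly/target/scala-2.10/spark-assembly-*.jar | less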

Re: about spark assembly jar

2014-09-02 Thread Ye Xianjin
Sorry, the quick reply didn't cc the dev list. Sean, sometimes I have to use the spark-shell to confirm some behavior change. In that case, I have to reassemble the whole project. Is there another way around this, without using the big jar in development? For the original question, I have no …

Re: about spark assembly jar

2014-09-02 Thread scwf
Hi Sean Owen, here are some problems I hit when using the assembly jar: 1. I put spark-assembly-*.jar into the lib directory of my application, and it throws a compile error: Error:scalac: Error: class scala.reflect.BeanInfo not found. scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
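
A common way around this kind of error (an assumption, not something confirmed in the thread) is to keep the assembly off the compiler's classpath entirely: compile against the published spark-core artifact and let spark-submit supply the assembly at run time.

    # Sketch, assuming an sbt project and a Spark 1.x / Scala 2.10 build;
    # versions and names are illustrative. In build.sbt, declare Spark as
    # "provided" so it is visible at compile time but not bundled:
    #   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2" % "provided"
    # Then run the packaged application through spark-submit:
    ./bin/spark-submit --class com.example.MyApp --master local[4] target/scala-2.10/my-app.jar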

Re: about spark assembly jar

2014-09-02 Thread Sandy Ryza
This doesn't help for every dependency, but Spark provides an option to build the assembly jar without Hadoop and its dependencies. We make use of this in CDH packaging. -Sandy
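
For reference, the build command looks roughly like this (a sketch; the hadoop-provided profile is documented for the Maven build of later Spark releases, so check the building-Spark instructions for your version):

    # Produce an assembly with Hadoop and its dependencies excluded, so the
    # cluster's own Hadoop jars are used at run time:
    mvn -Phadoop-provided -DskipTests clean package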

Re: about spark assembly jar

2014-09-02 Thread Reynold Xin
Having an SSD helps tremendously with assembly time. Without that, you can do the following so that Spark picks up the compiled classes ahead of the assembly at runtime: export SPARK_PREPEND_CLASSES=true
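
Concretely, the iterate-without-reassembly loop looks something like this (the sbt launcher path matches the Spark source tree of that era; adjust for your checkout):

    # One-time full build so an assembly jar exists:
    sbt/sbt assembly
    # From then on, only recompile what changed; with the flag set, the
    # freshly compiled classes are picked up ahead of the assembly:
    export SPARK_PREPEND_CLASSES=true
    sbt/sbt compile
    ./bin/spark-shell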

Re: about spark assembly jar

2014-09-02 Thread Cheng Lian
Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :) Maybe we should add a developer notes page to document all this useful black magic.

Re: about spark assembly jar

2014-09-02 Thread Josh Rosen
SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be easier to find): https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools

Re: about spark assembly jar

2014-09-02 Thread Cheng Lian
Cool, didn't notice that, thanks Josh!

Re: about spark assembly jar

2014-09-02 Thread scwf
Yea, SSD + SPARK_PREPEND_CLASSES is great for iterative development! Then why is it OK with a bag of third-party jars but an error with the assembly jar? Anyone have an idea?