Hello Steve,

1) When you call new SparkConf() you get an object populated with the
default configuration values; it also picks up any JVM system properties
whose names start with "spark.", which is how spark-submit typically feeds
values in. See the Spark configuration and tuning pages for details on what
those defaults are.
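
For instance, a minimal sketch (which keys actually show up depends on how
the JVM was launched, e.g. via spark-submit):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf(); // loads defaults from spark.* JVM system properties
    // Keys that nothing populated are simply absent, so supply a fallback:
    String master = conf.get("spark.master", "<not set>");
    String serializer = conf.get("spark.serializer", "<not set>");
    System.err.println("master=" + master + ", serializer=" + serializer);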

2) Yes. Properties set in this configuration will be pushed down to the
worker nodes actually executing the Spark job. This happens through the
SparkContext, which accepts the SparkConf as a constructor parameter; that
shared config is what is used by all RDDs and processes spawned from that
context.

E.g., when creating a new RDD with sc.parallelize() or reading in a text
file with sc.textFile().
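
Roughly, a sketch with the Java API (using the registrator key from your
snippet below):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf sparkConf = new SparkConf()
            .setAppName("Some App")
            .set("spark.kryo.registrator",
                    "com.lordjoe.distributed.hydra.HydraKryoSerializer");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    // Everything created through sc - parallelize(), textFile(), etc. -
    // shares this one configuration:
    JavaRDD<String> lines = sc.textFile("some/input/path"); // path is illustrative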

I think that to address questions 3 and 4 you should reason in terms of the
SparkContext.

In short, you shouldn't need to worry about explicitly controlling what is
happening on the slave nodes. Spark should abstract that layer away so that
you can write parallelizable code which the resource manager (e.g. YARN)
pushes out to your cluster.
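
That said, if you want to confirm what a task running on a worker actually
sees (your questions 3-4), one way is to read the executor-side copy of the
config through SparkEnv. A sketch, continuing from the JavaSparkContext sc
above; I'm not certain a non-spark.* key like "mysparc.data" survives the
trip in every deploy mode, which is exactly why a check like this is useful:

    import java.util.Arrays;
    import org.apache.spark.SparkEnv;
    import org.apache.spark.api.java.function.VoidFunction;

    sc.parallelize(Arrays.asList(1, 2, 3)).foreach(new VoidFunction<Integer>() {
        @Override
        public void call(Integer ignored) {
            // SparkEnv.get().conf() is the config this executor received
            String value = SparkEnv.get().conf().get("mysparc.data", "<unset>");
            System.err.println("mysparc.data on worker = " + value);
        }
    });
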
On Oct 29, 2014 2:58 PM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:

>      Assume in my executor I say
>
> SparkConf sparkConf = new SparkConf();
> sparkConf.set("spark.kryo.registrator",
>         "com.lordjoe.distributed.hydra.HydraKryoSerializer");
> sparkConf.set("mysparc.data", "Some user Data");
> sparkConf.setAppName("Some App");
>
> Now
>    1) Are there default values set in some system file which are populated
> if I call new SparkConf - if not, how do I get those? I think I see
> defaults for the master, the Serializer...
>     2) If I set a property in SparkConf for my SparkContext, will I see
> that property on a slave machine?
>     3) If I set a property and then call showSparkProperties(), do I see
> that property set? If not, how can I see it - say from some other thread
> on the executor, as in showSparkPropertiesInAnotherThread()?
>       4) How can a slave machine access properties set on the executor?
>
>    I am really interested in sparkConf.set("spark.kryo.registrator",
> "com.lordjoe.distributed.hydra.HydraKryoSerializer"),
>     which needs to be used by the slave.
>
>
>    /**
>      * dump all spark properties to System.err
>      */
>     public static void showSparkProperties()
>     {
>         SparkConf sparkConf = new SparkConf();
>         Tuple2<String, String>[] all = sparkConf.getAll();
>         for (Tuple2<String, String> prp  : all) {
>             System.err.println(prp._1().toString() + "=" + prp._2());
>         }
>     }
>
>     public static void  showSparkPropertiesInAnotherThread()
>     {
>         new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 showSparkProperties();
>             }
>         }).start();
>     }
>
>
