Hi Gourav,

I did a test as you suggested and it works for me. I am using Spark in local mode, 
with master and worker on the same machine. I ran the example in spark-shell 
--packages com.databricks:spark-csv_2.10:1.3.0 without errors.
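
For reference, the equivalent check from pyspark would look roughly like this (only a 
sketch; 'cars.csv' is a placeholder file and sqlContext is the one the pyspark shell 
creates for you):

--------------------------------------------------------------------------------------------------------------------------
# Started with: pyspark --packages com.databricks:spark-csv_2.10:1.3.0
# sqlContext is predefined by the pyspark shell.
df = (sqlContext.read
      .format('com.databricks.spark.csv')           # the spark-csv data source
      .options(header='true', inferSchema='true')
      .load('cars.csv'))                             # placeholder path
df.printSchema()
--------------------------------------------------------------------------------------------------------------------------

If that runs without an error about the data source class, the package was resolved 
and shipped correctly.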

BR

From:  Gourav Sengupta <gourav.sengu...@gmail.com>
Date:  Monday, February 15, 2016 at 10:03
To:  Jorge Machado <jom...@me.com>
Cc:  Spark Group <user@spark.apache.org>
Subject:  Re: Using SPARK packages in Spark Cluster

Hi Jorge/ All,

Please please please go through this link:
http://spark.apache.org/docs/latest/spark-standalone.html. 
The link tells you how to start a SPARK cluster in local mode. If you have not 
started or worked with a SPARK cluster in local mode, kindly do not attempt to 
answer this question.

My question is how to use packages like 
https://github.com/databricks/spark-csv when I am using a SPARK cluster in local 
mode.

Regards,
Gourav Sengupta


On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado <jom...@me.com> wrote:
Hi Gourav, 

I did not understand your problem… the --packages option should not make 
any difference whether you are running standalone or on YARN, for example.  
Give us an example of which packages you are trying to load and what error you are 
getting. If you want to use the libraries on spark-packages.org without 
--packages, why don't you use Maven? 
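
If you want to avoid the command-line flag entirely, one thing you could try (only a 
rough sketch, not tested against your setup; the coordinates below are just the 
spark-csv example again) is to hand the --packages argument to PySpark's launcher 
before the JVM starts, via the PYSPARK_SUBMIT_ARGS environment variable:

--------------------------------------------------------------------------------------------------------------------------
import os

# Must be set before the SparkContext (and therefore the JVM) is created,
# so run this from a plain Python/notebook session, not from a shell where
# the JVM is already up. The trailing 'pyspark-shell' token is required by
# PySpark's launcher.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell'
)

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://hostname:7077")   # same master URL as in your snippet
sc = SparkContext(conf=conf)
--------------------------------------------------------------------------------------------------------------------------

Some people also set spark.jars.packages directly in the SparkConf, but I am not 
sure it gets picked up when the context is created programmatically, so I would try 
the environment variable first.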
Regards 


On 12/02/2016, at 13:22, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

Hi,

I am creating a SparkContext in a SPARK standalone cluster as mentioned here: 
http://spark.apache.org/docs/latest/spark-standalone.html, using the following 
code:

--------------------------------------------------------------------------------------------------------------------------
import multiprocessing                           # needed for cpu_count() below
from pyspark import SparkConf, SparkContext      # needed if not already in scope

sc.stop()
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
                  .setMaster("spark://hostname:7077") \
                  .set('spark.shuffle.service.enabled', True) \
                  .set('spark.dynamicAllocation.enabled', 'true') \
                  .set('spark.executor.memory', '20g') \
                  .set('spark.driver.memory', '4g') \
                  .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
conf.getAll()
sc = SparkContext(conf=conf)

----- (we should definitely be able to optimise the configuration, but that is not the point here) -----

I am not able to use packages (a list of which is available at 
http://spark-packages.org) with this method. 

Whereas if I use the standard "pyspark --packages" option, the packages 
load just fine.

I will be grateful if someone could kindly let me know how to load packages 
when starting a cluster as mentioned above.


Regards,
Gourav Sengupta



