Hi Gourav,

I ran a test as you described and it works for me. I am using Spark in local mode, with the master and worker on the same machine. I ran the example in spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors.

BR
From: Gourav Sengupta <gourav.sengu...@gmail.com>
Date: Monday, February 15, 2016 at 10:03
To: Jorge Machado <jom...@me.com>
Cc: Spark Group <user@spark.apache.org>
Subject: Re: Using SPARK packages in Spark Cluster

Hi Jorge / All,

Please go through this link: http://spark.apache.org/docs/latest/spark-standalone.html. It explains how to start a SPARK cluster in local mode. If you have not started or worked with a SPARK cluster in local mode, kindly do not attempt to answer this question.

My question is how to use packages like https://github.com/databricks/spark-csv when I am using a SPARK cluster in local mode.

Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado <jom...@me.com> wrote:

Hi Gourav,

I did not understand your problem. The --packages option should not make any difference whether you are running standalone or on YARN, for example. Give us an example of which packages you are trying to load and what error you are getting. If you want to use the libraries on spark-packages.org without --packages, why do you not use Maven?

Regards

On 12/02/2016, at 13:22, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

Hi,

I am creating a SparkContext against a SPARK standalone cluster, as described here: http://spark.apache.org/docs/latest/spark-standalone.html, using the following code:

--------------------------------------------------------------------------------------------------------------------------
import multiprocessing

from pyspark import SparkConf, SparkContext

sc.stop()  # stop the SparkContext created by the pyspark shell

conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
    .setMaster("spark://hostname:7077") \
    .set('spark.shuffle.service.enabled', True) \
    .set('spark.dynamicAllocation.enabled', 'true') \
    .set('spark.executor.memory', '20g') \
    .set('spark.driver.memory', '4g') \
    .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
conf.getAll()
sc = SparkContext(conf=conf)
--------------------------------------------------------------------------------------------------------------------------
(we should definitely be able to optimise the configuration, but that is not the point here)

I am not able to use packages (a list of which is available at http://spark-packages.org) using this method, whereas if I use the standard "pyspark --packages" option the packages load just fine.

I will be grateful if someone could kindly let me know how to load packages when starting a cluster as mentioned above.

Regards,
Gourav Sengupta
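A minimal sketch of one possible approach to the question above, not taken from the thread: package resolution for --packages happens when spark-submit launches the JVM, so a SparkConf built afterwards in Python may be too late. Setting the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created is a commonly used workaround for PySpark of this era; the master URL, memory settings, and package coordinates below are the ones from the thread, and the exact PYSPARK_SUBMIT_ARGS behaviour for a given Spark version is an assumption.

--------------------------------------------------------------------------------------------------------------------------
import os

# Assumption: PySpark reads PYSPARK_SUBMIT_ARGS when it launches the JVM via
# spark-submit, so it must be set before the SparkContext is created. The
# trailing "pyspark-shell" token is required.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell'
)

import multiprocessing

from pyspark import SparkConf, SparkContext

conf = SparkConf() \
    .setMaster("spark://hostname:7077") \
    .set('spark.executor.memory', '20g') \
    .set('spark.driver.memory', '4g') \
    .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)

sc = SparkContext(conf=conf)

# If the package was resolved, its data source should be usable after creating
# an SQLContext, e.g.:
# df = sqlContext.read.format('com.databricks.spark.csv') \
#     .option('header', 'true').load('some_file.csv')
--------------------------------------------------------------------------------------------------------------------------

Another route that should have the same effect, since spark-submit consults it before dependency resolution, is adding a line such as "spark.jars.packages com.databricks:spark-csv_2.10:1.3.0" to conf/spark-defaults.conf.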