Hi,

So far no one seems to have understood my question. I already know how to load packages via the SPARK shell or SPARK submit.
How do I load packages when starting a SPARK cluster, as described here: http://spark.apache.org/docs/latest/spark-standalone.html ?

Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:

> with the --conf option
>
> spark-submit --conf 'key=value'
>
> Hope that helps you.
>
> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
>> Hi Gourav,
>> you can use the option below to load packages at the start of the spark shell:
>>
>> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
>>
>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I was interested in knowing how to load packages into a SPARK cluster started locally. Can someone point me to the links for setting the conf file so that the packages can be loaded?
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>
>>>> Hello Gourav,
>>>>
>>>> The packages need to be loaded BEFORE you start the JVM, therefore you
>>>> won't be able to add packages dynamically in code. You should use
>>>> --packages with pyspark before you start your application.
>>>> One option is to add a `conf` that will load some packages if you are
>>>> constantly going to use them.
>>>>
>>>> Best,
>>>> Burak
>>>>
>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am creating a SparkContext in a SPARK standalone cluster, as described here: http://spark.apache.org/docs/latest/spark-standalone.html, using the following code:
>>>>>
>>>>> --------------------------------------------------------------------------------------------------------------------------
>>>>> import multiprocessing
>>>>> from pyspark import SparkConf, SparkContext
>>>>>
>>>>> sc.stop()  # stop the default context before creating a new one
>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', 'false') \
>>>>>     .setMaster("spark://hostname:7077") \
>>>>>     .set('spark.shuffle.service.enabled', 'true') \
>>>>>     .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>     .set('spark.executor.memory', '20g') \
>>>>>     .set('spark.driver.memory', '4g') \
>>>>>     .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
>>>>> conf.getAll()
>>>>> sc = SparkContext(conf=conf)
>>>>> -----(we should definitely be able to optimise the configuration, but that is not the point here)---
>>>>>
>>>>> I am not able to load packages (a list of which is available at http://spark-packages.org) using this method.
>>>>>
>>>>> Whereas if I use the standard "pyspark --packages" option, the packages load just fine.
>>>>>
>>>>> I would be grateful if someone could kindly let me know how to load packages when starting a cluster as mentioned above.
>>>>>
>>>>> Regards,
>>>>> Gourav Sengupta
>>>>
>>>
>>
>
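
For reference, one way to pre-load packages without passing --packages on every invocation is to list them in conf/spark-defaults.conf on the machine that launches the driver. This is a minimal sketch, assuming a Spark 1.x standalone setup and the spark-csv package mentioned above; the property name spark.jars.packages comes from the Spark configuration documentation, and the coordinates should be adjusted to whatever packages are actually needed:

--------------------------------------------------------------------------------------------------------------------------
# conf/spark-defaults.conf -- read by spark-submit, pyspark and spark-shell at startup
# Comma-separated Maven coordinates, resolved before the driver JVM starts
spark.jars.packages   com.databricks:spark-csv_2.10:1.1.0
--------------------------------------------------------------------------------------------------------------------------

With that in place, a plain "pyspark" or "spark-submit" should resolve and ship the package at startup; as Burak notes above, adding packages from code to an already-running SparkContext will not work, since they must be on the classpath before the JVM starts.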