Hi Gourav,

If your question is how to distribute Python package dependencies across
the Spark cluster programmatically, here is an example:

        $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'

And in code:

        sc.addPyFile('/path/to/thrift.zip')
        sc.addPyFile('/path/to/happybase.zip')
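
For completeness, here is a minimal sketch of how the two pieces fit
together in a PySpark script; the paths and the happybase import are just
placeholders for whatever dependencies you have zipped up yourself:

        from pyspark import SparkConf, SparkContext

        sc = SparkContext(conf=SparkConf().setAppName('pyfile-example'))

        # Ship the dependency zips to every executor.
        sc.addPyFile('/path/to/thrift.zip')
        sc.addPyFile('/path/to/happybase.zip')

        def uses_dependency(x):
            # Import inside the function so the lookup happens on the
            # executors, where the shipped zips are on the Python path.
            import happybase  # placeholder dependency
            return x * 2

        print(sc.parallelize(range(10)).map(uses_dependency).collect())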

Regards,
Ram



On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> So far no one has quite understood my question. I know how to load
> packages via the Spark shell or spark-submit.
>
> How do I load packages when starting a Spark standalone cluster, as
> described here: http://spark.apache.org/docs/latest/spark-standalone.html ?
>
>
> Regards,
> Gourav Sengupta
>
>
>
>
> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> With the --conf option:
>>
>> spark-submit --conf 'key=value'
>>
>> Hope that helps you.
>>
>> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com>
>> wrote:
>>
>>> Hi Gourav,
>>> You can use the command below to load packages when starting the Spark shell:
>>>
>>> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>>>
>>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> I was interested in knowing how to load packages into a Spark cluster
>>>> started locally. Can someone point me to the documentation on setting the
>>>> conf file so that the packages can be loaded?
>>>>
>>>> Regards,
>>>> Gourav
>>>>
>>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>>
>>>>> Hello Gourav,
>>>>>
>>>>> The packages need to be loaded BEFORE you start the JVM; therefore you
>>>>> won't be able to add packages dynamically in code. You should use the
>>>>> --packages option with pyspark when you start your application.
>>>>> One option is to add a conf entry (spark.jars.packages, e.g. in
>>>>> spark-defaults.conf) that will load those packages if you are constantly
>>>>> going to use them.
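>>>>>
>>>>> For example, if you are creating the SparkContext from a plain Python or
>>>>> notebook process (rather than through bin/pyspark or spark-submit),
>>>>> something like the sketch below may work, because the gateway JVM is only
>>>>> launched when the first SparkContext is created. The environment variable
>>>>> trick and the package coordinate are only an illustration:
>>>>>
>>>>>         import os
>>>>>
>>>>>         # Must be set before the first SparkContext is created, because
>>>>>         # that is when pyspark launches the JVM via spark-submit.
>>>>>         os.environ['PYSPARK_SUBMIT_ARGS'] = (
>>>>>             '--packages com.databricks:spark-csv_2.10:1.1.0 pyspark-shell')
>>>>>
>>>>>         from pyspark import SparkConf, SparkContext
>>>>>
>>>>>         conf = SparkConf().setMaster('spark://hostname:7077')
>>>>>         sc = SparkContext(conf=conf)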
>>>>>
>>>>> Best,
>>>>> Burak
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
>>>>> gourav.sengu...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am creating a SparkContext in a Spark standalone cluster, as
>>>>>> described here:
>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html, using the
>>>>>> following code:
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------------------------------------
>>>>>> import multiprocessing
>>>>>> from pyspark import SparkConf, SparkContext
>>>>>>
>>>>>> sc.stop()
>>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>>                   .setMaster("spark://hostname:7077") \
>>>>>>                   .set('spark.shuffle.service.enabled', True) \
>>>>>>                   .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>>                   .set('spark.executor.memory', '20g') \
>>>>>>                   .set('spark.driver.memory', '4g') \
>>>>>>                   .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
>>>>>> conf.getAll()
>>>>>> sc = SparkContext(conf=conf)
>>>>>>
>>>>>> -----(we should definitely be able to optimise the configuration but
>>>>>> that is not the point here) ---
>>>>>>
>>>>>> Using this method, I am not able to use packages (a list of which is
>>>>>> available at http://spark-packages.org).
>>>>>>
>>>>>> Whereas if I use the standard "pyspark --packages" option, the
>>>>>> packages load just fine.
>>>>>>
>>>>>> I will be grateful if someone could kindly let me know how to load
>>>>>> packages when starting a cluster as mentioned above.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Gourav Sengupta
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
