Hi,
I am looking for a way to pass configuration parameters to a Spark job.
In general I have a fairly simple PySpark job:
from pyspark import SparkContext

def process_model(k, vc):
    # ... do something with the key and its grouped values ...
    ...

sc = SparkContext(appName="TAD")
lines = sc.textFile(input_job_files)
result = lines.map(doSplit).groupByKey().map(lambda (k, vc): process_model(k, vc))
Question:
What if I need to pass additional metadata, parameters, etc. to the process_model function?
I tried something like:

param = 'param1'
result = lines.map(doSplit).groupByKey().map(lambda (param, k, vc): process_model(param, k, vc))

but the job stops working, and it does not look like an elegant solution anyway.
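One thing I am considering instead is simply capturing the driver-side variable in the closure, so the lambda still receives only the (key, values) pair that groupByKey produces. This is just a rough sketch, assuming process_model is changed to accept the extra argument:

param = 'param1'

def process_model(param, k, vc):
    # hypothetical signature, extended with the extra config argument
    ...

result = lines.map(doSplit).groupByKey().map(lambda kv: process_model(param, kv[0], kv[1]))

Is relying on closure capture like this the right approach, or is there a cleaner mechanism?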
Is there a way to access the SparkContext from my custom functions?
I found the methods setLocalProperty/getLocalProperty, but I could not find an example of how to use them for my requirements (i.e. from within my function).
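The only usage I managed to put together is purely on the driver side; this is just my own sketch (the "model.param" key is an arbitrary name I made up), and I am not sure the property is visible at all from functions that run inside map on the executors:

sc.setLocalProperty("model.param", "param1")   # set on the driver thread
print(sc.getLocalProperty("model.param"))      # readable here on the driver, but I don't see how to read it from process_model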
It would be great to have a short example of how to pass parameters.
Thanks
Oleg.