Re: Submitting job with external dependencies to pyspark
Usually this isn't done, as the data is meant to be on shared/distributed storage, e.g. HDFS, S3, etc. Spark should then read this data into a DataFrame, and your code logic applies to the DataFrame in a distributed manner.

On Wed, 29 Jan 2020 at 09:37, Tharindu Mathew wrote:
> That was really helpful, thanks! I actually solved my problem by creating
> a venv and using the venv flags. Wondering now how to submit the data as
> an archive? Any idea?

--
Chris
Re: Submitting job with external dependencies to pyspark
That was really helpful, thanks! I actually solved my problem by creating a venv and using the venv flags. Wondering now how to submit the data as an archive? Any idea?

On Mon, Jan 27, 2020, 9:25 PM Chris Teoh wrote:
> Use --py-files. See
> https://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies
>
> I hope that helps.
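For the archive question, one option (a sketch; the directory and file names are hypothetical) is to pack the data directory and ship it with `spark-submit --archives`, which on YARN unpacks the archive into each executor's working directory under the alias given after `#`:

```python
import os
import shutil
import tempfile

# Hypothetical layout: a local "mydata" directory we want on the executors.
workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, "mydata")
os.makedirs(data_dir)
with open(os.path.join(data_dir, "vocab.txt"), "w") as f:
    f.write("hello\nworld\n")

# Pack it; shutil.make_archive produces mydata.tar.gz and returns its path.
archive = shutil.make_archive(data_dir, "gztar", root_dir=workdir, base_dir="mydata")

# The "#data" suffix is the alias directory the archive is unpacked into
# in each executor's working directory ("job.py" is a placeholder script):
cmd = f"spark-submit --archives {archive}#data job.py"
print(cmd)
```

Inside the job, the shipped files would then be readable at the relative path `./data/mydata/vocab.txt`.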
Re: Submitting job with external dependencies to pyspark
Use --py-files. See
https://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies

I hope that helps.

On Tue, 28 Jan 2020, 9:46 am Tharindu Mathew wrote:
> Hi,
>
> Newbie to pyspark/spark here.
>
> I'm trying to submit a job to pyspark with a dependency, Spark DL in this
> case. While the local environment has it, pyspark does not see it. How do
> I correctly start pyspark so that it sees this dependency?
>
> Using Spark 2.3.0 in a Cloudera setup.
>
> --
> Regards,
> Tharindu Mathew
> http://tharindumathew.com
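A sketch of the bundling step: zip the dependency package so its top-level directory sits at the zip root, then pass the zip via `--py-files`, which adds it to the PYTHONPATH of the driver and executors. The package name "mylib" and script "job.py" are placeholders:

```python
import os
import shutil
import tempfile

# Hypothetical package standing in for a dependency (e.g. Spark DL) that is
# installed locally but not on the cluster nodes.
workdir = tempfile.mkdtemp()
pkg = os.path.join(workdir, "mylib")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("def greet():\n    return 'hello from mylib'\n")

# Zip so that "mylib/" is at the root of the archive; anything importable
# from the zip root becomes importable on the executors.
zip_path = shutil.make_archive(os.path.join(workdir, "deps"), "zip",
                               root_dir=workdir, base_dir="mylib")

cmd = f"spark-submit --py-files {zip_path} job.py"
print(cmd)
```

After that, `import mylib` works inside the submitted job. Note this only covers pure-Python dependencies; packages with native extensions generally need to be installed on the nodes or shipped as an environment instead.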
Submitting job with external dependencies to pyspark
Hi,

Newbie to pyspark/spark here.

I'm trying to submit a job to pyspark with a dependency, Spark DL in this case. While the local environment has it, pyspark does not see it. How do I correctly start pyspark so that it sees this dependency?

Using Spark 2.3.0 in a Cloudera setup.

--
Regards,
Tharindu Mathew
http://tharindumathew.com