Re: Submitting job with external dependencies to pyspark

2020-01-28 Thread Chris Teoh
Usually this isn't done, as the data is meant to live on shared/distributed storage, e.g. HDFS, S3, etc. Spark then reads this data into a DataFrame, and your code logic applies to the DataFrame in a distributed manner.
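To make this concrete, here is a minimal PySpark sketch of the pattern Chris describes: read from shared storage and work on the resulting DataFrame. The paths and column name below are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-shared-storage").getOrCreate()

    # Spark reads straight from distributed storage; executors fetch the
    # file splits in parallel, so the data never travels with the job
    # submission itself.
    df = spark.read.csv("hdfs:///data/events/*.csv", header=True, inferSchema=True)
    # An S3 path works the same way, given the right connector and credentials:
    # df = spark.read.parquet("s3a://my-bucket/events/")

    # All further logic is expressed against the DataFrame and runs
    # distributed across the cluster.
    df.groupBy("event_type").count().show()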

Re: Submitting job with external dependencies to pyspark

2020-01-28 Thread Tharindu Mathew
That was really helpful, thanks! I actually solved my problem by creating a venv and using the venv flags. Now I'm wondering how to submit the data as an archive. Any ideas?

On Mon, Jan 27, 2020, 9:25 PM Chris Teoh wrote:
> Use --py-files
>
> See
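One way to do what Tharindu asks, sketched here under the assumption of a YARN cluster, where spark-submit's --archives flag unpacks each archive into the containers' working directory under the alias given after '#'. The archive names, aliases, and script name are hypothetical.

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --archives venv.tar.gz#environment,data.zip#data \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
      my_job.py

Inside the job, the unpacked data archive is then reachable by its alias as a relative path, e.g. open("data/lookup.csv"). Other cluster managers distribute files differently, so check the docs for your deployment.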

Start a standalone server as root and use it with user accounts

2020-01-28 Thread Ben Caine
Hi, I'd like to have a single standalone server, running as root on my machine, to which jobs can be submitted from multiple user accounts on that machine. However, when I do this, writing files gives me an error similar to the one in this Stack Overflow question