I've never tried to run a standalone cluster alongside Hadoop, but why not run Spark as a YARN application? That way it can absolutely (in fact, preferably) use the distributed file system.
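
As a rough sketch of what I mean (the HDFS path and application script below are made up, but in YARN mode spark-submit does accept hdfs:// URIs for --py-files):

  # illustrative paths only; the wheel must already have been uploaded to HDFS
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --py-files hdfs:///path/to/deps.whl \
    my_app.py

YARN localizes the remote artifact onto each executor node for you, which is exactly the behavior the standalone mode seems to be missing.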
On Fri, Nov 9, 2018 at 5:04 PM, Arijit Tarafdar <arij...@live.com> wrote:
> Hello All,
>
> We have a requirement to run PySpark in standalone cluster mode and also
> to reference Python libraries (egg/wheel) which are not local but placed
> in distributed storage like HDFS. From the code it looks like neither
> case is supported.
>
> Questions are:
>
> 1. Why is PySpark supported only in standalone client mode?
> 2. Why does --py-files only support local files and not files stored in
>    remote stores?
>
> We would like to update the Spark code to support these scenarios, but
> we just want to be aware of any technical difficulties the community has
> faced while trying to support them.
>
> Thanks, Arijit