I think you have to use the Spark Java API. In PySpark, functions running on Spark executors (such as map functions) can only be written in Python.
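Roughly, something like the untested sketch below: reading each file as a (path, content) pair and applying your Java routine on the executors. ProcessFiles, the process() method, and the paths are placeholders, not your actual code.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ProcessFiles {

    // Placeholder for your own Java logic; swap in the real routine.
    static String process(String content) {
        return content.toUpperCase();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ProcessFiles");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load each input file as a (path, content) pair.
        JavaPairRDD<String, String> files =
            sc.wholeTextFiles("hdfs:///path/to/input");

        // This lambda runs on the executors; with the Java API it can
        // call arbitrary Java code directly.
        JavaPairRDD<String, String> results =
            files.mapToPair(f -> new Tuple2<>(f._1(), process(f._2())));

        results.saveAsTextFile("hdfs:///path/to/output");
        sc.stop();
    }
}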
On Thu, Sep 28, 2017 at 12:48 AM, Giuseppe Celano <cel...@informatik.uni-leipzig.de> wrote:

> Hi everyone,
>
> I would like to apply a Java script to many files in parallel. I am
> wondering whether I should definitely use the Spark Java API, or whether
> I could also run the script using the Python API (with which I am more
> familiar), without this affecting performance. Thanks.
>
> Giuseppe