Spark is written in Scala, so yes, the Scala API is still the strongest option. With Scala you also get the Dataset type, which gives you compile-time type safety; that feature is not available in Python.
That said, I think the Python API is a viable candidate if you already use Pandas for data science. The Spark DataFrame and Pandas APIs are similar, and you can convert a Spark DataFrame to a Pandas DataFrame.

On Mon, Nov 21, 2016 at 1:51 PM, Brandon White <bwwintheho...@gmail.com> wrote:
> Hello all,
>
> I will be starting a new Spark codebase and I would like to get opinions
> on using Python over Scala. Historically, the Scala API has always been the
> strongest interface to Spark. Is this still true? Are there still many
> benefits and additional features in the Scala API that are not available in
> the Python API? Are there any performance concerns using the Python API
> that do not exist when using the Scala API? Anything else I should know
> about?
>
> I appreciate any insight you have on using the Scala API over the Python
> API.
>
> Brandon
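
P.S. To make the DataFrame-to-Pandas conversion mentioned in my reply concrete, here is a minimal sketch in Python. The helper name spark_to_pandas, the max_rows cap, and the sample data in demo() are my own illustrations, not part of Spark. The only Spark calls used are DataFrame.limit() and DataFrame.toPandas(). Keep in mind that toPandas() collects the data onto the driver, so it only makes sense for results that fit in memory there.

```python
def spark_to_pandas(sdf, max_rows=10_000):
    """Collect at most max_rows rows of a Spark DataFrame into a Pandas DataFrame.

    spark_to_pandas and max_rows are hypothetical names used for illustration.
    """
    # toPandas() pulls the rows to the driver, so cap the row count first.
    return sdf.limit(max_rows).toPandas()


def demo():
    # Imported inside the function so the helper above can be read and reused
    # without importing Spark at module load time.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("to-pandas-sketch")
        .getOrCreate()
    )
    # Made-up sample data, only here to have something to convert.
    sdf = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["key", "value"])

    pdf = spark_to_pandas(sdf, max_rows=2)
    # From here on pdf is an ordinary Pandas DataFrame.
    print(pdf.describe())

    spark.stop()
```

Calling demo() needs a local PySpark and Pandas install. The usual pattern is to do the heavy filtering and aggregation in Spark and hand only the small result to Pandas.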