When I run a PySpark application on my local machine, I can save to and retrieve from the MongoDB server using the MongoDB Spark connector, and everything works properly. When I submit the exact same application on my Amazon EMR cluster, I can see that the connector package is properly fetched from Maven when the job is submitted. However, the job itself does not work.
From my Amazon EMR instance I can communicate with the database using PyMongo without problems. I can load and save DataFrames when using pyspark interactively from the driver, but when I submit jobs via spark-submit over the YARN cluster, the job hangs. There are no error messages; the driver and executors simply show zero activity, and the PySpark application stalls until it is manually terminated. Has anyone else used the MongoDB Spark connector from Amazon EMR?
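
For anyone trying to reproduce this, here is a minimal sketch of the kind of job described above. It is not the actual application. It assumes the 2.x line of the connector (the `spark.mongodb.input.uri` / `spark.mongodb.output.uri` settings and the `com.mongodb.spark.sql.DefaultSource` source name), and the helper names and the URI passed in are hypothetical placeholders.

```python
# Minimal PySpark round trip through the MongoDB Spark connector.
# Assumptions (not from the original post): connector 2.x config keys and
# data source name; the URI is supplied by the caller and is a placeholder.

MONGO_SOURCE = "com.mongodb.spark.sql.DefaultSource"  # connector 2.x source name


def mongo_conf(uri):
    """Return the Spark settings the 2.x connector reads its URIs from."""
    return {
        "spark.mongodb.input.uri": uri,
        "spark.mongodb.output.uri": uri,
    }


def run_roundtrip(uri):
    """Append two rows to the collection named in `uri`, then read the
    collection back and return how many documents it now holds."""
    # Imported here so mongo_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("mongo-roundtrip")
    for key, value in mongo_conf(uri).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    try:
        df = spark.createDataFrame([(1, "a"), (2, "b")], ["n", "label"])
        df.write.format(MONGO_SOURCE).mode("append").save()
        return spark.read.format(MONGO_SOURCE).load().count()
    finally:
        spark.stop()
```

Calling `run_roundtrip("mongodb://<host>/<db>.<collection>")` from a script submitted with `spark-submit --master yarn` and a `--packages` coordinate for the connector should exercise the same write-then-read path that hangs here, which may help narrow down whether the stall is in the write, the read, or session startup.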
