Spark MongoDB connector hangs indefinitely, not working on Amazon EMR

Daniel Stojanov Tue, 21 Apr 2020 19:11:15 -0700

When running a PySpark application on my local machine, I am able to save
data to and retrieve it from the MongoDB server using the MongoDB Spark
connector, and everything works properly. When I submit the exact same
application to my Amazon EMR cluster, I can see that the MongoDB Spark
connector package is correctly fetched from Maven when the job is submitted.
However, the job still does not work.


From my Amazon EMR instance I can communicate with the database using
PyMongo without problems. I can also load and save DataFrames when using
PySpark interactively from the driver, but when I submit jobs with
spark-submit to the YARN cluster, they hang.
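One difference worth ruling out, as an assumption rather than a diagnosis, is network reachability from the nodes that actually run the job. An interactive pyspark shell always runs its driver on the node where it was started, whereas spark-submit with --deploy-mode cluster on YARN runs the driver inside the YARN application master on one of the cluster nodes, and executors always run on cluster nodes. The sketch below uses only the Python standard library to check whether a TCP connection to the MongoDB server can be opened within a timeout. The function name can_reach is hypothetical, and the host and port must be replaced with the ones the job actually uses.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for diagnosing connectivity; it does not speak the
    MongoDB protocol, it only checks that the port accepts connections.
    """
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False
```

For example, can_reach("<mongodb-host>", 27017) (27017 is MongoDB's default port) could be called at the top of the submitted script to test the driver's node, and inside tasks such as sc.parallelize(range(4), 4).map(lambda _: can_reach("<mongodb-host>", 27017)).collect() to test from executors. A False result on some nodes but not others would point at networking (for instance security-group rules) rather than at the connector itself.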

There are no error messages; the driver and executors simply show zero
activity, and the PySpark application stalls until it is terminated manually.

Has anyone else used the MongoDB Spark connector on Amazon EMR?

