Hi All,

Why is it not possible to specify cluster as the deploy mode for Spark
Connect?

As discussed in the thread below, there appears to be an "arbitrary
decision" within spark-submit that cluster mode is "not applicable" to
Spark Connect.

GitHub Issue Comment:
https://github.com/kubeflow/spark-operator/issues/1801#issuecomment-2000494607

> This will circumvent the submission error you may have gotten if you
> tried to just run the SparkConnectServer directly. From my investigation,
> that looks to be an arbitrary decision within spark-submit that Cluster
> mode is "not applicable" to SparkConnect. Which is sort of true except when
> using this operator :)
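
For concreteness, the failure happens before anything reaches the cluster:
invoking spark-submit with --deploy-mode cluster and --class
org.apache.spark.sql.connect.service.SparkConnectServer exits immediately
with an error along the lines of "Cluster deploy mode is not applicable to
Spark Connect server." (the exact wording may differ slightly).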

I have reviewed the following commit and pull request, but I could not find
any discussion of, or rationale for, disallowing cluster mode:

Related Commit:
https://github.com/apache/spark/commit/11260310f65e1a30f6b00b380350e414609c5fd4

Related Pull Request:
https://github.com/apache/spark/pull/39928
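
As far as I can tell, the restriction is a blanket guard in the deploy-mode
validation of SparkSubmit.scala, alongside the analogous guards for the
shells and the Thrift server, rather than something a particular cluster
manager requires. Here is a self-contained paraphrase of what I believe the
check does (a sketch with simplified names, not the verbatim source; please
correct me if I am misreading the code):

    // Paraphrase of the deploy-mode validation in SparkSubmit.scala
    // (abridged sketch, not the verbatim source).
    object SubmitGuardSketch {
      sealed trait DeployMode
      case object Client extends DeployMode
      case object Cluster extends DeployMode

      // spark-submit appears to identify the Connect server purely by
      // its main class.
      def isConnectServer(mainClass: String): Boolean =
        mainClass == "org.apache.spark.sql.connect.service.SparkConnectServer"

      def validate(deployMode: DeployMode, mainClass: String): Unit =
        deployMode match {
          // Analogous cases exist for spark-shell and the Thrift server.
          // The real code calls error(), which prints the message and
          // exits; this sketch just throws instead.
          case Cluster if isConnectServer(mainClass) =>
            sys.error(
              "Cluster deploy mode is not applicable to Spark Connect server.")
          case _ => // all other combinations proceed as usual
        }

      def main(args: Array[String]): Unit =
        validate(Cluster,
          "org.apache.spark.sql.connect.service.SparkConnectServer")
    }

If that reading is right, the guard fires on the main class alone, with no
check of what the cluster manager can actually support, which is why tools
like the Spark Operator have to work around spark-submit rather than go
through it.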

This restriction poses a significant obstacle to using Spark Connect with
the Spark Operator. If there is a technical reason for it, I would like to
understand it. Additionally, if this issue is being tracked on JIRA or
elsewhere, I would appreciate a link.

Thank you in advance.

Best regards,
Yasukazu Nagatomi
