HyukjinKwon commented on code in PR #49107:
URL: https://github.com/apache/spark/pull/49107#discussion_r1945916315
##########
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala:
##########
@@ -237,6 +265,16 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     testPySpark(false)
   }
+  test("run Python application with Spark Connect in yarn-client mode") {
Review Comment:
Actually, the reason seems to be a missing `pandas` dependency in the test environment:
```
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 28, in require_minimum_pandas_version
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/resource-managers/yarn/target/tmp/spark-ba2c7cc1-250b-4e3d-89aa-a6c729012dcf/test.py", line 13, in <module>
    "spark.api.mode", "connect").master("yarn").getOrCreate()
    ^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 492, in getOrCreate
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/session.py", line 19, in <module>
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/utils.py", line 35, in check_dependencies
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
14:03:30.599 INFO org.apache.spark.util.ShutdownHookManager: Shutdown hook called
14:03:30.604 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/spark-f973a6e2-72c5-4759-8709-b18b15afc3d2
14:03:30.608 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-ce5279c9-5a6a-4547-84e9-3d01302054d0
(BaseYarnClusterSuite.scala:242)
- run Python application with Spark Connect in yarn-cluster mode *** FAILED ***
  FAILED did not equal FINISHED
WARNING: Using incubator modules: jdk.incubator.vector
Exception in thread "main" org.apache.spark.SparkException: Application application_1738850370406_0018 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1393)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1827)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1137)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1146)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
(BaseYarnClusterSuite.scala:242)
- run Python application in yarn-cluster mode using spark.yarn.appMasterEnv to override local envvar
```
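
Reading the traceback: `getOrCreate()` with `spark.api.mode=connect` imports `pyspark.sql.connect.session`, whose module-level `check_dependencies()` call reaches `require_minimum_pandas_version()`, so the driver's Python environment must have `pandas >= 2.0.0`, which the CI runner does not. A minimal sketch of the script shape that hits this path, reconstructed from the traceback (the suite's actual generated `test.py` may differ):

```python
# Sketch only, not the suite's actual test.py: enabling Spark Connect via
# spark.api.mode triggers the Connect client's dependency check on import.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.api.mode", "connect")  # routes through pyspark.sql.connect
    .master("yarn")
    .getOrCreate()  # raises PySparkImportError [PACKAGE_NOT_INSTALLED] without pandas
)
```

So either the CI image needs `pandas` (and the rest of the Spark Connect Python dependencies) installed, or these tests should be skipped when those packages are unavailable.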