sryza opened a new pull request, #52770: URL: https://github.com/apache/spark/pull/52770
### What changes were proposed in this pull request?

Hides the `SparkUserAppException` and stack trace when a pipeline run fails.

### Why are the changes needed?

I hit this when I ran a pipeline that had no flows:

```
org.apache.spark.SparkUserAppException: User application exited with 1
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:127)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1028)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:226)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:95)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1166)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1175)
	at org.apache.spark.deploy.SparkPipelines$.main(SparkPipelines.scala:42)
	at org.apache.spark.deploy.SparkPipelines.main(SparkPipelines.scala)
```

This information isn't relevant to the user.

### Does this PR introduce _any_ user-facing change?

Not for anything that's been released.

### How was this patch tested?

Ran the CLI and observed that this error was gone and the other output remained the same:

```
> spark-pipelines run --conf spark.sql.catalogImplementation=hive
WARNING: Using incubator modules: jdk.incubator.vector
2025-10-28 13:22:49: Loading pipeline spec from /Users/sandy.ryza/sdp-test/demo2/pipeline.yml...
2025-10-28 13:22:49: Creating Spark session...
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/10/28 13:22:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.sql.catalogImplementation to Some(hive) due to [CANNOT_MODIFY_STATIC_CONFIG] Cannot modify the value of the static Spark config: "spark.sql.catalogImplementation". SQLSTATE: 46110
2025-10-28 13:22:53: Creating dataflow graph...
2025-10-28 13:22:53: Registering graph elements...
2025-10-28 13:22:53: Loading definitions. Root directory: '/Users/sandy.ryza/sdp-test/demo2'.
2025-10-28 13:22:53: Found 2 files matching glob 'transformations/**/*'
2025-10-28 13:22:53: Importing /Users/sandy.ryza/sdp-test/demo2/transformations/example_python_materialized_view.py...
2025-10-28 13:22:53: Registering SQL file /Users/sandy.ryza/sdp-test/demo2/transformations/example_sql_materialized_view.sql...
2025-10-28 13:22:53: Starting run...
25/10/28 13:22:55 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
25/10/28 13:22:55 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
Traceback (most recent call last):
  File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 413, in <module>
    run(
  File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 340, in run
    handle_pipeline_events(result_iter)
  File "/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/pipelines/spark_connect_pipeline.py", line 53, in handle_pipeline_events
    for result in iter:
  File "/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1186, in execute_command_as_iterator
  File "/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1619, in _execute_and_fetch_as_iterator
  File "/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1893, in _handle_error
  File "/Users/sandy.ryza/oss/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1966, in _handle_rpc_error
pyspark.errors.exceptions.connect.AnalysisException: [PIPELINE_DATASET_WITHOUT_FLOW] Pipeline dataset `spark_catalog`.`default`.`abc` does not have any defined flows. Please attach a query with the dataset's definition, or explicitly define at least one flow that writes to the dataset. SQLSTATE: 0A000
25/10/28 13:22:57 INFO ShutdownHookManager: Shutdown hook called
25/10/28 13:22:57 INFO ShutdownHookManager: Deleting directory /private/var/folders/1v/dqhbgmt10vl6v3tdlwvvx90r0000gp/T/spark-1214d042-270d-407f-8324-0dfcdf72c38c
```

### Was this patch authored or co-authored using generative AI tooling?

<!-- If generative AI tooling has been used in the process of authoring this patch, please include the phrase: 'Generated-by: ' followed by the name of the tool and its version. If no, write 'No'. Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details. -->
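For context on where the wrapper exception comes from: when the Python process that `spark-pipelines` launches exits with a nonzero status, `PythonRunner.main` throws `SparkUserAppException(exitCode)`, which then propagates out through `SparkSubmit` and is printed with its full JVM stack trace even though the Python side already printed a user-facing error. A minimal sketch of the kind of fix this implies, catching the exception at the `SparkPipelines` entry point and surfacing only the exit code; the `buildSubmitArgs` helper is a hypothetical stand-in for however the class actually assembles its spark-submit arguments, and the real patch may be structured differently:

```scala
package org.apache.spark.deploy

import org.apache.spark.SparkUserAppException

object SparkPipelines {
  def main(args: Array[String]): Unit = {
    try {
      // Delegate to spark-submit, which runs the Python pipelines CLI
      // through PythonRunner.
      SparkSubmit.main(buildSubmitArgs(args))
    } catch {
      // PythonRunner wraps a nonzero exit from the Python process in
      // SparkUserAppException. The Python CLI has already printed its own
      // error (e.g. the AnalysisException above), so propagate just the
      // exit code instead of letting the JVM dump this wrapper's stack
      // trace.
      case e: SparkUserAppException =>
        System.exit(e.exitCode)
    }
  }

  // Hypothetical helper: stands in for the real spark-submit argument
  // construction, which is not shown in this message.
  private def buildSubmitArgs(args: Array[String]): Array[String] = args
}
```

With a catch like this, a failing run still ends with the Python traceback and a nonzero exit status, but the redundant `SparkUserAppException` stack trace no longer appears in the output.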
