ParkGyeongTae commented on code in PR #4936:
URL: https://github.com/apache/zeppelin/pull/4936#discussion_r2121181036
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard
+ * `spark-submit` CLI interface. Internally, it delegates execution to the
+ * ShellInterpreter to run `spark-submit` as a shell command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the
+ *   configured SPARK_HOME path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be
+ *   cancelled via `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables.
+ *   It should point to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
+ */
 public class SparkSubmitInterpreter extends ShellInterpreter {

   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);

-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
+  private final String sparkHome;
   private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();

   public SparkSubmitInterpreter(Properties property) {
     super(property);
-    // Set time to be max integer so that the shell process won't timeout.
-    setProperty("shell.command.timeout.millisecs", Integer.MAX_VALUE + "");
-    this.sparkHome = properties.getProperty("SPARK_HOME");
+    setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
+    this.sparkHome = property.getProperty("SPARK_HOME");
     LOGGER.info("SPARK_HOME: {}", sparkHome);
   }

+  /**
+   * Executes a spark-submit command based on the user's input in a Zeppelin
+   * notebook paragraph.
+   * <p>
+   * This method constructs the full spark-submit CLI command using the
+   * configured SPARK_HOME and the provided arguments. It performs validation
+   * (e.g., SPARK_HOME presence), logs the execution, and registers a listener
+   * to extract Spark UI information from the output logs.
+   * <p>
+   * If SPARK_HOME is not set, an error result is returned. After execution,
+   * any associated YARN application ID is removed from the internal tracking map.
+   *
+   * @param cmd     The spark-submit arguments entered by the user
+   *                (e.g., "--class ... /path/to/jar").
+   * @param context The interpreter context for the current paragraph execution.
+   * @return An {@link InterpreterResult} representing the outcome of the
+   *         spark-submit execution.
+   */
   @Override
   public InterpreterResult internalInterpret(String cmd, InterpreterContext context) {
     if (StringUtils.isBlank(cmd)) {
       return new InterpreterResult(InterpreterResult.Code.SUCCESS);
     }
+
+    if (StringUtils.isBlank(sparkHome)) {
+      String errorMsg = "SPARK_HOME is not set. Please configure SPARK_HOME in the interpreter "
+          + "setting or environment.";
+      LOGGER.error("Failed to run spark-submit: {}", errorMsg);
+      return new InterpreterResult(InterpreterResult.Code.ERROR, errorMsg);
+    }
+
     String sparkSubmitCommand = sparkHome + "/bin/spark-submit " + cmd.trim();

Review Comment:
   I added code to validate the existence of the `spark-submit` file.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@zeppelin.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
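The review note mentions validating the existence of the `spark-submit` file, but that hunk is not shown in the diff above. A rough, standalone sketch of what such a check could look like follows; the class name, helper method, and error messages are hypothetical, not the code actually merged into Zeppelin:

```java
import java.io.File;

public class SparkSubmitValidator {

  /**
   * Returns an error message if the spark-submit binary cannot be found
   * under the given SPARK_HOME, or null if it looks usable.
   */
  static String validateSparkSubmit(String sparkHome) {
    if (sparkHome == null || sparkHome.trim().isEmpty()) {
      // Mirrors the SPARK_HOME blank check added in the PR.
      return "SPARK_HOME is not set. Please configure SPARK_HOME in the "
          + "interpreter setting or environment.";
    }
    File sparkSubmit = new File(sparkHome, "bin/spark-submit");
    if (!sparkSubmit.isFile()) {
      // The existence check the reviewer describes adding.
      return "spark-submit not found at " + sparkSubmit.getAbsolutePath()
          + ". Check that SPARK_HOME points to a valid Spark installation.";
    }
    if (!sparkSubmit.canExecute()) {
      return "spark-submit at " + sparkSubmit.getAbsolutePath()
          + " is not executable.";
    }
    return null; // looks valid
  }

  public static void main(String[] args) {
    // With a blank SPARK_HOME the validator reports the missing setting.
    System.out.println(validateSparkSubmit(""));
    // With a bogus directory it reports the missing binary.
    System.out.println(validateSparkSubmit("/nonexistent/spark"));
  }
}
```

Failing fast like this, before shelling out, would let `internalInterpret` return an `InterpreterResult.Code.ERROR` with a precise message instead of surfacing a confusing "command not found" from the shell.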