Correct me if I'm wrong, but won't Interactive Mode require me to rewrite my application code into statements that would then be submitted as the code property of a POST /sessions/{sessionId}/statements request <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessionssessionidstatements>?
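
For reference, this is what I understand each submission would have to look like (a minimal curl sketch; the Livy endpoint localhost:8998 and an already-created session with id 0 are assumptions on my part):

    # Submit a Scala snippet as a statement to an existing interactive session
    curl -s -X POST \
      -H 'Content-Type: application/json' \
      -d '{"code": "val df = spark.read.json(\"/some/path\"); df.count()"}' \
      http://localhost:8998/sessions/0/statements

i.e., the application logic itself would travel in the request body rather than stay inside my JAR.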
The thing is that I don't want to take the application logic out of my JAR file containing my Spark application, because I'll be using Livy's HTTP REST API to submit remote Spark jobs via Apache Airflow.

*Shubham Gupta*
Software Engineer
zomato

On Mon, Oct 1, 2018 at 7:30 AM Jeff Zhang <zjf...@gmail.com> wrote:

> BTW, Zeppelin has integrated Livy's interactive mode to run Spark code.
> You may try this as well.
>
> https://zeppelin.apache.org/docs/0.8.0/interpreter/livy.html
>
> On Mon, Oct 1, 2018 at 9:58 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Have you tried the interactive mode?
>>
>> On Mon, Oct 1, 2018 at 9:30 AM, Shubham Gupta <y2k.shubhamgu...@gmail.com> wrote:
>>
>>> I'm trying to use Livy to remotely submit several Spark *jobs*. Let's
>>> say I want to perform the following *spark-submit task remotely* (with
>>> all the options as-such):
>>>
>>> spark-submit \
>>>   --class com.company.drivers.JumboBatchPipelineDriver \
>>>   --conf spark.driver.cores=1 \
>>>   --conf spark.driver.memory=1g \
>>>   --conf spark.dynamicAllocation.enabled=true \
>>>   --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
>>>   --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
>>>   --master yarn \
>>>   --deploy-mode cluster \
>>>   /home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
>>>   --start=2012-12-21 \
>>>   --end=2012-12-21 \
>>>   --pipeline=db-importer \
>>>   --run-spiders
>>>
>>> *NOTE: The options after the JAR (--start, --end etc.) are specific to
>>> my Spark application. I'm using scopt <https://github.com/scopt/scopt>
>>> for this.*
>>>
>>> - I'm aware that I can supply all the various options in the above
>>>   spark-submit command using Livy's POST /batches request
>>>   <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
>>>
>>> - But since I have to make over 250 spark-submits remotely, I'd like
>>>   to exploit Livy's *session-management capabilities*; i.e., I want
>>>   Livy to create a SparkSession once and then use it for all my
>>>   spark-submit requests.
>>>
>>> - The POST /sessions request
>>>   <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessions>
>>>   allows me to specify quite a few options for instantiating a
>>>   SparkSession remotely. However, I see no *session argument* in the
>>>   POST /batches request
>>>   <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
>>>
>>> My questions are:
>>>
>>> 1. How can I make use of the SparkSession that I created using the
>>>    POST /sessions request for submitting my Spark job using the
>>>    POST /batches request?
>>> 2. In case it's not possible, why is that the case?
>>> 3. Are there any workarounds?
>>>
>>> I've referred to the following examples, but they only demonstrate
>>> supplying (Python) *code* for the Spark job within Livy's POST request:
>>>
>>> - pi_app
>>>   <https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py>
>>> - rssanders3/airflow-spark-operator-plugin
>>>   <https://github.com/rssanders3/airflow-spark-operator-plugin/blob/master/example_dags/livy_spark_operator_python_example.py>
>>> - livy/examples <https://livy.incubator.apache.org/examples/>
>>>
>>> Here's the link <https://stackoverflow.com/questions/51746286/> to my
>>> original question on StackOverflow.
>>>
>>> *Shubham Gupta*
>>> Software Engineer
>>> zomato
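
P.S. For context, my current plan maps the spark-submit above onto a single POST /batches request, roughly like this (a curl sketch; the localhost:8998 endpoint and a cluster-visible JAR path are assumptions on my part):

    # One-shot batch submission: JAR, main class, Spark conf and application
    # arguments all go in the request body. master/deploy-mode come from the
    # Livy server's own configuration rather than the request.
    curl -s -X POST \
      -H 'Content-Type: application/json' \
      -d '{
            "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
            "className": "com.company.drivers.JumboBatchPipelineDriver",
            "driverCores": 1,
            "driverMemory": "1g",
            "conf": {
              "spark.dynamicAllocation.enabled": "true",
              "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
              "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
            },
            "args": ["--start=2012-12-21", "--end=2012-12-21",
                     "--pipeline=db-importer", "--run-spiders"]
          }' \
      http://localhost:8998/batches

but, as described above, this gives me no way to reuse one SparkSession across the 250+ submissions.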