Correct me if I'm wrong, but won't Interactive Mode require me to rewrite
my application code into statements that would then be submitted as the
code property of the POST /sessions/{sessionId}/statements request
<https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessionssessionidstatements>?
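
For instance, as far as I understand it, every piece of application logic
would have to be shipped as a code string, roughly like the following
sketch (untested; the endpoint, session id, and code snippet are all
placeholders of mine):

import json
import requests

# Placeholder Livy endpoint and session id; a real id would come from an
# earlier POST /sessions call.
LIVY_URL = "http://livy-host:8998"
SESSION_ID = 0

# The application logic travels as a string in the 'code' property --
# this is exactly the rewriting I'd like to avoid.
statement = {"code": 'spark.read.parquet("/some/path").count()'}

resp = requests.post(
    "%s/sessions/%d/statements" % (LIVY_URL, SESSION_ID),
    data=json.dumps(statement),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # statement id + state; poll it until completion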

The thing is, I don't want to take the application logic out of the JAR
file containing my Spark application, because I'll be using Livy's HTTP
REST API to submit remote Spark jobs via Apache Airflow.
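
(For context, the Airflow side would look something like this minimal
sketch, using the stock SimpleHttpOperator; the 'livy_http' connection id
and the DAG wiring are placeholders of mine:)

import json
from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator

# Placeholder DAG; name, start date, and schedule are arbitrary.
dag = DAG("livy_spark_submit", start_date=datetime(2018, 10, 1),
          schedule_interval=None)

# 'livy_http' is a placeholder Airflow connection pointing at the Livy
# server; the payload mirrors the spark-submit command quoted below.
submit_batch = SimpleHttpOperator(
    task_id="submit_jumbo_batch",
    http_conn_id="livy_http",
    endpoint="batches",
    method="POST",
    data=json.dumps({
        "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
        "className": "com.company.drivers.JumboBatchPipelineDriver",
    }),
    headers={"Content-Type": "application/json"},
    dag=dag,
)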

*Shubham Gupta*
Software Engineer
 zomato


On Mon, Oct 1, 2018 at 7:30 AM Jeff Zhang <zjf...@gmail.com> wrote:

> BTW, zeppelin has integrated livy's interactive mode to run spark code.
> You may try this as well.
>
> https://zeppelin.apache.org/docs/0.8.0/interpreter/livy.html
>
>
>
> On Mon, Oct 1, 2018 at 9:58 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>>
>> Have you tried the interactive mode?
>>
>> On Mon, Oct 1, 2018 at 9:30 AM, Shubham Gupta <y2k.shubhamgu...@gmail.com> wrote:
>>
>>> I'm trying to use Livy to remotely submit several Spark *jobs*. Let's
>>> say I want to perform the following *spark-submit task remotely* (with
>>> all the options as-is):
>>>
>>> spark-submit \
>>>   --class com.company.drivers.JumboBatchPipelineDriver \
>>>   --conf spark.driver.cores=1 \
>>>   --conf spark.driver.memory=1g \
>>>   --conf spark.dynamicAllocation.enabled=true \
>>>   --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
>>>   --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
>>>   --master yarn \
>>>   --deploy-mode cluster \
>>>   /home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
>>>     --start=2012-12-21 \
>>>     --end=2012-12-21 \
>>>     --pipeline=db-importer \
>>>     --run-spiders
>>>
>>> *NOTE: The options after the JAR (--start, --end, etc.) are specific
>>> to my Spark application; I'm using scopt <https://github.com/scopt/scopt>
>>> to parse them.*
>>> ------------------------------
>>>
>>>    - I'm aware that I can supply all the various options of the above
>>>    spark-submit command via Livy's POST /batches request
>>>    <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>
>>>    (see the sketch after this list).
>>>    - But since I have to make over 250 spark-submits remotely, I'd like
>>>    to exploit Livy's *session-management capabilities*; i.e., I want
>>>    Livy to create a SparkSession once and then reuse it for all of my
>>>    spark-submit requests.
>>>    - The POST /sessions request
>>>    <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessions>
>>>    lets me specify quite a few options for instantiating a SparkSession
>>>    remotely. However, I see no *session argument* in the POST /batches
>>>    request
>>>    <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
>>>
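>>> For reference, here's roughly what I'd expect the POST /batches
>>> equivalent of the above spark-submit to look like, based on my reading
>>> of the REST API docs (an untested sketch; the Livy host/port is a
>>> placeholder, and the JAR path must be reachable from the cluster):
>>>
>>> import json
>>> import requests
>>>
>>> LIVY_URL = "http://livy-host:8998"  # placeholder Livy endpoint
>>>
>>> # 'file', 'className', and 'args' carry the JAR, main class, and
>>> # application-specific arguments; the --conf options go into 'conf'.
>>> # (--master and --deploy-mode are, as I understand it, configured on
>>> # the Livy server via livy.spark.master / livy.spark.deploy-mode.)
>>> batch_payload = {
>>>     "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
>>>     "className": "com.company.drivers.JumboBatchPipelineDriver",
>>>     "args": ["--start=2012-12-21", "--end=2012-12-21",
>>>              "--pipeline=db-importer", "--run-spiders"],
>>>     "conf": {
>>>         "spark.driver.cores": "1",
>>>         "spark.driver.memory": "1g",
>>>         "spark.dynamicAllocation.enabled": "true",
>>>         "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>>         "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
>>>     },
>>> }
>>>
>>> resp = requests.post(LIVY_URL + "/batches",
>>>                      data=json.dumps(batch_payload),
>>>                      headers={"Content-Type": "application/json"})
>>> print(resp.json())  # batch id + state; poll GET /batches/{batchId}
>>>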
>>> ------------------------------
>>>
>>> My questions are:
>>>
>>>    1. How can I make use of the SparkSession that I created using the
>>>    POST /sessions request when submitting my Spark job via the
>>>    POST /batches request?
>>>    2. In case it's not possible, why is that the case?
>>>    3. Are there any workarounds?
>>>
>>> ------------------------------
>>>
>>> I've referred to the following examples, but they only demonstrate
>>> supplying (Python) *code* for the Spark job within Livy's POST request:
>>>
>>>    - pi_app
>>>    
>>> <https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py>
>>>    - rssanders3/airflow-spark-operator-plugin
>>>    
>>> <https://github.com/rssanders3/airflow-spark-operator-plugin/blob/master/example_dags/livy_spark_operator_python_example.py>
>>>    - livy/examples <https://livy.incubator.apache.org/examples/>
>>>
>>> ------------------------------
>>>
>>> Here's the link <https://stackoverflow.com/questions/51746286/> to my
>>> original question on Stack Overflow.
>>>
>>> *Shubham Gupta*
>>> Software Engineer
>>>  zomato
>>>
>>
