@Meisam Fathi 1. I am using Spark standalone mode, so there is no YARN anywhere. How do I set the appId in standalone mode? 2. In all the streaming jobs I have written so far, I always call .start() and .awaitTermination(). Are you saying .awaitTermination() is not needed with Livy? Looking at the Livy test <https://github.com/apache/incubator-livy/blob/d4bd76f09690079c47364b3349f549e32db4d621/examples/src/main/scala/org/apache/livy/examples/WordCountApp.scala#L106>, it does call awaitTermination(), but with a timeout; in my case there is no timeout because I want the query to run forever. The reason my job blocks forever is that I am calling .get() on a Java Future, which is a blocking call, and it never returns because of awaitTermination(). If I remove .get() and just call livy.submit(new StreamingJob()), this seems to work fine, and I don't need the return value because I am returning void anyway. So the real question now is: do I need to call awaitTermination() or not? I can say for sure that with spark-submit, a streaming job with .start() alone won't work; it must call awaitTermination().
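To make the blocking question concrete without Spark or Livy, here is a minimal plain-Java sketch of the same situation: a submitted task that never finishes on its own (like a streaming query sitting in awaitTermination()) and a caller probing it with a bounded get(). Everything here is generic java.util.concurrent code, not Livy API; it only illustrates why an unbounded .get() never returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FireAndForget {
    // Submits a task that never finishes (like a streaming query that calls
    // awaitTermination()) and probes it with a bounded get().
    public static String runOnce() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> handle = pool.submit(() -> {
            try {
                new CountDownLatch(1).await(); // never counted down: blocks until interrupted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            // An unbounded handle.get() here would block forever, exactly like
            // calling .get() on a job that never terminates.
            handle.get(200, TimeUnit.MILLISECONDS);
            return "finished";
        } catch (TimeoutException e) {
            return "still running"; // fire-and-forget: simply don't call get()
        } finally {
            pool.shutdownNow(); // interrupt the never-ending task so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());
    }
}
```

Dropping the .get() is exactly the fire-and-forget behaviour described above: the submit returns immediately and the task keeps running in the background.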
Please let me know. Thanks!

On Wed, Dec 6, 2017 at 9:03 AM, Meisam Fathi <[email protected]> wrote:

> Please find some of the answers inline.
>
>> SparkSession sparkSession = ctx.sparkSession();
>>
>> ctx.sc().getConf().setAppName("MyJob"); // *This app name is not getting
>> set when I go to http://localhost:8998/sessions*
>
> By this time the Spark session has already been created. You should set
> the configs before starting the SparkContext.
>
>> // *I can see my query but the appId is always set to null*
>
> The appId should be given to Livy by YARN (if you are running on YARN). It
> may take a while to get a response if YARN is busy. If you are not getting
> an appId at all, then your application was not submitted correctly. You may
> want to check your cluster manager UI for more information.
>
>> System.out.println("READING STREAM");
>
> This will be executed on the driver node. If you are running Spark in
> standalone mode or in client mode, the driver runs on the node that
> launched the application. If you are running Spark in cluster mode, the
> driver is placed on a cluster node chosen by YARN.
>
>> df.printSchema(); // *Where do these print statements go?*
>
> Same as above.
>
>> awaitTermination(); // *This thing blocks forever and I don't want to set
>> a timeout.*
>> // *So what should I do to fire and forget a streaming job?*
>
> I believe you can call .start().
>
>> livy.submit(new StreamingJob()).get(); // *This will block forever*
>
> Livy.submit(...).get() returns a value only if the job succeeds. You may
> want to use onJobFailed(JobHandle<T> job, Throwable cause) as well to
> handle errors and get a better idea of why the job is not returning.
>
>> System.out.println("SUBMITTED JAR!"); // *The control will never
>> get here so I can't submit another job.*
>
> See above.
>
> Thanks,
> Meisam
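On the onJobFailed(JobHandle<T> job, Throwable cause) suggestion: the same "submit, return immediately, and learn about failures via a callback instead of a blocking get()" pattern can be sketched in plain Java with CompletableFuture. To be clear, the submit method and callback below are illustrative stand-ins, not the Livy client API; they only show the shape of the pattern.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class SubmitWithCallback {
    // Illustrative stand-in for a Livy-style submit: runs the job
    // asynchronously and reports failure through a callback rather
    // than a blocking get().
    public static CompletableFuture<Void> submit(Runnable job, Consumer<Throwable> onJobFailed) {
        return CompletableFuture.runAsync(job)
                .handle((ok, err) -> {
                    if (err != null) {
                        // Unwrap the CompletionException that runAsync wraps failures in.
                        onJobFailed.accept(err.getCause() != null ? err.getCause() : err);
                    }
                    return (Void) null; // failure is consumed by the callback, caller is never blocked
                });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> handle =
                submit(() -> { throw new IllegalStateException("stream crashed"); },
                       cause -> System.out.println("job failed: " + cause.getMessage()));
        System.out.println("SUBMITTED!"); // control returns immediately; no blocking get()
        handle.join(); // only so the demo's callback output appears before the JVM exits
    }
}
```

With this shape, the line after submit(...) is reached immediately, so a second job can be submitted right away; failures still surface through the callback.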
