@Meisam Fathi 1. I am using Spark standalone mode, so there is no YARN anywhere. How do I set the appId in standalone mode? 2. In all the streaming jobs I have written so far, I always call .start() and .awaitTermination(). Are you saying .awaitTermination() is not needed with Livy? Looking at the Livy test <https://github.com/apache/incubator-livy/blob/d4bd76f09690079c47364b3349f549e32db4d621/examples/src/main/scala/org/apache/livy/examples/WordCountApp.scala#L106>, it does call awaitTermination(), but with a timeout; in my case there is no timeout because I want the query to run forever. The reason my job blocks forever is that I am calling .get() on a Java Future, which is a blocking call, and it never returns because of awaitTermination(). If I remove .get() and just call livy.submit(new StreamingJob()), this seems to work fine, and I don't need the return value because I am returning void anyway. So the real question now is: do I need to call awaitTermination() or not? I can say for sure that with spark-submit, a streaming job with .start() alone won't work; it must call awaitTermination().
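To make the blocking question concrete without Spark or Livy, here is a minimal plain-Java sketch of the same situation: a submitted task that never finishes on its own (like a streaming query sitting in awaitTermination()) and a caller probing it with a bounded get(). Everything here is generic java.util.concurrent code, not Livy API; it only illustrates why an unbounded .get() never returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FireAndForget {
    // Submits a task that never finishes (like a streaming query that calls
    // awaitTermination()) and probes it with a bounded get().
    public static String runOnce() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> handle = pool.submit(() -> {
            try {
                new CountDownLatch(1).await(); // never counted down: blocks until interrupted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            // An unbounded handle.get() here would block forever, exactly like
            // calling .get() on a job that never terminates.
            handle.get(200, TimeUnit.MILLISECONDS);
            return "finished";
        } catch (TimeoutException e) {
            return "still running"; // fire-and-forget: simply don't call get()
        } finally {
            pool.shutdownNow(); // interrupt the never-ending task so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());
    }
}
```

Dropping the .get() is exactly the fire-and-forget behaviour described above: the submit returns immediately and the task keeps running in the background.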
Please let me know. Thanks!

On Wed, Dec 6, 2017 at 9:03 AM, Meisam Fathi <[email protected]> wrote:

> Please find some of the answers inline.
>
>> SparkSession sparkSession = ctx.sparkSession();
>>
>> ctx.sc().getConf().setAppName("MyJob"); // *This app name is not getting
>> set when I go to http://localhost:8998/sessions*
>
> By this time the Spark session has already been created. You should set
> the configs before starting the SparkContext.
>
>> // *I can see my query but the appId is always set to null*
>
> The appId should be given to Livy by YARN (if you are running on YARN). It
> may take a while to get a response if YARN is busy. If you are not getting
> an appId at all, then your application was not submitted correctly. You may
> want to check your cluster manager UI for more information.
>
>> System.out.println("READING STREAM");
>
> This will be executed on the driver node. If you are running Spark in
> standalone mode or in client mode, the driver runs on the node that
> launched the application. If you are running Spark in cluster mode, the
> driver is placed on a cluster node chosen by YARN.
>
>> df.printSchema(); // *Where do these print statements go?*
>
> Same as above.
>
>> awaitTermination(); // *This thing blocks forever and I don't want to set
>> a timeout.*
>> // *So what should I do to fire and forget a streaming job?*
>
> I believe you can call .start().
>
>> livy.submit(new StreamingJob()).get(); // *This will block forever*
>
> Livy.submit(...).get() returns a value only if the job succeeds. You may
> want to use onJobFailed(JobHandle<T> job, Throwable cause) as well to
> handle errors and get a better idea of why the job is not returning.
>
>> System.out.println("SUBMITTED JAR!"); // *The control will never
>> get here so I can't submit another job.*
>
> See above.
>
> Thanks,
> Meisam
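On the onJobFailed(JobHandle<T> job, Throwable cause) suggestion: the same "submit, return immediately, and learn about failures via a callback instead of a blocking get()" pattern can be sketched in plain Java with CompletableFuture. To be clear, the submit method and callback below are illustrative stand-ins, not the Livy client API; they only show the shape of the pattern.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class SubmitWithCallback {
    // Illustrative stand-in for a Livy-style submit: runs the job
    // asynchronously and reports failure through a callback rather
    // than a blocking get().
    public static CompletableFuture<Void> submit(Runnable job, Consumer<Throwable> onJobFailed) {
        return CompletableFuture.runAsync(job)
                .handle((ok, err) -> {
                    if (err != null) {
                        // Unwrap the CompletionException that runAsync wraps failures in.
                        onJobFailed.accept(err.getCause() != null ? err.getCause() : err);
                    }
                    return (Void) null; // failure is consumed by the callback, caller is never blocked
                });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> handle =
                submit(() -> { throw new IllegalStateException("stream crashed"); },
                       cause -> System.out.println("job failed: " + cause.getMessage()));
        System.out.println("SUBMITTED!"); // control returns immediately; no blocking get()
        handle.join(); // only so the demo's callback output appears before the JVM exits
    }
}
```

With this shape, the line after submit(...) is reached immediately, so a second job can be submitted right away; failures still surface through the callback.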
