I don't personally use Spark Streaming, so I don't know how important the awaitAnyTermination() call is. Have you tested your code without it to see how it behaves? Also, I believe you are correct that if you got rid of it you would have access to the id needed to cancel.
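For reference, a minimal sketch of that id-returning variant. The class name StartQueriesJob, the Kafka options, and the List<String> return type are illustrative assumptions, not from the code in the thread; it assumes the job runs in an interactive Livy session whose SparkSession outlives the job.

import java.util.Arrays;
import java.util.List;

import org.apache.livy.Job;
import org.apache.livy.JobContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

// Hypothetical job that starts both queries and returns their ids
// instead of blocking on awaitAnyTermination().
public class StartQueriesJob implements Job<List<String>> {

    @Override
    public List<String> call(JobContext ctx) throws Exception {
        SparkSession sparkSession = ctx.sparkSession();
        Dataset<Row> df = sparkSession.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
            .option("subscribe", "table")                        // placeholder
            .load();
        df.createOrReplaceTempView("table");

        // Keep the StreamingQuery handles returned by start().
        StreamingQuery query1 = sparkSession.sql("select * from table")
            .writeStream().format("console").start();
        // A streaming aggregation needs "complete" (or "update") output mode
        // on the console sink; the default "append" mode would fail here.
        StreamingQuery query2 = sparkSession.sql("select count(*) from table")
            .writeStream().outputMode("complete").format("console").start();

        // No awaitAnyTermination(): the queries keep running in the shared
        // session, and these ids are what a later job needs to stop them.
        return Arrays.asList(query1.id().toString(), query2.id().toString());
    }
}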
From: kant kodali <kanth...@gmail.com>
To: user@livy.incubator.apache.org
Date: 01/24/2018 04:51 PM
Subject: Re: How to cancel the running streaming job using livy?

Do I need awaitAnyTermination()? How come it works both with and without it in the Livy case? In the spark-submit case I am sure streaming jobs won't run forever without awaitAnyTermination(). The problem here is that if I post another Livy job to cancel, then I would need an id, and I cannot return an id if my code blocks on awaitAnyTermination(), like this:

public Void call(JobContext ctx) throws Exception {
    SparkSession sparkSession = ctx.sparkSession();
    Dataset<Row> df = sparkSession.readStream().format("kafka").load();
    df.createOrReplaceTempView("table");

    Dataset<Row> resultSet1 = sparkSession.sql("select * from table");
    resultSet1.writeStream().format("console").start(); // streaming query1 started

    Dataset<Row> resultSet2 = sparkSession.sql("select count(*) from table");
    resultSet2.writeStream().format("console").start(); // streaming query2 started

    sparkSession.streams().awaitAnyTermination(); // DO I NEED THIS?
    return null;
}

On Wed, Jan 24, 2018 at 4:00 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

In response to Marcelo's "big if": from reading the code, stopping/canceling/deleting Livy sessions/jobs will always "kill" any running Spark jobs, leaving them in a failed state. It should allow you to individually cancel those jobs using the cancel command from earlier in this mail thread, but it won't fix the final-state issue you had (since they'll still be listed as failed). As for more long-term solutions to this problem, from what I can tell this is a general limitation of Spark Streaming currently, as there is no way to stop endless streaming jobs through Spark directly.

From: Marcelo Vanzin <van...@cloudera.com>
To: user@livy.incubator.apache.org
Cc: Alex Bozarth <ajboz...@us.ibm.com>
Date: 01/24/2018 03:30 PM
Subject: Re: How to cancel the running streaming job using livy?

No. Livy doesn't keep track of everything your code does; that's up to you to do.

If (big if here, I don't remember the code) you submit separate jobs for each streaming query, then maybe canceling that Livy job will cancel any Spark jobs started by it. But that makes a lot of assumptions about how the Livy code works and how streaming queries work internally w.r.t. Spark jobs.

On Wed, Jan 24, 2018 at 3:25 PM, kant kodali <kanth...@gmail.com> wrote:

Sure, I guess I can do that. Is that the only way? Is there any REST call I can make to Spark, maybe, to cancel any of the streaming queries? Sorry if this is too naive.

On Wed, Jan 24, 2018 at 2:46 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

Then that has nothing to do with Livy. You need to store a reference to your StreamingQuery (returned by start()) somewhere, and if you want to stop it, call its "stop()" method by submitting a new Livy job that does it.
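A minimal sketch of what such a stop job could look like. The class name StopQueryJob is made up; it assumes both jobs run in the same interactive session (so they see the same SparkSession), and that queryId is the id the starting job returned.

import java.util.UUID;

import org.apache.livy.Job;
import org.apache.livy.JobContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

// Hypothetical job that stops one specific streaming query by id,
// leaving any other queries in the session running.
public class StopQueryJob implements Job<Void> {

    private final String queryId; // id returned by the job that started the query

    public StopQueryJob(String queryId) {
        this.queryId = queryId;
    }

    @Override
    public Void call(JobContext ctx) throws Exception {
        SparkSession sparkSession = ctx.sparkSession();
        // Look the query up in the shared session's query manager.
        StreamingQuery query = sparkSession.streams().get(UUID.fromString(queryId));
        if (query != null) {
            query.stop(); // stops only this query, not the SparkContext
        }
        return null;
    }
}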
On Wed, Jan 24, 2018 at 2:42 PM, kant kodali <kanth...@gmail.com> wrote:

OK, let me paste some code to try and avoid the confusion. In the code below I am running two streaming queries. Now here are my two simple questions: 1) Does each streaming query below spawn one job or multiple jobs? 2) What should I do if I need to kill everything related to streaming query1 but not streaming query2?

public Void call(JobContext ctx) throws Exception {
    SparkSession sparkSession = ctx.sparkSession();
    Dataset<Row> df = sparkSession.readStream().format("kafka").load();
    df.createOrReplaceTempView("table");

    Dataset<Row> resultSet1 = sparkSession.sql("select * from table");
    resultSet1.writeStream().format("console").start(); // streaming query1 started

    Dataset<Row> resultSet2 = sparkSession.sql("select count(*) from table");
    resultSet2.writeStream().format("console").start(); // streaming query2 started

    sparkSession.streams().awaitAnyTermination();
    return null;
}

Thanks!

On Wed, Jan 24, 2018 at 1:47 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

I'm a little confused about what is meant as a job here, after all this discussion...

For "interactive sessions", stopping a session means stopping the SparkContext. So the final state of any running jobs in that session should be the same as if you stopped the SparkContext without explicitly stopping the jobs in a normal, non-Livy application.

For batches, stopping a batch means killing the Spark application, so all bets are off as to what happens there.

On Wed, Jan 24, 2018 at 1:08 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

You are correct that you are using the term "job" incorrectly (at least according to how Spark/Livy uses it). Each spark-submit is a single Spark application and can include many jobs (which are themselves broken down into stages and tasks). In Livy, using sessions would be like using spark-shell rather than spark-submit; you probably want to use batches instead (which utilize spark-submit), and then you would use that delete command as mentioned earlier.

As for the result being listed as FAILED and not CANCELLED, that is as intended. When a Livy session is stopped (deleted), it sends a command to all the running jobs (in your case each of your apps only has one "job") to be set as failed. @Marcelo, you wrote the code that does this; do you remember why you had jobs killed instead of cancelled when a Livy session is stopped? Otherwise we may be able to open a JIRA and change this, but I am unsure of any potential consequences.

From: kant kodali <kanth...@gmail.com>
To: user@livy.incubator.apache.org
Date: 01/23/2018 11:44 PM
Subject: Re: How to cancel the running streaming job using livy?

I tried POST to sessions/{session id}/jobs/{job id}/cancel and that doesn't seem to cancel either. I think, first of all, the word "job" is used in so many contexts that it might be misleading. Imagine for a second I don't have Livy and I just use the spark-submit command line to spawn jobs. Say I do the following:

spark-submit hello1.jar   // streaming job1 (runs forever)
spark-submit hello2.jar   // streaming job2 (runs forever)

The number of jobs I spawned is two, and now I want to be able to cancel one of them. These jobs read data from Kafka and will be split into stages and tasks, and sometimes these tasks are also called jobs according to the Spark UI for some reason. And it looks like Livy may be cancelling those with the above endpoint. It would be a great help if someone could try from their end and see if they are able to cancel the jobs? Thanks!
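For anyone trying this, the undocumented call Alex described earlier would look something like the following. The host, port, and ids are placeholders, and the {job id} is presumably the id of a job submitted through the Livy client API rather than one of the Spark-internal jobs shown in the Spark UI.

POST /sessions/{session id}/jobs/{job id}/cancel

e.g.

curl -X POST http://localhost:8998/sessions/0/jobs/1/cancel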
On Fri, Jan 19, 2018 at 4:03 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

Ah, that's why I couldn't find cancel in JobHandle, but it was implemented in all its implementations, which all implement it as would be expected.

From: Marcelo Vanzin <van...@cloudera.com>
To: user@livy.incubator.apache.org
Date: 01/19/2018 03:55 PM
Subject: Re: How to cancel the running streaming job using livy?

A JobHandle (which you get by submitting a Job) is a Future, and Futures have a "cancel()" method. I don't remember the details about how "cancel()" is implemented in Livy, though.

On Fri, Jan 19, 2018 at 3:52 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

OK, so I looked into this a bit more. I misunderstood you a bit before: the delete call is for ending Livy sessions using the REST API, not jobs, and not via the Java API. As for the job state, that makes sense; if you end the session, the session kills all currently running jobs. What you want is to send cancel requests to the jobs the session is running. From my research I found that there is a way to do this via the REST API, but it isn't documented for some reason. Doing a POST to /{session id}/jobs/{job id}/cancel will cancel a job. As for the Java API, the feature isn't part of the Java interface, but most implementations of it add it, such as the Scala API, whose ScalaJobHandle class (returned on submit) has a cancel function. I'm not sure how you're submitting your jobs, but there should be a cancel function available to you somewhere depending on the client you're using.

From this discussion I've realized our current documentation is even more lacking than I had thought.
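On the Java client side, that would look roughly like the sketch below, reusing the hypothetical StartQueriesJob from the earlier sketch; the Livy URL is a placeholder.

import java.net.URI;
import java.util.List;

import org.apache.livy.JobHandle;
import org.apache.livy.LivyClient;
import org.apache.livy.LivyClientBuilder;

public class CancelFromClient {

    public static void main(String[] args) throws Exception {
        LivyClient client = new LivyClientBuilder()
            .setURI(new URI("http://localhost:8998")) // placeholder Livy server
            .build();
        try {
            // submit() returns a JobHandle, which extends
            // java.util.concurrent.Future and therefore has cancel().
            JobHandle<List<String>> handle = client.submit(new StartQueriesJob());

            // ... later, while the job is still running, cancel it:
            handle.cancel(true);
        } finally {
            client.stop(true); // also shuts down the remote Spark context
        }
    }
}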
From: kant kodali <kanth...@gmail.com>
To: user@livy.incubator.apache.org
Date: 01/18/2018 06:09 PM
Subject: Re: How to cancel the running streaming job using livy?

Also, I just tried the below and got the state. It ended up in the "FAILED" state when I expected it to be in the "CANCELLED" state. Also, from the docs it is not clear if it kills the session or the job; if it kills the session I can't spawn any other job. Sorry, cancelling jobs has been a bit confusing for me.

DELETE /sessions/0

On Thu, Jan 18, 2018 at 5:55 PM, kant kodali <kanth...@gmail.com> wrote:

Oh, this raises a couple of questions. 1) Is there a programmatic way to cancel a job? 2) Is there any programmatic way to get the session id? If not, how do I get a session id when I spawn multiple jobs or multiple sessions?

On Thu, Jan 18, 2018 at 5:39 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

You make a DELETE call as detailed here: http://livy.apache.org/docs/latest/rest-api.html#response

From: kant kodali <kanth...@gmail.com>
To: user@livy.incubator.apache.org
Date: 01/18/2018 05:34 PM
Subject: How to cancel the running streaming job using livy?

Hi All, I was able to submit a streaming job to Livy; however, I wasn't able to find any way to cancel the running job. Please let me know. Thanks!