Re: Spark structured streaming -Kafka - deployment / monitor and restart
There are sections in the Structured Streaming programming guide that answer exactly these questions:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries

Also, for the Kafka data source, there's a third-party project (DISCLAIMER: I'm the author) that helps you commit the offsets back to Kafka under a specific group ID:

https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

After that, you can also leverage the Kafka ecosystem to monitor progress from Kafka's point of view, especially the gap between the highest offset and the committed offset.

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Mon, Jul 6, 2020 at 2:53 AM Gabor Somogyi wrote:

> In 3.0 the community just added it.
>
> On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:
>
>> Hi,
>>
>> We are trying to move our existing code from Spark DStreams to Structured
>> Streaming for one of the old applications we built a few years ago.
>>
>> The Structured Streaming job doesn't have a Streaming tab in the Spark UI.
>> Is there a way to monitor a job submitted as Structured Streaming? Since
>> the job runs on every trigger, how can we kill the job and restart it if
>> needed?
>>
>> Any suggestions on this, please.
>>
>> Thanks,
>> Asmath
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Spark structured streaming -Kafka - deployment / monitor and restart
In 3.0 the community just added it (a dedicated Structured Streaming tab in the web UI).

On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:

> Hi,
>
> We are trying to move our existing code from Spark DStreams to Structured
> Streaming for one of the old applications we built a few years ago.
>
> The Structured Streaming job doesn't have a Streaming tab in the Spark UI.
> Is there a way to monitor a job submitted as Structured Streaming? Since
> the job runs on every trigger, how can we kill the job and restart it if
> needed?
>
> Any suggestions on this, please.
>
> Thanks,
> Asmath
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
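[The feature referred to here is the Structured Streaming tab Spark 3.0 added to the web UI, showing active and completed queries with per-trigger rate and duration charts. As far as I understand it is on by default and controlled by these properties (a sketch; values shown are the documented defaults):]

```
spark.sql.streaming.ui.enabled                  true   # turn the Structured Streaming tab on/off
spark.sql.streaming.ui.retainedQueries          100    # finished queries kept in the tab
spark.sql.streaming.ui.retainedProgressUpdates  100    # progress updates kept per query
```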
Re: File Not Found: /tmp/spark-events in Spark 3.0
Thank you all for the responses. I believe the user shouldn't have to create the log directory explicitly. Event logging should behave like the other logs (e.g. master or worker logs): the directory should be created automatically if it doesn't exist.

-- ND

On 7/2/20 9:19 AM, Zero wrote:

> This could be the result of not setting the eventLog location properly.
> By default it is /tmp/spark-events, and since files in the /tmp directory
> are cleaned up regularly, you can run into this problem.
>
> -- Original --
> From: "Xin Jinhan" <18183124...@163.com>
> Date: Thu, Jul 2, 2020 08:39 PM
> To: "user"
> Subject: Re: File Not Found: /tmp/spark-events in Spark 3.0
>
> Hi,
>
> First, /tmp/spark-events is the default storage location for the Spark
> event log, but the log is only written when 'spark.eventLog.enabled=true',
> which your Spark 2.4.6 probably had set to false. So you can just set it
> to false and the error will disappear.
>
> Second, I suggest enabling the event log and specifying its location with
> 'spark.eventLog.dir' (either a distributed filesystem or a local one), in
> case you need to check the log later (you can simply use the Spark History
> Server).
>
> Regards,
> Jinhan
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
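[Jinhan's suggestion amounts to pointing the event log at a directory that isn't purged the way /tmp is, and creating it up front (current Spark versions do not create it for you). A sketch of the relevant settings; the path is illustrative:]

```
# Create the event-log directory up front:
#   mkdir -p /var/log/spark-events
#
# spark-defaults.conf (or --conf on spark-submit):
spark.eventLog.enabled           true
spark.eventLog.dir               file:///var/log/spark-events
spark.history.fs.logDirectory    file:///var/log/spark-events   # for the History Server
```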
Spark structured streaming -Kafka - deployment / monitor and restart
Hi,

We are trying to move our existing code from Spark DStreams to Structured Streaming for one of the old applications we built a few years ago.

The Structured Streaming job doesn't have a Streaming tab in the Spark UI. Is there a way to monitor a job submitted as Structured Streaming? Since the job runs on every trigger, how can we kill the job and restart it if needed?

Any suggestions on this, please.

Thanks,
Asmath

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
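[On the kill/restart part of the question: in Structured Streaming this is normally done through the StreamingQuery handle plus a checkpoint directory, since a restart with the same checkpointLocation resumes from the last committed offsets. A hedged PySpark sketch; the broker, topic, and paths are illustrative, not from the thread:]

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ss-restart-sketch").getOrCreate()

def start_query():
    # Kafka source; server and topic names are placeholders.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())
    return (df.writeStream
            .format("parquet")
            .option("path", "/data/events")
            # The checkpoint is what makes kill/restart safe: source offsets
            # and sink state are committed here, so a restarted query resumes
            # exactly where the stopped one left off.
            .option("checkpointLocation", "/checkpoints/events")
            .start())

query = start_query()

# To "kill" the job gracefully instead of killing the application:
query.stop()

# Restarting with the same checkpointLocation picks up from the last batch:
query = start_query()
query.awaitTermination()
```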