Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-05 Thread Jungtaek Lim
There are sections in the Structured Streaming programming guide that answer
exactly these questions:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
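
As a quick illustration of what those sections cover, here is a minimal
sketch (Scala, plain Spark 3.x APIs; the broker address, topic, and
checkpoint path are placeholders) that tracks a query's progress with a
StreamingQueryListener and stops it programmatically:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.StreamingQueryListener
  import org.apache.spark.sql.streaming.StreamingQueryListener._

  val spark = SparkSession.builder.appName("ss-kafka-demo").getOrCreate()

  // Receive callbacks for every streaming query in this session.
  spark.streams.addListener(new StreamingQueryListener {
    override def onQueryStarted(e: QueryStartedEvent): Unit =
      println(s"query started: ${e.id}")
    override def onQueryProgress(e: QueryProgressEvent): Unit =
      println(e.progress.json)  // input rate, batch duration, Kafka offsets
    override def onQueryTerminated(e: QueryTerminatedEvent): Unit =
      println(s"query terminated: ${e.id}, exception: ${e.exception}")
  })

  val query = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
    .option("subscribe", "events")                     // placeholder topic
    .load()
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")  // placeholder
    .start()

  // "Kill and restart": stop() terminates the query; restarting is just
  // running the same code again with the same checkpoint location, which
  // resumes from the recorded offsets.
  // query.stop()
  query.awaitTermination()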

Also, for the Kafka data source, there's a third-party project (DISCLAIMER:
I'm the author) that helps you commit the offsets back to Kafka under a
specific group ID:

https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

You can then also leverage the Kafka ecosystem to monitor progress from
Kafka's point of view, especially the gap between the latest offset and the
committed offset (the consumer lag).
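
For example, once the offsets are being committed under that group ID, the
standard Kafka CLI can report the lag per partition (broker address and
group name below are placeholders):

  kafka-consumer-groups.sh --bootstrap-server broker:9092 \
    --describe --group spark-ss-app

The LAG column in its output is exactly that gap (log-end offset minus
committed offset).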

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)


On Mon, Jul 6, 2020 at 2:53 AM Gabor Somogyi wrote:

> In 3.0 the community just added it (a dedicated Structured Streaming tab
> in the UI).
>
> On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:
>
>> Hi,
>>
>> We are trying to move our existing code from Spark DStreams to Structured
>> Streaming for an old application we built a few years ago.
>>
>> The Structured Streaming job doesn't have a Streaming tab in the Spark UI.
>> Is there a way to monitor a job submitted via Structured Streaming? Since
>> the job runs on every trigger, how can we kill the job and restart it if
>> needed?
>>
>> Any suggestions on this, please?
>>
>> Thanks,
>> Asmath
>>
>>
>>
>>
>>


Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-05 Thread Gabor Somogyi
In 3.0 the community just added it (a dedicated Structured Streaming tab in
the UI).

On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:

> Hi,
>
> We are trying to move our existing code from Spark DStreams to Structured
> Streaming for an old application we built a few years ago.
>
> The Structured Streaming job doesn't have a Streaming tab in the Spark UI.
> Is there a way to monitor a job submitted via Structured Streaming? Since
> the job runs on every trigger, how can we kill the job and restart it if
> needed?
>
> Any suggestions on this, please?
>
> Thanks,
> Asmath
>
>
>
>
>


Re: File Not Found: /tmp/spark-events in Spark 3.0

2020-07-05 Thread ArtemisDev
Thank you all for the responses.  I believe the user shouldn't have to 
worry about creating the log directory explicitly.  Event logging should 
behave like the other logs (e.g. master or slave): the directory should 
be created automatically if it doesn't exist.


-- ND

On 7/2/20 9:19 AM, Zero wrote:


This could be the result of not setting the eventLog location properly. 
By default it is /tmp/spark-events, and since files in the /tmp directory 
are cleaned up regularly, you could run into this problem.


-- Original --
From: "Xin Jinhan" <18183124...@163.com>
Date: Thu, Jul 2, 2020 08:39 PM
To: "user"
Subject: Re: File Not Found: /tmp/spark-events in Spark 3.0

Hi,

First, /tmp/spark-events is the default storage location for the Spark
event log, but the log is stored only when you set
'spark.eventLog.enabled=true', which your Spark 2.4.6 setup probably had
set to false. So you can simply set it back to false and the error will
disappear.

Second, I suggest enabling the event log and specifying its location with
'spark.eventLog.dir' (either a distributed filesystem or a local path),
because you may want to check the logs later, for example with the Spark
History Server.
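
A minimal sketch of the relevant settings in spark-defaults.conf (the HDFS
path is just a placeholder; any directory the driver can write to works,
and it has to exist before the application starts):

  # Write event logs, and point the history server at the same directory
  spark.eventLog.enabled           true
  spark.eventLog.dir               hdfs:///spark-events
  spark.history.fs.logDirectory    hdfs:///spark-events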

Regards
Jinhan







Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-05 Thread KhajaAsmath Mohammed
Hi,

We are trying to move our existing code from Spark DStreams to Structured
Streaming for an old application we built a few years ago.

The Structured Streaming job doesn't have a Streaming tab in the Spark UI. Is
there a way to monitor a job submitted via Structured Streaming? Since the job
runs on every trigger, how can we kill the job and restart it if needed?

Any suggestions on this, please?

Thanks,
Asmath


