TD,

Thanks a lot for the quick reply :)


Did I understand it right that, in the main thread, I won't be able to use
outStream.awaitTermination() to wait for the termination of the context,
since I'll be closing it inside another thread?

What would be a good approach to keep the main app long-running if I have to
restart queries?

Should I just wait for 2.3, where I'll be able to join two structured
streams (if the release is just a few weeks away)?

Appreciate all the help!

thanks
Appu



On 14 February 2018 at 4:41:52 PM, Tathagata Das (tathagata.das1...@gmail.com) wrote:

Let me fix my mistake :)
What I suggested in that earlier thread does not work. The streaming query
that joins a streaming dataset with a batch view does not correctly pick
up when the view is updated. It works only when you restart the query. That
is:
- stop the query
- recreate the dataframes
- start the query on the new dataframe, using the same checkpoint location
as the previous query

Note that you don't need to restart the whole process/cluster/application;
just restart the query in the same process/cluster/application. This should
be very fast (within a few seconds). So, unless you have latency SLAs of 1
second, you can periodically restart the query without restarting the
process.
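
A minimal sketch of what that restart loop could look like (not from this
thread; the source, paths, topic, and timings below are placeholder
assumptions):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object PeriodicRestart {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("periodic-query-restart").getOrCreate()

    def startQuery(): StreamingQuery = {
      // Re-read the batch side so the join sees its latest contents
      val staticDf = spark.read.parquet("/data/static")        // placeholder path

      val streamDf = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder
        .option("subscribe", "events")                         // placeholder topic
        .load()
        .selectExpr("CAST(value AS STRING) AS key")

      // Keep the checkpoint location constant so each restarted query
      // resumes exactly where the previous one stopped
      streamDf.join(staticDf, "key")
        .writeStream
        .format("parquet")
        .option("path", "/data/out")                           // placeholder
        .option("checkpointLocation", "/checkpoints/join-query")
        .trigger(Trigger.ProcessingTime("10 seconds"))
        .start()
    }

    while (true) {
      val query = startQuery()
      // Run for a while, then stop and start a fresh query so the
      // re-read static dataframe is picked up
      query.awaitTermination(10 * 60 * 1000L)                  // ~10 minutes
      query.stop()
    }
  }
}

The key point is that only the query object is recreated; the SparkSession
(and hence the process) stays up the whole time.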

Apologies for my misdirections in that earlier thread. Hope this helps.

TD

On Wed, Feb 14, 2018 at 2:57 AM, Appu K <kut...@gmail.com> wrote:

> More specifically,
>
> Quoting TD from the previous thread
> "Any streaming query that joins a streaming dataframe with the view will
> automatically start using the most updated data as soon as the view is
> updated”
>
> Wondering if I’m doing something wrong in
> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>
> My streaming dataframe is not using the updated data, even though the view
> is updated!
>
> Thank you
>
>
> On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote:
>
> Hi,
>
> I had followed the instructions from the thread
> https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3CD1315D33-41cd-4ba3-8b77-0879f3669...@qvantel.com%3E
> while trying to periodically reload a static data frame that gets joined to
> a structured streaming query.
>
> However, the streaming query results do not reflect the data from the
> refreshed static data frame.
>
> Code is here:
> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>
> I’m using Spark 2.2.1. Any pointers would be highly helpful.
>
> Thanks a lot
>
> Appu
>
>