Re: Spark structured streaming: periodically refresh static data frame

2020-09-17 Thread Harsh
As per the solution, if we are stopping and restarting the query, what happens
to the state that is maintained in memory? Will it be retained?






Re: Spark structured streaming: periodically refresh static data frame

2018-02-25 Thread naresh Goud
Appu,

I have also run into the same problem.

Were you able to solve this issue? If so, could you please share a snippet of
your code?

Thanks,
Naresh



Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Tathagata Das
1. Just loop like this.


def startQuery(): StreamingQuery = {
   // Define the dataframes and start the query
}

// call this on the main thread
while (notShutdown) {
   val query = startQuery()
   query.awaitTermination(refreshIntervalMs)
   query.stop()
   // refresh the static data
}
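
For concreteness, a minimal sketch of what startQuery() could look like
follows. This is only an illustration: the paths, Kafka settings, join key,
and checkpoint directory are hypothetical placeholders, and an existing
SparkSession named spark is assumed.

import org.apache.spark.sql.streaming.StreamingQuery

def startQuery(): StreamingQuery = {
  // Re-read the static data so each restart picks up the latest snapshot
  val staticDf = spark.read.parquet("/data/static")        // hypothetical path

  val streamDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")      // hypothetical broker
    .option("subscribe", "events")                         // hypothetical topic
    .load()
    .selectExpr("CAST(key AS STRING) AS id",
                "CAST(value AS STRING) AS payload")

  // Stream-static join; reusing the same checkpoint location on every
  // restart lets the new query resume where the previous one stopped
  streamDf.join(staticDf, "id")                            // hypothetical join key
    .writeStream
    .format("parquet")
    .option("path", "/data/out")                           // hypothetical sink
    .option("checkpointLocation", "/checkpoints/my-query") // same dir each restart
    .start()
}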


2. Yes, stream-stream joins will be in 2.3.0, which is soon to be released.
RC3 is available if you want to test it right now -
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/.
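
As an illustration of what that enables, a minimal stream-stream inner join
could look like the sketch below (the rate source and column names are
placeholders, not from this thread; an existing SparkSession named spark is
assumed):

import spark.implicits._

// Two streaming dataframes joined directly (supported from Spark 2.3)
val left = spark.readStream.format("rate").load()
  .select($"value".as("id"), $"timestamp".as("leftTime"))
val right = spark.readStream.format("rate").load()
  .select($"value".as("id"), $"timestamp".as("rightTime"))

// Inner stream-stream join; in practice, add watermarks on both sides
// so Spark can bound the join state
val joined = left.join(right, "id")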





Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
TD,

Thanks a lot for the quick reply :)


Did I understand correctly that, in the main thread, I won't be able to use
outStream.awaitTermination() to wait for the termination of the context
[ since I'll be closing it inside another thread ]?

What would be a good approach to keep the main app long-running if I have to
restart queries?

Should I just wait for 2.3, where I'll be able to join two structured streams
(if the release is just a few weeks away)?

Appreciate all the help!

thanks
App





Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Tathagata Das
Let me fix my mistake :)
What I suggested in that earlier thread does not work. A streaming query that
joins a streaming dataset with a batch view does not correctly pick up changes
when the view is updated. It works only when you restart the query. That is,
- stop the query
- recreate the dataframes
- start the query on the new dataframe, using the same checkpoint location as
the previous query (see the sketch below)

Note that you don't need to restart the whole process/cluster/application,
just the query in the same process/cluster/application. This should be very
fast (within a few seconds). So, unless you have latency SLAs of 1 second, you
can periodically restart the query without restarting the process.
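
In code, the restart sequence could look roughly like this (query is the
currently running StreamingQuery, and buildDataframes() is a hypothetical
helper that re-reads the static data and rebuilds the joined streaming
dataframe - neither name is from the original thread):

query.stop()                    // 1. stop the query
val newDf = buildDataframes()   // 2. recreate the dataframes
val restarted = newDf.writeStream
  .format("parquet")
  .option("path", "/data/out")                            // hypothetical sink
  .option("checkpointLocation", "/checkpoints/my-query")  // 3. SAME location as before
  .start()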

Apologies for my misdirections in that earlier thread. Hope this helps.

TD



Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
More specifically,

Quoting TD from the previous thread:
"Any streaming query that joins a streaming dataframe with the view will
automatically start using the most updated data as soon as the view is
updated"

Wondering if I’m doing something wrong in
https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6

My streaming dataframe is not using the updated data, even though the view
is updated!
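
For reference, the approach in question follows roughly this pattern - a
background thread periodically re-reads the static data and replaces a temp
view that the streaming query joins against. This is a reconstruction under
assumed names (not the actual gist), and it is the pattern that, per TD's
correction above, the running query does not pick up:

import java.util.concurrent.{Executors, TimeUnit}

// Periodically replace the "static" side of the join
val refresher = Executors.newSingleThreadScheduledExecutor()
refresher.scheduleAtFixedRate(new Runnable {
  def run(): Unit =
    spark.read.parquet("/data/static")          // hypothetical path
      .createOrReplaceTempView("static_view")   // hypothetical view name
}, 0, 15, TimeUnit.MINUTES)

// Streaming query joining against the view
val joined = streamingDf.join(spark.table("static_view"), "id")  // hypothetical key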

Thank you


On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote:

Hi,

I had followed the instructions from the thread
https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3cd1315d33-41cd-4ba3-8b77-0879f3669...@qvantel.com%3E
while trying to reload a static data frame periodically that gets joined to a
structured streaming query.

However, the streaming query results do not reflect the data from the
refreshed static data frame.

Code is here:
https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6

I’m using Spark 2.2.1. Any pointers would be highly helpful.

Thanks a lot

Appu