Re: Can mapWithState state func be called every batchInterval?
Actually, each element of mapWithState has a timeout component. You can write a function to "treat" your timeout. You can match the timeout with your batch size and do fun stuff when the batch ends. People do session management with the same approach: when activity is registered the session is refreshed, and the session is deleted ("one way to treat it") when the timeout happens.

..Mana

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-mapWithState-state-func-be-called-every-batchInterval-tp27877p27898.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
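The timeout idea above can be pictured with a small toy model. This is a sketch in plain Python, not the Spark API: `SessionStore`, `process_batch`, and the integer clock are all hypothetical names invented for illustration, and the real mechanism (in Spark, a timeout set on the `StateSpec` and checked via `State.isTimingOut()`) may behave differently in its details. The sketch assumes a key is refreshed whenever it has data, and gets exactly one final call once it has been idle for the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Toy stand-in for per-key state with an idle timeout (not the Spark API)."""
    timeout: int                                   # idle time after which a key expires
    last_seen: dict = field(default_factory=dict)  # key -> time of its last event
    counts: dict = field(default_factory=dict)     # key -> events seen so far

    def process_batch(self, now, events):
        """Apply one batch and return the (key, status, count) records emitted."""
        emitted = []
        # Keys with data in this batch: refresh the session and update its state.
        for key in events:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.last_seen[key] = now
            emitted.append((key, "active", self.counts[key]))
        # Keys idle for the full timeout: one final record, then the state is dropped.
        expired = [k for k, t in self.last_seen.items() if now - t >= self.timeout]
        for key in expired:
            emitted.append((key, "timed_out", self.counts.pop(key)))
            del self.last_seen[key]
        return emitted


store = SessionStore(timeout=2)
print(store.process_batch(0, ["a", "b"]))  # both keys are active
print(store.process_batch(1, ["a"]))       # only "a" is touched
print(store.process_batch(2, []))          # [('b', 'timed_out', 1)]
```

Note that in this model an idle key is visited once, at expiry, and not on every batch in between. That is the gap the original question is about.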
Re: Can mapWithState state func be called every batchInterval?
What are you expecting? If you want to update every key on every batch, it's going to be linear on the number of keys... there's no real way around that.

On Tue, Oct 11, 2016 at 9:49 AM, Daan Debie wrote:
> That's nice and all, but I'd rather have a solution involving mapWithState
> of course :) I'm just wondering why it doesn't support this use case yet.
>
> On Tue, Oct 11, 2016 at 3:41 PM, Cody Koeninger wrote:
>>
>> They're telling you not to use the old function because it's linear on the
>> total number of keys, not keys in the batch, so it's slow.
>>
>> But if that's what you really want, go ahead and do it, and see if it
>> performs well enough.
>>
>> On Oct 11, 2016 6:28 AM, "DandyDev" wrote:
>>
>> Hi there,
>>
>> I've built a Spark Streaming app that accepts certain events from Kafka, and
>> I want to keep some state between the events. So I've successfully used
>> mapWithState for that. The problem is that I want the state for keys to be
>> updated on every batchInterval, because "lack" of events is also significant
>> to the use case. This doesn't seem possible with mapWithState, unless I'm
>> missing something.
>>
>> Previously I looked at updateStateByKey, which says:
>> > In every batch, Spark will apply the state update function for all
>> > existing keys, regardless of whether they have new data in a batch or
>> > not.
>>
>> That is what I want; however, I've seen several tutorials/blog posts where
>> the advice was not to use updateStateByKey anymore, and use mapWithState
>> instead.
>>
>> So my questions:
>>
>> - Can mapWithState state function be called every batchInterval, even when
>>   no events exist for that interval?
>> - If not, is it okay to use updateStateByKey instead? Or will it be
>>   deprecated in the near future?
>> - If mapWithState doesn't support my need, is there another way to
>>   accomplish the goal of updating state every batchInterval, that still uses
>>   mapWithState, together with some other mechanism?
>>
>> Thanks in advance!
Re: Can mapWithState state func be called every batchInterval?
That's nice and all, but I'd rather have a solution involving mapWithState of course :) I'm just wondering why it doesn't support this use case yet.
Re: Can mapWithState state func be called every batchInterval?
They're telling you not to use the old function because it's linear on the total number of keys, not keys in the batch, so it's slow.

But if that's what you really want, go ahead and do it, and see if it performs well enough.
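The cost argument in this reply can be made concrete with a toy comparison. The sketch below is plain Python and does not call Spark: `update`, `run_all_keys`, and `run_batch_keys` are hypothetical names, and the state is assumed to be a per-key counter of consecutive idle batches (one way to make "lack" of events visible). It only counts how many times the update function runs under each strategy.

```python
def update(events, idle_batches):
    """Per-key update: reset the idle counter on activity, otherwise increment it."""
    return 0 if events else idle_batches + 1


def run_all_keys(state, batch):
    """Visit every known key each batch; cost grows with the total number of keys."""
    calls = 0
    for key in set(state) | set(batch):
        state[key] = update(batch.get(key, []), state.get(key, 0))
        calls += 1
    return calls


def run_batch_keys(state, batch):
    """Visit only keys that have data in this batch; idle keys are never seen."""
    calls = 0
    for key, events in batch.items():
        state[key] = update(events, state.get(key, 0))
        calls += 1
    return calls


# Hypothetical workload: 100,000 known keys, 10 of them active in this batch.
state_all = {f"user{i}": 0 for i in range(100_000)}
state_batch = dict(state_all)
batch = {f"user{i}": ["event"] for i in range(10)}

print(run_all_keys(state_all, batch))      # 100000 calls
print(run_batch_keys(state_batch, batch))  # 10 calls
print(state_all["user99999"], state_batch["user99999"])  # 1 0
```

Only the first strategy notices that `user99999` was idle, and it pays one call per known key to do so. That is the trade-off being described: seeing inactivity on every batch means touching every key on every batch.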