Re: updateStateByKey performance & API

2015-03-23 Thread Andre Schumacher
Forwarded Message ----
> Subject: Re: updateStateByKey performance & API
> Date: Wed, 18 Mar 2015 13:06:15 +0200
> From: Nikos Viorres
> To: Akhil Das
> CC: user@spark.apache.org
>
> Hi Akhil,
>
> Yes, that's what we are planning on doing at the end of the data. At the

Re: updateStateByKey performance & API

2015-03-18 Thread Nikos Viorres
ally
>> bad.
>> Is there a possibility of implementing in the future an extra call in the
>> API for updating only a specific subset of keys?
>>
>> P.S. I will try ASAP to set the DStream as non-serialized, but then I am
>> worried about GC and check

Re: updateStateByKey performance & API

2015-03-18 Thread Akhil Das
> am worried about GC and checkpointing performance
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/updateStateByKey-performance-API-tp22113.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

updateStateByKey performance & API

2015-03-17 Thread nvrs
performance

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/updateStateByKey-performance-API-tp22113.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To

updateStateByKey performance / API

2015-03-17 Thread Nikos Viorres
Hi all,

We are having a few issues with the performance of the updateStateByKey operation in Spark Streaming (1.2.1 at the moment), and any advice would be greatly appreciated. Specifically, on each tick of the system (which is set at 10 seconds) we need to update a state tuple where the key is the user_i
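
For readers following the thread, here is a rough plain-Python sketch (not Spark code, and not the poster's job) of why this operation gets expensive as state grows: on every batch interval the update function is invoked for every key already held in state, not only for keys that received new values. The names `update_count` and `run_batch`, the `user_N` keys, and the running-count state are all hypothetical and chosen only for illustration; the update function follows the `(new_values, previous_state) -> new_state` shape that updateStateByKey expects, where returning None drops the key.

```python
from collections import defaultdict

def update_count(new_values, running_count):
    # Same shape as an updateStateByKey update function:
    # (values seen for this key in the batch, previous state or None) -> new state.
    return (running_count or 0) + sum(new_values)

def run_batch(state, batch, update_fn):
    # Simulates one batch interval (hypothetical helper, not a Spark API).
    # Every key present in the state OR in the batch is visited.
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    calls = 0
    for key in set(state) | set(grouped):
        calls += 1
        result = update_fn(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None removes the key from state
            new_state[key] = result
    return new_state, calls

# Hypothetical state: 1000 users already tracked, only 2 active in this batch.
state = {f"user_{i}": 1 for i in range(1000)}
batch = [("user_1", 1), ("user_2", 1), ("user_2", 1)]
state, calls = run_batch(state, batch, update_count)
```

In this sketch only two keys change, yet the update function runs once per tracked key, which is the cost the thread's request for updating "only a specific subset of keys" is aimed at.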