Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2
After comparing with previous code, I got it work by making the return a Some instead of Tuple2. Perhaps some day I will understand this. spr wrote --code val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) = { val currentCount = if (values.isEmpty) 0 else values.map( x = x._1).sum val newMinTime = if (values.isEmpty) Time(Long.MaxValue) else values.map( x = x._2).min val (previousCount, minTime) = state.getOrElse((0, Time(System.currentTimeMillis))) // (currentCount + previousCount, Seq(minTime, newMinTime).min) == old Some(currentCount + previousCount, Seq(minTime, newMinTime).min) // == new } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/overloaded-method-value-updateStateByKey-cannot-be-applied-to-when-Key-is-a-Tuple2-tp18644p18750.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2
My understanding is that the reason you have an Option is so you could filter out tuples when None is returned. This way your state data won't grow forever. -Original Message- From: spr [mailto:s...@yarcdata.com] Sent: November-12-14 2:25 PM To: u...@spark.incubator.apache.org Subject: Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2 After comparing with previous code, I got it work by making the return a Some instead of Tuple2. Perhaps some day I will understand this. spr wrote --code val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) = { val currentCount = if (values.isEmpty) 0 else values.map( x = x._1).sum val newMinTime = if (values.isEmpty) Time(Long.MaxValue) else values.map( x = x._2).min val (previousCount, minTime) = state.getOrElse((0, Time(System.currentTimeMillis))) // (currentCount + previousCount, Seq(minTime, newMinTime).min) == old Some(currentCount + previousCount, Seq(minTime, newMinTime).min) // == new } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/overloaded-method-value-updateStateByKey-cannot-be-applied-to-when-Key-is-a-Tuple2-tp18644p18750.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2
Adrian, do you know if this is documented somewhere? I was also under the impression that setting a key's value to None would cause the key to be discarded (without any explicit filtering on the user's part) but can not find any official documentation to that effect On Wed, Nov 12, 2014 at 2:43 PM, Adrian Mocanu amoc...@verticalscope.com wrote: My understanding is that the reason you have an Option is so you could filter out tuples when None is returned. This way your state data won't grow forever. -Original Message- From: spr [mailto:s...@yarcdata.com] Sent: November-12-14 2:25 PM To: u...@spark.incubator.apache.org Subject: Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2 After comparing with previous code, I got it work by making the return a Some instead of Tuple2. Perhaps some day I will understand this. spr wrote --code val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) = { val currentCount = if (values.isEmpty) 0 else values.map( x = x._1).sum val newMinTime = if (values.isEmpty) Time(Long.MaxValue) else values.map( x = x._2).min val (previousCount, minTime) = state.getOrElse((0, Time(System.currentTimeMillis))) // (currentCount + previousCount, Seq(minTime, newMinTime).min) == old Some(currentCount + previousCount, Seq(minTime, newMinTime).min) // == new } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/overloaded-method-value-updateStateByKey-cannot-be-applied-to-when-Key-is-a-Tuple2-tp18644p18750.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2
You are correct; the filtering I’m talking about is done implicitly. You don’t have to do it yourself. Spark will do it for you and remove those entries from the state collection. From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com] Sent: November-12-14 3:50 PM To: Adrian Mocanu Cc: spr; u...@spark.incubator.apache.org Subject: Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2 Adrian, do you know if this is documented somewhere? I was also under the impression that setting a key's value to None would cause the key to be discarded (without any explicit filtering on the user's part) but can not find any official documentation to that effect On Wed, Nov 12, 2014 at 2:43 PM, Adrian Mocanu amoc...@verticalscope.commailto:amoc...@verticalscope.com wrote: My understanding is that the reason you have an Option is so you could filter out tuples when None is returned. This way your state data won't grow forever. -Original Message- From: spr [mailto:s...@yarcdata.commailto:s...@yarcdata.com] Sent: November-12-14 2:25 PM To: u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2 After comparing with previous code, I got it work by making the return a Some instead of Tuple2. Perhaps some day I will understand this. spr wrote --code val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) = { val currentCount = if (values.isEmpty) 0 else values.map( x = x._1).sum val newMinTime = if (values.isEmpty) Time(Long.MaxValue) else values.map( x = x._2).min val (previousCount, minTime) = state.getOrElse((0, Time(System.currentTimeMillis))) // (currentCount + previousCount, Seq(minTime, newMinTime).min) == old Some(currentCount + previousCount, Seq(minTime, newMinTime).min) // == new } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/overloaded-method-value-updateStateByKey-cannot-be-applied-to-when-Key-is-a-Tuple2-tp18644p18750.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2
I'm missing something simpler (I think). That is, why do I need a Some instead of Tuple2? Because a Some might or might not be there, but a Tuple2 must be there? Or something like that? From: Adrian Mocanu amoc...@verticalscope.commailto:amoc...@verticalscope.com You are correct; the filtering I’m talking about is done implicitly. You don’t have to do it yourself. Spark will do it for you and remove those entries from the state collection. From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com] Adrian, do you know if this is documented somewhere? I was also under the impression that setting a key's value to None would cause the key to be discarded (without any explicit filtering on the user's part) but can not find any official documentation to that effect On Wed, Nov 12, 2014 at 2:43 PM, Adrian Mocanu amoc...@verticalscope.commailto:amoc...@verticalscope.com wrote: My understanding is that the reason you have an Option is so you could filter out tuples when None is returned. This way your state data won't grow forever. -Original Message- From: spr After comparing with previous code, I got it work by making the return a Some instead of Tuple2. Perhaps some day I will understand this.