Ok, that patch does fix the key lookup exception. However, I'm curious about
the time validity check, isValidTime (
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L264
).

Why does (time - zeroTime) have to be a multiple of the slide duration?
Shouldn't reduceByKeyAndWindow aggregate every record in a given window
(zeroTime to zeroTime + windowDuration)?
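To make the question concrete, here is a minimal sketch (my paraphrase, not the actual Spark source) of the arithmetic that check appears to do: a batch time is treated as valid only if it falls exactly on a slide boundary relative to zeroTime. The numbers are taken from the log quoted below.

```java
// Sketch of the validity check described in DStream.isValidTime (paraphrase):
// a time is valid only when (time - zeroTime) is an exact multiple of the
// slide duration, i.e. the time lands on a slide boundary.
public class SlideCheck {
    static boolean isValidTime(long timeMs, long zeroTimeMs, long slideMs) {
        return timeMs > zeroTimeMs && (timeMs - zeroTimeMs) % slideMs == 0;
    }

    public static void main(String[] args) {
        long zeroTime = 1403050485800L; // from the log below
        long slide    = 60000L;         // slideDuration from the log below
        long time     = 1403050486900L; // the batch time that was rejected

        // difference is 1100 ms, which is not a multiple of 60000 ms
        System.out.println(isValidTime(time, zeroTime, slide)); // prints false
    }
}
```

So with a 60000 ms slide, only times at zeroTime + k * 60000 pass the check, which seems to be why batches at other times are flagged invalid even though they fall inside a window.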


On Tue, Jun 17, 2014 at 10:55 PM, Hatch M <hatchman1...@gmail.com> wrote:

> Thanks! Will try to get the fix and retest.
>
>
> On Tue, Jun 17, 2014 at 5:30 PM, onpoq l <onpo...@gmail.com> wrote:
>
>> There is a bug:
>>
>> https://github.com/apache/spark/pull/961#issuecomment-45125185
>>
>>
>> On Tue, Jun 17, 2014 at 8:19 PM, Hatch M <hatchman1...@gmail.com> wrote:
>> > Trying to aggregate over a sliding window. Playing around with the
>> > slide interval, I can see the aggregation works sometimes, but it mostly
>> > fails with the below error. The stream has records coming in at 100ms.
>> >
>> > JavaPairDStream<String, AggregateObject> aggregatedDStream =
>> > pairsDStream.reduceByKeyAndWindow(aggFunc, new Duration(60000), new
>> > Duration(600000));
>> >
>> > 14/06/18 00:14:46 INFO dstream.ShuffledDStream: Time 1403050486900 ms is
>> > invalid as zeroTime is 1403050485800 ms and slideDuration is 60000 ms
>> > and difference is 1100 ms
>> > 14/06/18 00:14:46 ERROR actor.OneForOneStrategy: key not found:
>> > 1403050486900 ms
>> > java.util.NoSuchElementException: key not found: 1403050486900 ms
>> > at scala.collection.MapLike$class.default(MapLike.scala:228)
>> >
>> > Any hints on what's going on here?
>> > Thanks!
>> > Hatch
>> >
>>
>
>
