The updateFunction given to updateStateByKey should be called on ALL the
keys that are in the state, even if there is no new data in the batch for
some keys. Is that not the behavior you see?
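
To make that behavior concrete, here is a minimal plain-Python sketch that mimics one batch of updateStateByKey. It is a simulation under stated assumptions, not Spark code: update_state_by_key and running_sum are hypothetical names, and the state is modeled as an ordinary dict rather than an RDD. It reuses the keys from the example quoted below.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Pair = Tuple[Hashable, int]
UpdateFunc = Callable[[List[int], Optional[int]], Optional[int]]


def update_state_by_key(state: Dict[Hashable, int],
                        batch: List[Pair],
                        update_func: UpdateFunc) -> Dict[Hashable, int]:
    """Simulate one batch: call update_func for every key in the old state
    or in the new batch (hypothetical helper, not a Spark API)."""
    # Group the new values of this batch by key.
    grouped: Dict[Hashable, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: Dict[Hashable, int] = {}
    # Keys with no new data still get a call, with an empty list of values.
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        # Returning None drops the key from the state.
        if result is not None:
            new_state[key] = result
    return new_state


def running_sum(new_values: List[int], previous: Optional[int]) -> int:
    """Example update function: add the new values to the previous state."""
    return sum(new_values) + (previous or 0)


# First 2-second interval: "a", "b" and "c" arrive.
state = update_state_by_key({}, [("a", 1), ("b", 2), ("c", 3)], running_sum)
# Second interval: only "c" and "d" arrive, yet "a" and "b" are still updated.
state = update_state_by_key(state, [("c", 10), ("d", 4)], running_sum)
print(sorted(state.items()))
```

In the second call, running_sum receives an empty list for "a" and "b", so that is the place to update their state even though no new data arrived for them.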

What do you mean by "show all the existing states"? You have access to the
latest state RDD by doing stateStream.foreachRDD(...). There you can run
any operation on all the key-state pairs.
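
As a rough sketch of what such a function could look like in Python: show_all_states below is a hypothetical callback that only relies on the RDD's collect() method, and LocalPairs is a stand-in class (an assumption, not part of Spark) so that the snippet runs without a cluster. With a real state DStream it would be registered as stateStream.foreachRDD(show_all_states).

```python
from typing import Any, Iterable, List, Tuple


def format_states(pairs: Iterable[Tuple[Any, Any]]) -> List[str]:
    """Render (key, state) pairs as sorted 'key -> state' lines."""
    return [f"{key} -> {state}" for key, state in sorted(pairs)]


def show_all_states(rdd: Any) -> List[str]:
    """Hypothetical callback for stateStream.foreachRDD(show_all_states).

    collect() brings every (key, state) pair to the driver, so this is
    only reasonable when the state is small.
    """
    lines = format_states(rdd.collect())
    for line in lines:
        print(line)
    return lines


class LocalPairs:
    """Stand-in for an RDD (assumption): only provides collect()."""

    def __init__(self, pairs: Iterable[Tuple[Any, Any]]) -> None:
        self._pairs = list(pairs)

    def collect(self) -> List[Tuple[Any, Any]]:
        return list(self._pairs)


if __name__ == "__main__":
    # Display every existing state, including keys with no new data.
    show_all_states(LocalPairs([("c", 13), ("a", 1), ("d", 4), ("b", 2)]))
```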

TD




On Thu, Jul 17, 2014 at 11:58 AM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi TD,
>
> Thank you for the quick reply and for backing my approach. :)
>
> 1) The example is this:
>
> 1. In the first 2 second interval, after updateStateByKey, I get a few
> keys and their states, say, ("a" -> 1, "b" -> 2, "c" -> 3)
> 2. In the following 2 second interval, I only receive "c" and "d" and
> their values. But I want to update/display the states of "a" and "b"
> accordingly.
> * It seems I have no way to "access" "a" and "b" and get their states.
> * Also, do I have a way to show all the existing states?
>
> I guess the approach to solve this will be similar to what you mentioned
> for 2). But the difficulty is that, if I want to display all the existing
> states, I need to bundle all the remaining keys into one key.
>
> Thank you.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
>
> On Thu, Jul 17, 2014 at 11:36 AM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> For accessing the previous version, I would do it the same way. :)
>>
>> 1. Can you elaborate on what you mean by that with an example? What do
>> you mean by "accessing" keys?
>>
>> 2. Yeah, that is hard to do without the ability to do point lookups into an
>> RDD, which we don't support yet. You could try embedding the related key in
>> the values of the keys that need it. That is, B is present in the
>> value of key A. Then put this transformed DStream through updateStateByKey.
>>
>> TD
>>
>
>
