[GitHub] spark pull request: [DOCS] Added important updateStateByKey detail...

mvogiatzis Mon, 06 Jul 2015 01:38:45 -0700

Github user mvogiatzis commented on the pull request:

    https://github.com/apache/spark/pull/7229#issuecomment-118775714
  
    The documentation above states: "This can be used to maintain arbitrary 
state data for each key".
    
    I expected "each key" to mean each incoming key in the batch, since the 
absence of new values for a key would leave its existing state unchanged and 
the update could be skipped (e.g. for better performance). I found out the 
hard way that this is not the case 
([stackoverflow](http://stackoverflow.com/questions/31204748/spark-broadcast-to-all-keys-updatestatebykey),
 mailing list, local testing).
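
    To make the point concrete: the update function is invoked for every key 
that already has state, even when the batch carries no new values for it. Below 
is a toy model of that behaviour in plain Python. It is only a sketch of the 
semantics, not Spark's implementation, and the names `update_state_by_key`, 
`running_count` and `seen` are hypothetical.

```python
from collections import defaultdict


def update_state_by_key(state, batch, update_func):
    """Toy model of one batch step (not Spark code).

    update_func runs for every key that has prior state OR new values,
    not only for keys that appear in the current batch.
    """
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    for key in set(state) | set(grouped):
        # Keys without new values still get a call, with an empty list.
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key's state
            new_state[key] = result
    return new_state


seen = []  # records the new_values argument of every invocation


def running_count(new_values, previous):
    seen.append(list(new_values))
    return (previous or 0) + sum(new_values)


state = update_state_by_key({}, [("a", 1), ("b", 1)], running_count)
seen.clear()
# Second batch: only "a" has new data.
state = update_state_by_key(state, [("a", 1)], running_count)
# state is now {"a": 2, "b": 1}, and running_count was also called for "b"
# with an empty list of new values.
```

    In this model, any per-call cost in the update function is paid for every 
stored key on every batch, which is why the detail seems worth documenting.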
    
    I can move the description into the method doc above (although it feels a 
bit lengthy) or, better, into the code, but I feel this information is necessary.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
