GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/18107

    [SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation

    
    ## What changes were proposed in this pull request?
    
    Several changes to the StateStore APIs and implementation.
    The current StateStore API has several problems that create too many transient objects and cause memory pressure.
    - `StateStore.get(): Option` forces creation of Some/None objects for every get. Changed this to return the row or null.
    - `StateStore.iterator(): (UnsafeRow, UnsafeRow)` forces creation of a new tuple for each record returned. Changed this to return an UnsafeRowTuple, which can be reused across records.
    - `StateStore.updates()` requires the implementation to keep track of updates, even though it is used minimally (only by Append mode in streaming aggregations). Removed this.
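
The allocation-saving idea behind the first two changes above can be sketched in plain Scala. This is only an illustration under stated assumptions: `Row`, `RowPair`, and `InMemoryStore` are hypothetical stand-ins for `UnsafeRow`, `UnsafeRowTuple`, and a StateStore implementation, and none of the signatures are taken from the patch itself.

```scala
// Hypothetical sketch; these are NOT the Spark classes touched by this PR.
object StateStoreApiSketch {
  // Stand-in for UnsafeRow.
  final case class Row(value: Long)

  // Stand-in for a reusable key/value holder: one instance is mutated
  // in place for each record instead of allocating a new tuple.
  final class RowPair(var key: Row = null, var value: Row = null) {
    def withRows(k: Row, v: Row): RowPair = { key = k; value = v; this }
  }

  final class InMemoryStore {
    private val map = new java.util.HashMap[Row, Row]()

    def put(key: Row, value: Row): Unit = map.put(key, value)

    // Returns the stored row, or null when the key is absent,
    // so no Option wrapper is allocated per lookup.
    def get(key: Row): Row = map.get(key)

    // Every call to next() returns the same RowPair instance with its
    // fields overwritten, so iteration allocates one holder in total.
    def iterator(): Iterator[RowPair] = {
      val pair = new RowPair()
      val entries = map.entrySet().iterator()
      new Iterator[RowPair] {
        def hasNext: Boolean = entries.hasNext
        def next(): RowPair = {
          val e = entries.next()
          pair.withRows(e.getKey, e.getValue)
        }
      }
    }
  }
}
```

One consequence of reusing the holder in this sketch is that a caller who wants to retain a record beyond the next call to `next()` has to copy the key and value out of the pair first.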
    
    Additionally,
    - Added a configuration that allows the user to specify which implementation to use.
    - Added new metrics to understand the time taken to update keys, remove keys and commit all changes to the state store. These metrics will be visible on the plan diagram in the SQL tab of the UI.
    - Refactored unit tests such that they can be reused to test any implementation of StateStore.
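
As a rough illustration of choosing an implementation through configuration, the sketch below looks up a provider by a configured name and falls back to a default when the key is unset. Everything in it is an assumption made for the example: the key `example.stateStore.provider`, the `Provider` trait, and the registry of factories are hypothetical, and this message does not show the key or the loading mechanism used in the patch.

```scala
// Hypothetical sketch; the key and types below are invented for illustration.
object ProviderSelectionSketch {
  trait Provider { def name: String }

  final class DefaultProvider extends Provider { val name = "default" }
  final class CustomProvider extends Provider { val name = "custom" }

  // Hypothetical configuration key, not the one added by this PR.
  val ProviderKey = "example.stateStore.provider"

  // Simplification: a fixed registry of factories keyed by name.
  private val registry: Map[String, () => Provider] = Map(
    "default" -> (() => new DefaultProvider),
    "custom" -> (() => new CustomProvider)
  )

  // Picks the provider named in the configuration, or the default
  // when the key is absent; unknown names fail fast.
  def createProvider(conf: Map[String, String]): Provider = {
    val chosen = conf.getOrElse(ProviderKey, "default")
    val factory = registry.getOrElse(
      chosen,
      throw new IllegalArgumentException(s"Unknown provider: $chosen"))
    factory()
  }
}
```

Failing fast on an unknown name is a design choice of this sketch; it surfaces a misconfiguration at startup rather than on the first state access.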
    
    ## How was this patch tested?
    Old and new unit tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-20376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18107
    
----
commit 03f5bf3f1fc4e6d60b43d7c05a3cdc6dddcbd1af
Author: Tathagata Das
Date:   2017-05-25T10:55:09Z

    Refactored StateStore APIs and added conf to choose StateStore
    implementation

----


