GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/17902

    [SPARK-20641][core] Add key-value store abstraction and LevelDB implementation.

    This change adds an abstraction and LevelDB implementation for a key-value
    store that will be used to store UI and SHS data.
    
    The interface is described in KVStore.java (see javadoc). Specifics
    of the LevelDB implementation are discussed in the javadocs of both
    LevelDB.java and LevelDBTypeInfo.java.
    
    Included also are a few small benchmarks just to get some idea of
    latency. Because they're too slow for regular unit test runs, they're
    disabled by default.
    
    Tested with the included unit tests, and also as part of the overall feature
    implementation (including running SHS with hundreds of apps).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark shs-ng/M1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17902
    
----
commit f3b7e0bb9c141058fdbcf202a4b8a47a25237613
Author: Marcelo Vanzin <[email protected]>
Date:   2016-10-03T19:09:18Z

    SHS-NG M1: Add KVStore abstraction, LevelDB implementation.
    
    The interface is described in KVIndex.java (see javadoc). Specifics
    of the LevelDB implementation are discussed in the javadocs of both
    LevelDB.java and LevelDBTypeInfo.java.
    
    Included also are a few small benchmarks just to get some idea of
    latency. Because they're too slow for regular unit test runs, they're
    disabled by default.

commit 52ed2b45c09e7104e4fef5adcf78025f53b7a8e0
Author: Marcelo Vanzin <[email protected]>
Date:   2016-11-01T18:34:25Z

    SHS-NG M1: Add support for arrays when indexing.
    
    This is needed because some UI types have compound keys.

commit 4112afe723f85412035ad3a9c4801b583e74f876
Author: Marcelo Vanzin <[email protected]>
Date:   2016-11-03T22:18:24Z

    SHS-NG M1: Fix counts in LevelDB when updating entries.
    
    Also add unit test. When updating, the code needs to keep track of
    the aggregated delta to be added to each count stored in the db,
    instead of reading the count from the db for each update.
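
The delta bookkeeping described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class `CountDeltas`, its methods, and the use of a plain `HashMap` in place of the counts stored in the db are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the count entries stored in the db.
class CountDeltas {
  // Net change accumulated per count key during one update.
  private final Map<String, Long> deltas = new HashMap<>();

  // Record a change to one count without reading the stored value.
  void add(String countKey, long delta) {
    deltas.merge(countKey, delta, Long::sum);
  }

  // Apply every aggregated delta with one read-modify-write per key.
  void applyTo(Map<String, Long> store) {
    for (Map.Entry<String, Long> e : deltas.entrySet()) {
      if (e.getValue() == 0L) {
        continue; // Changes cancelled out; nothing to write.
      }
      long updated = store.getOrDefault(e.getKey(), 0L) + e.getValue();
      if (updated > 0) {
        store.put(e.getKey(), updated);
      } else {
        store.remove(e.getKey()); // Drop counts that reach zero.
      }
    }
    deltas.clear();
  }
}
```

With this shape, an update that moves an entry from one index value to another records `-1` for the old key and `+1` for the new one, and the stored counts are touched only once per key when the update is applied.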

commit 718cabd098dd6a534e7952066cd43f89f6875a14
Author: Marcelo Vanzin <[email protected]>
Date:   2017-03-18T03:17:04Z

    SHS-NG M1: Try to prevent db use after close.
    
    This causes JVM crashes in the leveldb library, so try to avoid it;
    if there are still issues, we'll need locking.
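
One lock-free way to guard against use after close is to keep the handle in an atomic reference and clear it on close. The sketch below only illustrates that idea; `ClosableHandle` is a hypothetical name and this is not necessarily how the pull request does it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: wraps any closeable resource, such as a native db handle.
class ClosableHandle<T extends AutoCloseable> {
  private final AtomicReference<T> ref;

  ClosableHandle(T resource) {
    this.ref = new AtomicReference<>(resource);
  }

  // Every access goes through here, so callers fail fast after close().
  T get() {
    T resource = ref.get();
    if (resource == null) {
      throw new IllegalStateException("DB is closed.");
    }
    return resource;
  }

  // Clears the reference first so only one caller ever closes the resource.
  void close() {
    T resource = ref.getAndSet(null);
    if (resource != null) {
      try {
        resource.close();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }
}
```

A guard like this rejects new accesses after close, but it does not stop a thread that already obtained the handle from using it while another thread closes it. That remaining window is the kind of case where locking would be needed.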

commit 45a027fd5e32421b57846236180d6012ee72e69b
Author: Marcelo Vanzin <[email protected]>
Date:   2017-03-24T20:19:07Z

    SHS-NG M1: Use Java 8 lambdas.
    
    Also rename LevelDBIteratorSuite to work around some super weird
    issue with sbt.

commit e592bf69b94c3308d194c2cb678be133931b95b5
Author: Marcelo Vanzin <[email protected]>
Date:   2017-03-25T00:24:08Z

    SHS-NG M1: Compress values stored in LevelDB.
    
    LevelDB has built-in support for snappy compression, but it seems
    to be buggy in the leveldb-jni library; the compression threads
    don't seem to run by default, and when you enable them, there are
    weird issues when stopping the DB.
    
    So just do compression manually using the JRE libraries; it's probably
    a little slower but it saves a good chunk of disk space.
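
As a rough illustration of compressing values with the JRE alone, the sketch below uses `java.util.zip`. The commit message does not say which JRE classes are used, so the choice of GZIP and the `ValueCodec` helper are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses serialized values before they are written.
class ValueCodec {
  static byte[] compress(byte[] raw) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // Closing the GZIP stream flushes the trailer into the byte buffer.
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static byte[] decompress(byte[] stored) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(stored))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        bytes.write(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

The store would call `compress` on the serialized value before a put and `decompress` after a get, so the db only ever sees compressed bytes.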

commit 889963f2ffbcb628f9e53e7142fd37931ba09a54
Author: Marcelo Vanzin <[email protected]>
Date:   2017-03-25T01:24:58Z

    SHS-NG M1: Use type aliases as keys in LevelDB.
    
    The type name gets repeated a lot in the store, so using it as the prefix
    for every key causes disk usage to grow unnecessarily. Instead, create a
    short alias for the type and keep a mapping of aliases to known types in
    a map in memory; the map is also saved to the database so it can be read
    later.
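
The aliasing idea can be sketched as below. The alias format (`t0`, `t1`, ...), the `/` separator, and the `TypeAliases` class are hypothetical. A real store would also write the map to the database whenever a new alias is created, which this in-memory sketch only notes in a comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a type-name-to-alias registry kept in memory.
class TypeAliases {
  private final Map<String, String> aliasByType = new HashMap<>();
  private final Map<String, String> typeByAlias = new HashMap<>();

  // Returns the short alias for a type, creating one on first use.
  synchronized String aliasFor(Class<?> type) {
    String alias = aliasByType.get(type.getName());
    if (alias == null) {
      alias = "t" + aliasByType.size();
      aliasByType.put(type.getName(), alias);
      typeByAlias.put(alias, type.getName());
      // A real store would persist the updated mapping here.
    }
    return alias;
  }

  // Reverse lookup, as needed when reading an existing database.
  synchronized String typeFor(String alias) {
    return typeByAlias.get(alias);
  }

  // Builds a storage key prefixed with the alias instead of the full type name.
  String keyFor(Class<?> type, String naturalKey) {
    return aliasFor(type) + "/" + naturalKey;
  }
}
```

Because the alias is only a few bytes long, every key saves roughly the length of the fully qualified class name.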

commit 84ab160699ef8dad4df1fa4cbba29deec7c92c06
Author: Marcelo Vanzin <[email protected]>
Date:   2017-04-03T18:35:50Z

    SHS-NG M1: Separate index introspection from storage.
    
    The new KVTypeInfo class can help with writing different implementations
    of KVStore without duplicating logic from LevelDBTypeInfo.

commit 7b870212e80e70b8c3f3eb4279e3bb9ec0125d2d
Author: Marcelo Vanzin <[email protected]>
Date:   2017-04-26T18:54:33Z

    SHS-NG M1: Remove unused methods from KVStore.
    
    Turns out I ended up not using the raw storage methods in KVStore, so
    this change removes them to simplify the API and save some code.

commit 5197c218525db2ad849dfe77d83dddf2311bb5ad
Author: Marcelo Vanzin <[email protected]>
Date:   2017-05-05T21:36:00Z

    SHS-NG M1: Add "max" and "last" to kvstore iterators.
    
    This makes it easier for callers to control the end of iteration,
    which simplifies writing Scala code that automatically closes
    underlying iterator resources. Before, code had to use Scala's
    "takeWhile", convert the result to a list, and manually close the
    iterators; with these two parameters, that can be avoided in a
    bunch of cases, with iterators auto-closing when the last element
    is reached.
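
A bounded, self-closing iterator along these lines might look like the sketch below. It assumes that "max" caps the number of returned elements and that "last" is an inclusive upper bound on element values. The `BoundedIterator` class and its constructor are hypothetical and are not the API added by this pull request. Elements are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: stops at "max" elements or past "last", then closes itself.
class BoundedIterator<T extends Comparable<T>> implements Iterator<T>, AutoCloseable {
  private final Iterator<T> source;
  private final Runnable onClose; // Releases the underlying resources.
  private final T last;           // Inclusive upper bound, or null for none.
  private long remaining;         // How many more elements may be returned.
  private T next;                 // Element fetched ahead by hasNext().
  private boolean closed;

  BoundedIterator(Iterator<T> source, Runnable onClose, long max, T last) {
    this.source = source;
    this.onClose = onClose;
    this.remaining = max;
    this.last = last;
  }

  @Override
  public boolean hasNext() {
    if (next != null) {
      return true;
    }
    if (closed) {
      return false;
    }
    if (remaining > 0 && source.hasNext()) {
      T candidate = source.next();
      if (last == null || candidate.compareTo(last) <= 0) {
        next = candidate;
        remaining--;
        return true;
      }
    }
    // End of iteration: release resources without involving the caller.
    close();
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = next;
    next = null;
    return result;
  }

  @Override
  public void close() {
    if (!closed) {
      closed = true;
      onClose.run();
    }
  }
}
```

In this sketch the close happens on the first `hasNext()` call that finds nothing more to return, so a caller that simply drains the iterator never has to close it explicitly.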

----


