GitHub user magictips opened a pull request:
https://github.com/apache/spark/pull/11304
Branch 1.6
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11304.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11304
----
commit 82a71aba043a0b1ed50168d2b5b312c79b8c8fa3
Author: gatorsmile <[email protected]>
Date: 2015-12-06T19:15:02Z
[SPARK-12138][SQL] Escape \u in the generated comments of codegen
When \u appears in a comment block (i.e. in /**/), code gen will break. So,
in Expression and CodegenFallback, we escape \u to \\u.
yhuai Please review it. I did reproduce it and it works after the fix.
Thanks!
Author: gatorsmile <[email protected]>
Closes #10155 from gatorsmile/escapeU.
(cherry picked from commit 49efd03bacad6060d99ed5e2fe53ba3df1d1317e)
Signed-off-by: Yin Huai <[email protected]>
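The failure mode is easy to reproduce outside of Spark: the Java compiler expands `\uXXXX` escapes everywhere in a source file, including inside `/* ... */` comments, so an expression string containing a stray `\u` breaks compilation of the generated class. A minimal Python sketch of the escaping fix (the helper name is illustrative; the real change lives in `Expression` and `CodegenFallback`):

```python
def escape_unicode_in_comment(text):
    # javac expands \uXXXX sequences even inside comments, so an invalid
    # escape such as "\uXYZ" in a generated comment block fails to compile.
    # Doubling the backslash makes the comment text inert.
    return text.replace("\\u", "\\\\u")

escape_unicode_in_comment(r"regex \u123 appears in this comment")
```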
commit c54b698ecc284bce9b80c40ba46008bd6321c812
Author: Burak Yavuz <[email protected]>
Date: 2015-12-07T08:21:55Z
[SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when
Jenkins load is high
We need to make sure that the last entry is indeed the last entry in the
queue.
Author: Burak Yavuz <[email protected]>
Closes #10110 from brkyvz/batch-wal-test-fix.
(cherry picked from commit 6fd9e70e3ed43836a0685507fff9949f921234f4)
Signed-off-by: Tathagata Das <[email protected]>
commit 3f230f7b331cf6d67426cece570af3f1340f526e
Author: Sun Rui <[email protected]>
Date: 2015-12-07T18:38:17Z
[SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.
This PR:
1. Suppress all known warnings.
2. Cleanup test cases and fix some errors in test cases.
3. Fix errors in HiveContext related test cases. These test cases are
actually not run previously due to a bug of creating TestHiveContext.
4. Support 'testthat' package version 0.11.0 which prefers that test cases
be under 'tests/testthat'
5. Make sure the default Hadoop file system is local when running test
cases.
6. Turn warnings into errors.
Author: Sun Rui <[email protected]>
Closes #10030 from sun-rui/SPARK-12034.
(cherry picked from commit 39d677c8f1ee7ebd7e142bec0415cf8f90ac84b6)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit fed453821d81470b9035d33e36fa6ef1df99c0de
Author: Davies Liu <[email protected]>
Date: 2015-12-07T19:00:25Z
[SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler
Currently, the current line is not cleared by Ctrl-C.
After this patch
```
>>> asdfasdf^C
Traceback (most recent call last):
File "~/spark/python/pyspark/context.py", line 225, in signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt
```
It's still worse than 1.5 (and before).
Author: Davies Liu <[email protected]>
Closes #10134 from davies/fix_cltrc.
(cherry picked from commit ef3f047c07ef0ac4a3a97e6bc11e1c28c6c8f9a0)
Signed-off-by: Davies Liu <[email protected]>
commit 539914f1a8d3a0f59e67c178f86e741927e7a658
Author: Tathagata Das <[email protected]>
Date: 2015-12-07T19:03:59Z
[SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner
not present
The reason is that TrackStateRDDs generated by trackStateByKey expect the
previous batch's TrackStateRDDs to have a partitioner. However, when recovering
from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a
partitioner attached to them. This is because RDD checkpoints do not preserve the
partitioner (SPARK-12004).
While #9983 solves SPARK-12004 by preserving the partitioner through RDD
checkpoints, there may be a non-zero chance that the saving and recovery fails.
To be resilient, this PR repartitions the previous state RDD if the partitioner
is not detected.
Author: Tathagata Das <[email protected]>
Closes #9988 from tdas/SPARK-11932.
(cherry picked from commit 5d80d8c6a54b2113022eff31187e6d97521bd2cf)
Signed-off-by: Tathagata Das <[email protected]>
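The defensive check described above can be sketched with a toy RDD stand-in (class and method names here are illustrative, not Spark's internals):

```python
class FakeRDD:
    """Minimal stand-in for an RDD, for illustration only."""
    def __init__(self, partitioner=None):
        self.partitioner = partitioner

    def partition_by(self, partitioner):
        return FakeRDD(partitioner)

def ensure_partitioned(rdd, expected_partitioner):
    # Checkpoint recovery drops the partitioner (SPARK-12004), so the
    # resilient fix is to re-partition whenever none is attached.
    if rdd.partitioner is None:
        return rdd.partition_by(expected_partitioner)
    return rdd
```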
commit c8aa5f2011cf30a360d5206ee45202c4b1d61e21
Author: Xusen Yin <[email protected]>
Date: 2015-12-07T21:16:47Z
[SPARK-11963][DOC] Add docs for QuantileDiscretizer
https://issues.apache.org/jira/browse/SPARK-11963
Author: Xusen Yin <[email protected]>
Closes #9962 from yinxusen/SPARK-11963.
(cherry picked from commit 871e85d9c14c6b19068cc732951a8ae8db61b411)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit cdeb89b34614fb39062976c4796d187992333c88
Author: Andrew Ray <[email protected]>
Date: 2015-12-07T23:01:00Z
[SPARK-12184][PYTHON] Make python api doc for pivot consistent with scala
doc
In SPARK-11946 the API for pivot was changed a bit and got updated docs; the
doc changes were not made for the Python API, though. This PR updates the
Python doc to be consistent.
Author: Andrew Ray <[email protected]>
Closes #10176 from aray/sql-pivot-python-doc.
(cherry picked from commit 36282f78b888743066843727426c6d806231aa97)
Signed-off-by: Yin Huai <[email protected]>
commit 115bfbdae82b1c2804ea501ffd420d0aa17aac45
Author: Joseph K. Bradley <[email protected]>
Date: 2015-12-08T00:37:09Z
[SPARK-12160][MLLIB] Use SQLContext.getOrCreate in MLlib
Switched from using SQLContext constructor to using getOrCreate, mainly in
model save/load methods.
This covers all instances in spark.mllib. There were no uses of the
constructor in spark.ml.
CC: mengxr yhuai
Author: Joseph K. Bradley <[email protected]>
Closes #10161 from jkbradley/mllib-sqlcontext-fix.
(cherry picked from commit 3e7e05f5ee763925ed60410d7de04cf36b723de1)
Signed-off-by: Xiangrui Meng <[email protected]>
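The get-or-create pattern the commit switches to can be sketched in a few lines (a toy class, not Spark's `SQLContext`): reuse the active context if one exists instead of constructing a fresh one in every save/load call.

```python
class ToySQLContext:
    """Illustrative singleton-style context, mimicking getOrCreate."""
    _active = None

    @classmethod
    def get_or_create(cls):
        # Construct a context only on first use; later callers share it,
        # which is what the model save/load paths were changed to do.
        if cls._active is None:
            cls._active = cls()
        return cls._active
```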
commit 3c683ed5ffe704a6fec7c6d434eeed784276470d
Author: somideshmukh <[email protected]>
Date: 2015-12-08T07:26:34Z
[SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using
include_example
Made a new patch containing only the markdown examples moved to the example/
folder. Only three Java examples were not shifted, since they contained
compilation errors; these classes are
1) StandardScale 2) NormalizerExample 3) VectorIndexer
Author: Xusen Yin <[email protected]>
Author: somideshmukh <[email protected]>
Closes #10002 from somideshmukh/SomilBranch1.33.
(cherry picked from commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 8652fc03c21f79b41ce13f41991feba11fc7b29c
Author: Takahashi Hiroshi <[email protected]>
Date: 2015-12-08T07:46:55Z
[SPARK-10259][ML] Add @since annotation to ml.classification
Add since annotation to ml.classification
Author: Takahashi Hiroshi <[email protected]>
Closes #8534 from taishi-oss/issue10259.
(cherry picked from commit 7d05a624510f7299b3dd07f87c203db1ff7caa3e)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 5c8216920b4110d8fc4329e1fe52543ee17c4a54
Author: Yanbo Liang <[email protected]>
Date: 2015-12-08T07:50:57Z
[SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example
code
Add ```SQLTransformer``` user guide, example code and make Scala API doc
more clear.
Author: Yanbo Liang <[email protected]>
Closes #10006 from yanboliang/spark-11958.
(cherry picked from commit 4a39b5a1bee28cec792d509654f6236390cafdcb)
Signed-off-by: Xiangrui Meng <[email protected]>
commit c9e5274ae3d8e6967bee240ec0b7ba17cd15d34e
Author: cody koeninger <[email protected]>
Date: 2015-12-08T11:02:35Z
[SPARK-12103][STREAMING][KAFKA][DOC] document that K means Key and V
means Value
Author: cody koeninger <[email protected]>
Closes #10132 from koeninger/SPARK-12103.
(cherry picked from commit 48a9804b2ad89b3fb204c79f0dbadbcfea15d8dc)
Signed-off-by: Sean Owen <[email protected]>
commit 870f435628b7c0eac5f6c45fa19b14ab5289c657
Author: Jeff Zhang <[email protected]>
Date: 2015-12-08T11:05:06Z
[SPARK-12166][TEST] Unset hadoop related environment in testing
Author: Jeff Zhang <[email protected]>
Closes #10172 from zjffdu/SPARK-12166.
(cherry picked from commit 708129187a460aca30790281e9221c0cd5e271df)
Signed-off-by: Sean Owen <[email protected]>
commit 8a791a3273039602f91ae311b612eeaeca10ddc7
Author: Cheng Lian <[email protected]>
Date: 2015-12-08T11:18:59Z
[SPARK-11551][DOC][EXAMPLE] Revert PR #10002
This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819.
The original PR wasn't tested on Jenkins before being merged.
Author: Cheng Lian <[email protected]>
Closes #10200 from liancheng/revert-pr-10002.
(cherry picked from commit da2012a0e152aa078bdd19a5c7f91786a2dd7016)
Signed-off-by: Cheng Lian <[email protected]>
commit c8f9eb749afb825b99a04b0e8f1e9311c5c6c944
Author: Sean Owen <[email protected]>
Date: 2015-12-08T14:34:47Z
[SPARK-11652][CORE] Remote code execution with InvokerTransformer
Fix commons-collection group ID to commons-collections for version 3.x
Patches earlier PR at https://github.com/apache/spark/pull/9731
Author: Sean Owen <[email protected]>
Closes #10198 from srowen/SPARK-11652.2.
(cherry picked from commit e3735ce1602826f0a8e0ca9e08730923843449ee)
Signed-off-by: Sean Owen <[email protected]>
commit 8ef33aa1f6d3dc8772c9277a5372a991765af1b3
Author: Wenchen Fan <[email protected]>
Date: 2015-12-08T18:13:40Z
[SPARK-12201][SQL] add type coercion rule for greatest/least
checked with hive, greatest/least should cast their children to the tightest
common type,
i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5),
decimal(5, 10)) => error`
Author: Wenchen Fan <[email protected]>
Closes #10196 from cloud-fan/type-coercion.
(cherry picked from commit 381f17b540d92507cc07adf18bce8bc7e5ca5407)
Signed-off-by: Michael Armbrust <[email protected]>
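The rule can be illustrated with a toy widening table (the table and function below are a sketch, not Spark's analyzer): fold the child types pairwise, widening where possible and failing where no common type exists.

```python
# Illustrative widening lattice; Spark's real coercion rules cover many
# more types (decimals, strings, timestamps, ...).
WIDEN = {
    ("int", "int"): "int",
    ("int", "long"): "long",
    ("long", "int"): "long",
    ("long", "long"): "long",
}

def tightest_common_type(types):
    # greatest/least cast all children to one tightest common type;
    # incompatible pairs like (int, string) are an analysis error.
    common = types[0]
    for t in types[1:]:
        common = WIDEN.get((common, t))
        if common is None:
            raise TypeError("no tightest common type")
    return common
```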
commit 9eeb0f25abd4bd528a5363fda60b1cd1eb34d05b
Author: gatorsmile <[email protected]>
Date: 2015-12-08T18:15:58Z
[SPARK-12195][SQL] Adding BigDecimal, Date and Timestamp into Encoder
This PR is to add three more data types into Encoder, including
`BigDecimal`, `Date` and `Timestamp`.
marmbrus cloud-fan rxin Could you take a quick look at these three types?
Not sure if it can be merged to 1.6. Thank you very much!
Author: gatorsmile <[email protected]>
Closes #10188 from gatorsmile/dataTypesinEncoder.
(cherry picked from commit c0b13d5565c45ae2acbe8cfb17319c92b6a634e4)
Signed-off-by: Michael Armbrust <[email protected]>
commit be0fe9b450f1bb87b9ce2e0ea153dc496d66a664
Author: gatorsmile <[email protected]>
Date: 2015-12-08T18:25:57Z
[SPARK-12188][SQL] Code refactoring and comment correction in Dataset APIs
This PR contains the following updates:
- Created a new private variable `boundTEncoder` that can be shared by
multiple functions, `RDD`, `select` and `collect`.
- Replaced all the `queryExecution.analyzed` by the function call
`logicalPlan`
- A few API comments are using wrong class names (e.g., `DataFrame`) or
parameter names (e.g., `n`)
- A few API descriptions are wrong. (e.g., `mapPartitions`)
marmbrus rxin cloud-fan Could you take a look and check if they are
appropriate? Thank you!
Author: gatorsmile <[email protected]>
Closes #10184 from gatorsmile/datasetClean.
(cherry picked from commit 5d96a710a5ed543ec81e383620fc3b2a808b26a1)
Signed-off-by: Michael Armbrust <[email protected]>
commit 1c8451b5e71508b974db965502db6dc3b1b4b4c0
Author: Yuhao Yang <[email protected]>
Date: 2015-12-08T18:29:51Z
[SPARK-10393] use ML pipeline in LDA example
jira: https://issues.apache.org/jira/browse/SPARK-10393
Since the logic of the text processing part has been moved to ML
estimators/transformers, replace the related code in LDA Example with the ML
pipeline.
Author: Yuhao Yang <[email protected]>
Author: yuhaoyang <[email protected]>
Closes #8551 from hhbyyh/ldaExUpdate.
(cherry picked from commit 872a2ee281d84f40a786f765bf772cdb06e8c956)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 9145bfb814c5f53c5e9c0de7e0d6b7aca99c7341
Author: Andrew Ray <[email protected]>
Date: 2015-12-08T18:52:17Z
[SPARK-12205][SQL] Pivot fails Analysis when aggregate is UnresolvedFunction
Delays application of ResolvePivot until all aggregates are resolved to
prevent problems with UnresolvedFunction and adds unit test
Author: Andrew Ray <[email protected]>
Closes #10202 from aray/sql-pivot-unresolved-function.
(cherry picked from commit 4bcb894948c1b7294d84e2bf58abb1d79e6759c6)
Signed-off-by: Yin Huai <[email protected]>
commit 7e45feb005966f6cdf66c4d19223286acf92cc28
Author: Yuhao Yang <[email protected]>
Date: 2015-12-08T19:46:26Z
[SPARK-11605][MLLIB] ML 1.6 QA: API: Java compatibility, docs
jira: https://issues.apache.org/jira/browse/SPARK-11605
Check Java compatibility for MLlib for this release.
fix:
1. `StreamingTest.registerStream` needs java friendly interface.
2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and
`GradientBoostedTreesModel.updatePredictionError` have Java compatibility issues.
Mark them as `developerAPI`.
TBD:
[updated] no fix for now per discussion.
`org.apache.spark.mllib.classification.LogisticRegressionModel`
`public scala.Option<java.lang.Object> getThreshold();` has wrong return
type for Java invocation.
`SVMModel` has the similar issue.
Yet adding a `scala.Option<java.util.Double> getThreshold()` would result
in an overloading error due to the same function signature. And adding a new
function with different name seems to be not necessary.
cc jkbradley feynmanliang
Author: Yuhao Yang <[email protected]>
Closes #10102 from hhbyyh/javaAPI.
(cherry picked from commit 5cb4695051e3dac847b1ea14d62e54dcf672c31c)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 3e31e7e245dba2c16be044e2f13b786e8608bd07
Author: BenFradet <[email protected]>
Date: 2015-12-08T20:45:34Z
[SPARK-12159][ML] Add user guide section for IndexToString transformer
Documentation regarding the `IndexToString` label transformer with code
snippets in Scala/Java/Python.
Author: BenFradet <[email protected]>
Closes #10166 from BenFradet/SPARK-12159.
(cherry picked from commit 06746b3005e5e9892d0314bee3bfdfaebc36d3d4)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 25249d1ece9fe3a57b753e37bbbe0d3a957a8304
Author: Andrew Or <[email protected]>
Date: 2015-12-08T22:34:15Z
[SPARK-12187] *MemoryPool classes should not be fully public
This patch tightens them to `private[memory]`.
Author: Andrew Or <[email protected]>
Closes #10182 from andrewor14/memory-visibility.
(cherry picked from commit 9494521695a1f1526aae76c0aea34a3bead96251)
Signed-off-by: Josh Rosen <[email protected]>
commit 2a5e4d157c13e67d7301a8c1214accf31256cb9d
Author: Michael Armbrust <[email protected]>
Date: 2015-12-08T23:58:35Z
[SPARK-12069][SQL] Update documentation with Datasets
Author: Michael Armbrust <[email protected]>
Closes #10060 from marmbrus/docs.
(cherry picked from commit 39594894232e0b70c5ca8b0df137da0d61223fd5)
Signed-off-by: Michael Armbrust <[email protected]>
commit b1d5a7859546eabdc7cf070b3e78d91389a8fbd6
Author: Timothy Hunter <[email protected]>
Date: 2015-12-09T02:40:21Z
[SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in
SPARK-8517. It does not introduce new content, as requested.
<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am"
src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">
Author: Timothy Hunter <[email protected]>
Closes #10207 from thunterdb/spark-8517.
(cherry picked from commit 765c67f5f2e0b1367e37883f662d313661e3a0d9)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 9e82273afc68947dc2a08315e0d42cfcedacaa2a
Author: Dominik Dahlem <[email protected]>
Date: 2015-12-09T02:54:10Z
[SPARK-11343][ML] Documentation of float and double prediction/label
columns in RegressionEvaluator
felixcheung , mengxr
Just added a message to require()
Author: Dominik Dahlem <[email protected]>
Closes #9598 from
dahlem/ddahlem_regression_evaluator_double_predictions_message_04112015.
(cherry picked from commit a0046e379bee0852c39ece4ea719cde70d350b0e)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 0be792aad5d01432e989a03969541f41a45281e2
Author: Fei Wang <[email protected]>
Date: 2015-12-09T05:32:31Z
[SPARK-12222] [CORE] Deserialize RoaringBitmap using Kryo serializer throw
Buffer underflow exception
Jira: https://issues.apache.org/jira/browse/SPARK-12222
Deserializing a RoaringBitmap with the Kryo serializer throws a Buffer
underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```
This is caused by a bug in Kryo's `Input.skip(long
count)` (https://github.com/EsotericSoftware/kryo/issues/119), and we call this
method in `KryoInputDataInputBridge`.
Instead of upgrading Kryo's version, this PR bypasses Kryo's
`Input.skip(long count)` by directly calling another `skip` method in Kryo's
Input.java (https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124),
i.e. it writes the bug-fixed version of `Input.skip(long count)` in
KryoInputDataInputBridge's `skipBytes` method.
More detail at
https://github.com/apache/spark/pull/9748#issuecomment-162860246
Author: Fei Wang <[email protected]>
Closes #10213 from scwf/patch-1.
(cherry picked from commit 3934562d34bbe08d91c54b4bbee27870e93d7571)
Signed-off-by: Davies Liu <[email protected]>
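The shape of that workaround, independent of Kryo, is a skip loop that trusts only the number of bytes each bounded call actually consumed. A Python sketch (the stream class is a toy stand-in, not Kryo's `Input`):

```python
import io

class ByteInput:
    """Tiny stand-in for a stream whose skip() may consume fewer bytes
    than requested, which is exactly what the buggy skip(long) mishandled."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def skip(self, n):
        return len(self._buf.read(n))

def skip_bytes(stream, count):
    # Loop over bounded skip() calls, tracking how much was actually
    # consumed, instead of trusting a single skip(long count) call.
    remaining = count
    while remaining > 0:
        skipped = stream.skip(remaining)
        if skipped == 0:  # end of stream
            break
        remaining -= skipped
    return count - remaining
```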
commit b5a76b4a40e043c5384be7c620e7ca257b7ef2cd
Author: uncleGen <[email protected]>
Date: 2015-12-09T15:09:40Z
[SPARK-12031][CORE][BUG] Integer overflow when doing sampling
Author: uncleGen <[email protected]>
Closes #10023 from uncleGen/1.6-bugfix.
(cherry picked from commit a113216865fd45ea39ae8f104e784af2cf667dcf)
Signed-off-by: Sean Owen <[email protected]>
commit acd462420ab5565ba5bf098f399fb355da3d6139
Author: Holden Karau <[email protected]>
Date: 2015-12-09T16:45:13Z
[SPARK-10299][ML] word2vec should allow users to specify the window size
Currently word2vec has the window hard coded at 5, some users may want
different sizes (for example if using on n-gram input or similar). User request
comes from
http://stackoverflow.com/questions/32231975/spark-word2vec-window-size .
Author: Holden Karau <[email protected]>
Author: Holden Karau <[email protected]>
Closes #8513 from
holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size.
(cherry picked from commit 22b9a8740d51289434553d19b6b1ac34aecdc09a)
Signed-off-by: Sean Owen <[email protected]>
commit 05e441e121a86e0c105ad25010e4678f2f9e73e3
Author: Josh Rosen <[email protected]>
Date: 2015-12-09T19:39:59Z
[SPARK-12165][SPARK-12189] Fix bugs in eviction of storage memory by
execution
This patch fixes a bug in the eviction of storage memory by execution.
## The bug:
In general, execution should be able to evict storage memory when the total
storage memory usage is greater than `maxMemory *
spark.memory.storageFraction`. Due to a bug, however, Spark might wind up
evicting no storage memory in certain cases where the storage memory usage was
between `maxMemory * spark.memory.storageFraction` and `maxMemory`. For
example, here is a regression test which illustrates the bug:
```scala
val maxMemory = 1000L
val taskAttemptId = 0L
val (mm, ms) = makeThings(maxMemory)
// Since we used the default storage fraction (0.5), we should be able to
// allocate 500 bytes of storage memory which are immune to eviction by
// execution memory pressure.
// Acquire enough storage memory to exceed the storage region size
assert(mm.acquireStorageMemory(dummyBlock, 750L, evictedBlocks))
assertEvictBlocksToFreeSpaceNotCalled(ms)
assert(mm.executionMemoryUsed === 0L)
assert(mm.storageMemoryUsed === 750L)
// At this point, storage is using 250 more bytes of memory than it is
// guaranteed, so execution should be able to reclaim up to 250 bytes of
// storage memory.
// Therefore, execution should now be able to require up to 500 bytes of
// memory:
assert(mm.acquireExecutionMemory(500L, taskAttemptId, MemoryMode.ON_HEAP) === 500L) // <--- fails by only returning 250L
assert(mm.storageMemoryUsed === 500L)
assert(mm.executionMemoryUsed === 500L)
assertEvictBlocksToFreeSpaceCalled(ms, 250L)
```
The problem relates to the control flow / interaction between
`StorageMemoryPool.shrinkPoolToReclaimSpace()` and
`MemoryStore.ensureFreeSpace()`. While trying to allocate the 500 bytes of
execution memory, the `UnifiedMemoryManager` discovers that it will need to
reclaim 250 bytes of memory from storage, so it calls
`StorageMemoryPool.shrinkPoolToReclaimSpace(250L)`. This method, in turn, calls
`MemoryStore.ensureFreeSpace(250L)`. However, `ensureFreeSpace()` first checks
whether the requested space is less than `maxStorageMemory -
storageMemoryUsed`, which will be true if there is any free execution memory
because it turns out that `MemoryStore.maxStorageMemory = (maxMemory -
onHeapExecutionMemoryPool.memoryUsed)` when the `UnifiedMemoryManager` is used.
The control flow here is somewhat confusing (it grew to be messy /
confusing over time / as a result of the merging / refactoring of several
components). In the pre-Spark 1.6 code, `ensureFreeSpace` was called directly
by the `MemoryStore` itself, whereas in 1.6 it's involved in a confusing
control flow where `MemoryStore` calls `MemoryManager.acquireStorageMemory`,
which then calls back into `MemoryStore.ensureFreeSpace`, which, in turn, calls
`MemoryManager.freeStorageMemory`.
## The solution:
The solution implemented in this patch is to remove the confusing circular
control flow between `MemoryManager` and `MemoryStore`, making the storage
memory acquisition process much more linear / straightforward. The key changes:
- Remove a layer of inheritance which made the memory manager code harder
to understand (53841174760a24a0df3eb1562af1f33dbe340eb9).
- Move some bounds checks earlier in the call chain
(13ba7ada77f87ef1ec362aec35c89a924e6987cb).
- Refactor `ensureFreeSpace()` so that the part which evicts blocks can be
called independently from the part which checks whether there is enough free
space to avoid eviction (7c68ca09cb1b12f157400866983f753ac863380e).
- Realize that this lets us remove a layer of overloads from
`ensureFreeSpace` (eec4f6c87423d5e482b710e098486b3bbc4daf06).
- Realize that `ensureFreeSpace()` can simply be replaced with an
`evictBlocksToFreeSpace()` method which is called [after we've already figured
out](https://github.com/apache/spark/blob/2dc842aea82c8895125d46a00aa43dfb0d121de9/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L88)
how much memory needs to be reclaimed via eviction;
(2dc842aea82c8895125d46a00aa43dfb0d121de9).
Along the way, I fixed some problems with the mocks in
`MemoryManagerSuite`: the old mocks would
[unconditionally](https://github.com/apache/spark/blob/80a824d36eec9d9a9f092ee1741453851218ec73/core/src/test/scala/org/apache/spark/memory/MemoryManagerSuite.scala#L84)
report that a block had been evicted even if there was enough space in the
storage pool such that eviction would be avoided.
I also fixed a problem where `StorageMemoryPool._memoryUsed` might become
negative due to freed memory being double-counted when execution evicts storage.
The problem was that `StorageMemoryPool.shrinkPoolToFreeSpace` would [decrement
`_memoryUsed`](https://github.com/apache/spark/commit/7c68ca09cb1b12f157400866983f753ac863380e#diff-935c68a9803be144ed7bafdd2f756a0fL133)
even though `StorageMemoryPool.freeMemory` had already decremented it as each
evicted block was freed. See SPARK-12189 for details.
Author: Josh Rosen <[email protected]>
Author: Andrew Or <[email protected]>
Closes #10170 from JoshRosen/SPARK-12165.
(cherry picked from commit aec5ea000ebb8921f42f006b694ef26f5df67d83)
Signed-off-by: Andrew Or <[email protected]>
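The corrected accounting exercised by the regression test above can be modeled with a toy allocator (the class and its fields are illustrative, not Spark's actual `UnifiedMemoryManager`): execution may evict storage down to `maxMemory * spark.memory.storageFraction`, but no further.

```python
class ToyUnifiedMemory:
    """Toy model of the fixed eviction behaviour, not Spark's code."""
    def __init__(self, max_memory, storage_fraction=0.5):
        self.max_memory = max_memory
        # Storage below this floor is immune to eviction by execution.
        self.storage_floor = int(max_memory * storage_fraction)
        self.storage_used = 0
        self.execution_used = 0

    def acquire_storage(self, n):
        free = self.max_memory - self.storage_used - self.execution_used
        if n > free:
            return False
        self.storage_used += n
        return True

    def acquire_execution(self, n):
        free = self.max_memory - self.storage_used - self.execution_used
        if n > free:
            # Evict storage memory, but only the portion above the
            # protected storage fraction.
            reclaimable = max(self.storage_used - self.storage_floor, 0)
            evicted = min(n - free, reclaimable)
            self.storage_used -= evicted
            free += evicted
        granted = min(n, free)
        self.execution_used += granted
        return granted
```

Mirroring the regression test: with `max_memory = 1000` and 750 bytes of storage in use, an execution request for 500 bytes should evict exactly 250 bytes of storage and be granted in full.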
----