GitHub user magictips opened a pull request:
https://github.com/apache/spark/pull/11304
Branch 1.6
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11304.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11304
----
commit 82a71aba043a0b1ed50168d2b5b312c79b8c8fa3
Author: gatorsmile <[email protected]>
Date: 2015-12-06T19:15:02Z
[SPARK-12138][SQL] Escape \u in the generated comments of codegen
When \u appears in a comment block (i.e. in /**/), code gen will break. So,
in Expression and CodegenFallback, we escape \u to \\u.
yhuai Please review it. I did reproduce it and it works after the fix.
Thanks!
Author: gatorsmile <[email protected]>
Closes #10155 from gatorsmile/escapeU.
(cherry picked from commit 49efd03bacad6060d99ed5e2fe53ba3df1d1317e)
Signed-off-by: Yin Huai <[email protected]>
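The failure mode is easy to reproduce outside of Spark: the Java compiler expands `\uXXXX` escapes everywhere in a source file, including inside `/* ... */` comments, so an expression string containing a stray `\u` breaks compilation of the generated class. A minimal Python sketch of the escaping fix (the helper name is illustrative; the real change lives in `Expression` and `CodegenFallback`):

```python
def escape_unicode_in_comment(text):
    # javac expands \uXXXX sequences even inside comments, so an invalid
    # escape such as "\uXYZ" in a generated comment block fails to compile.
    # Doubling the backslash makes the comment text inert.
    return text.replace("\\u", "\\\\u")

escape_unicode_in_comment(r"regex \u123 appears in this comment")
```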
commit c54b698ecc284bce9b80c40ba46008bd6321c812
Author: Burak Yavuz <[email protected]>
Date: 2015-12-07T08:21:55Z
[SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when
Jenkins load is high
We need to make sure that the last entry is indeed the last entry in the
queue.
Author: Burak Yavuz <[email protected]>
Closes #10110 from brkyvz/batch-wal-test-fix.
(cherry picked from commit 6fd9e70e3ed43836a0685507fff9949f921234f4)
Signed-off-by: Tathagata Das <[email protected]>
commit 3f230f7b331cf6d67426cece570af3f1340f526e
Author: Sun Rui <[email protected]>
Date: 2015-12-07T18:38:17Z
[SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.
This PR:
1. Suppress all known warnings.
2. Cleanup test cases and fix some errors in test cases.
3. Fix errors in HiveContext related test cases. These test cases are
actually not run previously due to a bug of creating TestHiveContext.
4. Support 'testthat' package version 0.11.0 which prefers that test cases
be under 'tests/testthat'
5. Make sure the default Hadoop file system is local when running test
cases.
6. Turn warnings into errors.
Author: Sun Rui <[email protected]>
Closes #10030 from sun-rui/SPARK-12034.
(cherry picked from commit 39d677c8f1ee7ebd7e142bec0415cf8f90ac84b6)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit fed453821d81470b9035d33e36fa6ef1df99c0de
Author: Davies Liu <[email protected]>
Date: 2015-12-07T19:00:25Z
[SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler
Currently, the current line is not cleared by Ctrl-C.
After this patch
```
>>> asdfasdf^C
Traceback (most recent call last):
File "~/spark/python/pyspark/context.py", line 225, in signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt
```
It's still worse than 1.5 (and before).
Author: Davies Liu <[email protected]>
Closes #10134 from davies/fix_cltrc.
(cherry picked from commit ef3f047c07ef0ac4a3a97e6bc11e1c28c6c8f9a0)
Signed-off-by: Davies Liu <[email protected]>
commit 539914f1a8d3a0f59e67c178f86e741927e7a658
Author: Tathagata Das <[email protected]>
Date: 2015-12-07T19:03:59Z
[SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner
not present
The reason is that TrackStateRDDs generated by trackStateByKey expect the
previous batch's TrackStateRDDs to have a partitioner. However, when recovering
from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a
partitioner attached to them. This is because RDD checkpoints do not preserve the
partitioner (SPARK-12004).
While #9983 solves SPARK-12004 by preserving the partitioner through RDD
checkpoints, there may be a non-zero chance that the saving and recovery fails.
To be resilient, this PR repartitions the previous state RDD if the partitioner
is not detected.
Author: Tathagata Das <[email protected]>
Closes #9988 from tdas/SPARK-11932.
(cherry picked from commit 5d80d8c6a54b2113022eff31187e6d97521bd2cf)
Signed-off-by: Tathagata Das <[email protected]>
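The defensive check described above can be sketched with a toy RDD stand-in (class and method names here are illustrative, not Spark's internals):

```python
class FakeRDD:
    """Minimal stand-in for an RDD, for illustration only."""
    def __init__(self, partitioner=None):
        self.partitioner = partitioner

    def partition_by(self, partitioner):
        return FakeRDD(partitioner)

def ensure_partitioned(rdd, expected_partitioner):
    # Checkpoint recovery drops the partitioner (SPARK-12004), so the
    # resilient fix is to re-partition whenever none is attached.
    if rdd.partitioner is None:
        return rdd.partition_by(expected_partitioner)
    return rdd
```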
commit c8aa5f2011cf30a360d5206ee45202c4b1d61e21
Author: Xusen Yin <[email protected]>
Date: 2015-12-07T21:16:47Z
[SPARK-11963][DOC] Add docs for QuantileDiscretizer
https://issues.apache.org/jira/browse/SPARK-11963
Author: Xusen Yin <[email protected]>
Closes #9962 from yinxusen/SPARK-11963.
(cherry picked from commit 871e85d9c14c6b19068cc732951a8ae8db61b411)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit cdeb89b34614fb39062976c4796d187992333c88
Author: Andrew Ray <[email protected]>
Date: 2015-12-07T23:01:00Z
[SPARK-12184][PYTHON] Make python api doc for pivot consistent with scala
doc
In SPARK-11946 the API for pivot was changed a bit and got updated docs; the
doc changes were not made for the Python API, though. This PR updates the
Python doc to be consistent.
Author: Andrew Ray <[email protected]>
Closes #10176 from aray/sql-pivot-python-doc.
(cherry picked from commit 36282f78b888743066843727426c6d806231aa97)
Signed-off-by: Yin Huai <[email protected]>
commit 115bfbdae82b1c2804ea501ffd420d0aa17aac45
Author: Joseph K. Bradley <[email protected]>
Date: 2015-12-08T00:37:09Z
[SPARK-12160][MLLIB] Use SQLContext.getOrCreate in MLlib
Switched from using SQLContext constructor to using getOrCreate, mainly in
model save/load methods.
This covers all instances in spark.mllib. There were no uses of the
constructor in spark.ml.
CC: mengxr yhuai
Author: Joseph K. Bradley <[email protected]>
Closes #10161 from jkbradley/mllib-sqlcontext-fix.
(cherry picked from commit 3e7e05f5ee763925ed60410d7de04cf36b723de1)
Signed-off-by: Xiangrui Meng <[email protected]>
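The get-or-create pattern the commit switches to can be sketched in a few lines (a toy class, not Spark's `SQLContext`): reuse the active context if one exists instead of constructing a fresh one in every save/load call.

```python
class ToySQLContext:
    """Illustrative singleton-style context, mimicking getOrCreate."""
    _active = None

    @classmethod
    def get_or_create(cls):
        # Construct a context only on first use; later callers share it,
        # which is what the model save/load paths were changed to do.
        if cls._active is None:
            cls._active = cls()
        return cls._active
```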
commit 3c683ed5ffe704a6fec7c6d434eeed784276470d
Author: somideshmukh <[email protected]>
Date: 2015-12-08T07:26:34Z
[SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using
include_example
Made a new patch containing only the markdown examples moved to the example/
folder. Only three Java examples were not shifted, since they contained
compilation errors; these classes are
1) StandardScale 2) NormalizerExample 3) VectorIndexer
Author: Xusen Yin <[email protected]>
Author: somideshmukh <[email protected]>
Closes #10002 from somideshmukh/SomilBranch1.33.
(cherry picked from commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 8652fc03c21f79b41ce13f41991feba11fc7b29c
Author: Takahashi Hiroshi <[email protected]>
Date: 2015-12-08T07:46:55Z
[SPARK-10259][ML] Add @since annotation to ml.classification
Add since annotation to ml.classification
Author: Takahashi Hiroshi <[email protected]>
Closes #8534 from taishi-oss/issue10259.
(cherry picked from commit 7d05a624510f7299b3dd07f87c203db1ff7caa3e)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 5c8216920b4110d8fc4329e1fe52543ee17c4a54
Author: Yanbo Liang <[email protected]>
Date: 2015-12-08T07:50:57Z
[SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example
code
Add ```SQLTransformer``` user guide, example code and make Scala API doc
more clear.
Author: Yanbo Liang <[email protected]>
Closes #10006 from yanboliang/spark-11958.
(cherry picked from commit 4a39b5a1bee28cec792d509654f6236390cafdcb)
Signed-off-by: Xiangrui Meng <[email protected]>
commit c9e5274ae3d8e6967bee240ec0b7ba17cd15d34e
Author: cody koeninger <[email protected]>
Date: 2015-12-08T11:02:35Z
[SPARK-12103][STREAMING][KAFKA][DOC] document that K means Key and V
means Value
Author: cody koeninger <[email protected]>
Closes #10132 from koeninger/SPARK-12103.
(cherry picked from commit 48a9804b2ad89b3fb204c79f0dbadbcfea15d8dc)
Signed-off-by: Sean Owen <[email protected]>
commit 870f435628b7c0eac5f6c45fa19b14ab5289c657
Author: Jeff Zhang <[email protected]>
Date: 2015-12-08T11:05:06Z
[SPARK-12166][TEST] Unset hadoop related environment in testing
Author: Jeff Zhang <[email protected]>
Closes #10172 from zjffdu/SPARK-12166.
(cherry picked from commit 708129187a460aca30790281e9221c0cd5e271df)
Signed-off-by: Sean Owen <[email protected]>
commit 8a791a3273039602f91ae311b612eeaeca10ddc7
Author: Cheng Lian <[email protected]>
Date: 2015-12-08T11:18:59Z
[SPARK-11551][DOC][EXAMPLE] Revert PR #10002
This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819.
The original PR wasn't tested on Jenkins before being merged.
Author: Cheng Lian <[email protected]>
Closes #10200 from liancheng/revert-pr-10002.
(cherry picked from commit da2012a0e152aa078bdd19a5c7f91786a2dd7016)
Signed-off-by: Cheng Lian <[email protected]>
commit c8f9eb749afb825b99a04b0e8f1e9311c5c6c944
Author: Sean Owen <[email protected]>
Date: 2015-12-08T14:34:47Z
[SPARK-11652][CORE] Remote code execution with InvokerTransformer
Fix commons-collection group ID to commons-collections for version 3.x
Patches earlier PR at https://github.com/apache/spark/pull/9731
Author: Sean Owen <[email protected]>
Closes #10198 from srowen/SPARK-11652.2.
(cherry picked from commit e3735ce1602826f0a8e0ca9e08730923843449ee)
Signed-off-by: Sean Owen <[email protected]>
commit 8ef33aa1f6d3dc8772c9277a5372a991765af1b3
Author: Wenchen Fan <[email protected]>
Date: 2015-12-08T18:13:40Z
[SPARK-12201][SQL] add type coercion rule for greatest/least
checked with hive, greatest/least should cast their children to the tightest
common type,
i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5),
decimal(5, 10)) => error`
Author: Wenchen Fan <[email protected]>
Closes #10196 from cloud-fan/type-coercion.
(cherry picked from commit 381f17b540d92507cc07adf18bce8bc7e5ca5407)
Signed-off-by: Michael Armbrust <[email protected]>
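The rule can be illustrated with a toy widening table (the table and function below are a sketch, not Spark's analyzer): fold the child types pairwise, widening where possible and failing where no common type exists.

```python
# Illustrative widening lattice; Spark's real coercion rules cover many
# more types (decimals, strings, timestamps, ...).
WIDEN = {
    ("int", "int"): "int",
    ("int", "long"): "long",
    ("long", "int"): "long",
    ("long", "long"): "long",
}

def tightest_common_type(types):
    # greatest/least cast all children to one tightest common type;
    # incompatible pairs like (int, string) are an analysis error.
    common = types[0]
    for t in types[1:]:
        common = WIDEN.get((common, t))
        if common is None:
            raise TypeError("no tightest common type")
    return common
```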
commit 9eeb0f25abd4bd528a5363fda60b1cd1eb34d05b
Author: gatorsmile <[email protected]>
Date: 2015-12-08T18:15:58Z
[SPARK-12195][SQL] Adding BigDecimal, Date and Timestamp into Encoder
This PR is to add three more data types into Encoder, including
`BigDecimal`, `Date` and `Timestamp`.
marmbrus cloud-fan rxin Could you take a quick look at these three types?
Not sure if it can be merged to 1.6. Thank you very much!
Author: gatorsmile <[email protected]>
Closes #10188 from gatorsmile/dataTypesinEncoder.
(cherry picked from commit c0b13d5565c45ae2acbe8cfb17319c92b6a634e4)
Signed-off-by: Michael Armbrust <[email protected]>
commit be0fe9b450f1bb87b9ce2e0ea153dc496d66a664
Author: gatorsmile <[email protected]>
Date: 2015-12-08T18:25:57Z
[SPARK-12188][SQL] Code refactoring and comment correction in Dataset APIs
This PR contains the following updates:
- Created a new private variable `boundTEncoder` that can be shared by
multiple functions, `RDD`, `select` and `collect`.
- Replaced all the `queryExecution.analyzed` by the function call
`logicalPlan`
- A few API comments are using wrong class names (e.g., `DataFrame`) or
parameter names (e.g., `n`)
- A few API descriptions are wrong. (e.g., `mapPartitions`)
marmbrus rxin cloud-fan Could you take a look and check if they are
appropriate? Thank you!
Author: gatorsmile <[email protected]>
Closes #10184 from gatorsmile/datasetClean.
(cherry picked from commit 5d96a710a5ed543ec81e383620fc3b2a808b26a1)
Signed-off-by: Michael Armbrust <[email protected]>
commit 1c8451b5e71508b974db965502db6dc3b1b4b4c0
Author: Yuhao Yang <[email protected]>
Date: 2015-12-08T18:29:51Z
[SPARK-10393] use ML pipeline in LDA example
jira: https://issues.apache.org/jira/browse/SPARK-10393
Since the logic of the text processing part has been moved to ML
estimators/transformers, replace the related code in LDA Example with the ML
pipeline.
Author: Yuhao Yang <[email protected]>
Author: yuhaoyang <[email protected]>
Closes #8551 from hhbyyh/ldaExUpdate.
(cherry picked from commit 872a2ee281d84f40a786f765bf772cdb06e8c956)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 9145bfb814c5f53c5e9c0de7e0d6b7aca99c7341
Author: Andrew Ray <[email protected]>
Date: 2015-12-08T18:52:17Z
[SPARK-12205][SQL] Pivot fails Analysis when aggregate is UnresolvedFunction
Delays application of ResolvePivot until all aggregates are resolved to
prevent problems with UnresolvedFunction and adds unit test
Author: Andrew Ray <[email protected]>
Closes #10202 from aray/sql-pivot-unresolved-function.
(cherry picked from commit 4bcb894948c1b7294d84e2bf58abb1d79e6759c6)
Signed-off-by: Yin Huai <[email protected]>
commit 7e45feb005966f6cdf66c4d19223286acf92cc28
Author: Yuhao Yang <[email protected]>
Date: 2015-12-08T19:46:26Z
[SPARK-11605][MLLIB] ML 1.6 QA: API: Java compatibility, docs
jira: https://issues.apache.org/jira/browse/SPARK-11605
Check Java compatibility for MLlib for this release.
fix:
1. `StreamingTest.registerStream` needs java friendly interface.
2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and
`GradientBoostedTreesModel.updatePredictionError` have Java compatibility issues.
Mark them as `developerAPI`.
TBD:
[updated] no fix for now per discussion.
`org.apache.spark.mllib.classification.LogisticRegressionModel`
`public scala.Option<java.lang.Object> getThreshold();` has wrong return
type for Java invocation.
`SVMModel` has the similar issue.
Yet adding a `scala.Option<java.util.Double> getThreshold()` would result
in an overloading error due to the same function signature. And adding a new
function with different name seems to be not necessary.
cc jkbradley feynmanliang
Author: Yuhao Yang <[email protected]>
Closes #10102 from hhbyyh/javaAPI.
(cherry picked from commit 5cb4695051e3dac847b1ea14d62e54dcf672c31c)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 3e31e7e245dba2c16be044e2f13b786e8608bd07
Author: BenFradet <[email protected]>
Date: 2015-12-08T20:45:34Z
[SPARK-12159][ML] Add user guide section for IndexToString transformer
Documentation regarding the `IndexToString` label transformer with code
snippets in Scala/Java/Python.
Author: BenFradet <[email protected]>
Closes #10166 from BenFradet/SPARK-12159.
(cherry picked from commit 06746b3005e5e9892d0314bee3bfdfaebc36d3d4)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 25249d1ece9fe3a57b753e37bbbe0d3a957a8304
Author: Andrew Or <[email protected]>
Date: 2015-12-08T22:34:15Z
[SPARK-12187] *MemoryPool classes should not be fully public
This patch tightens them to `private[memory]`.
Author: Andrew Or <[email protected]>
Closes #10182 from andrewor14/memory-visibility.
(cherry picked from commit 9494521695a1f1526aae76c0aea34a3bead96251)
Signed-off-by: Josh Rosen <[email protected]>
commit 2a5e4d157c13e67d7301a8c1214accf31256cb9d
Author: Michael Armbrust <[email protected]>
Date: 2015-12-08T23:58:35Z
[SPARK-12069][SQL] Update documentation with Datasets
Author: Michael Armbrust <[email protected]>
Closes #10060 from marmbrus/docs.
(cherry picked from commit 39594894232e0b70c5ca8b0df137da0d61223fd5)
Signed-off-by: Michael Armbrust <[email protected]>
commit b1d5a7859546eabdc7cf070b3e78d91389a8fbd6
Author: Timothy Hunter <[email protected]>
Date: 2015-12-09T02:40:21Z
[SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in
SPARK-8517. It does not introduce new content, as requested.
<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am"
src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">
Author: Timothy Hunter <[email protected]>
Closes #10207 from thunterdb/spark-8517.
(cherry picked from commit 765c67f5f2e0b1367e37883f662d313661e3a0d9)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 9e82273afc68947dc2a08315e0d42cfcedacaa2a
Author: Dominik Dahlem <[email protected]>
Date: 2015-12-09T02:54:10Z
[SPARK-11343][ML] Documentation of float and double prediction/label
columns in RegressionEvaluator
felixcheung , mengxr
Just added a message to require()
Author: Dominik Dahlem <[email protected]>
Closes #9598 from
dahlem/ddahlem_regression_evaluator_double_predictions_message_04112015.
(cherry picked from commit a0046e379bee0852c39ece4ea719cde70d350b0e)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 0be792aad5d01432e989a03969541f41a45281e2
Author: Fei Wang <[email protected]>
Date: 2015-12-09T05:32:31Z
[SPARK-12222] [CORE] Deserialize RoaringBitmap using Kryo serializer throw
Buffer underflow exception
Jira: https://issues.apache.org/jira/browse/SPARK-12222
Deserializing a RoaringBitmap with the Kryo serializer throws a Buffer
underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```
This is caused by a bug in Kryo's `Input.skip(long
count)` (https://github.com/EsotericSoftware/kryo/issues/119), and we call this
method in `KryoInputDataInputBridge`.
Instead of upgrading Kryo's version, this PR bypasses Kryo's
`Input.skip(long count)` by directly calling another `skip` method in Kryo's
Input.java (https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124),
i.e. it writes the bug-fixed version of `Input.skip(long count)` in
KryoInputDataInputBridge's `skipBytes` method.
More detail at
https://github.com/apache/spark/pull/9748#issuecomment-162860246
Author: Fei Wang <[email protected]>
Closes #10213 from scwf/patch-1.
(cherry picked from commit 3934562d34bbe08d91c54b4bbee27870e93d7571)
Signed-off-by: Davies Liu <[email protected]>
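The shape of that workaround, independent of Kryo, is a skip loop that trusts only the number of bytes each bounded call actually consumed. A Python sketch (the stream class is a toy stand-in, not Kryo's `Input`):

```python
import io

class ByteInput:
    """Tiny stand-in for a stream whose skip() may consume fewer bytes
    than requested, which is exactly what the buggy skip(long) mishandled."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def skip(self, n):
        return len(self._buf.read(n))

def skip_bytes(stream, count):
    # Loop over bounded skip() calls, tracking how much was actually
    # consumed, instead of trusting a single skip(long count) call.
    remaining = count
    while remaining > 0:
        skipped = stream.skip(remaining)
        if skipped == 0:  # end of stream
            break
        remaining -= skipped
    return count - remaining
```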
commit b5a76b4a40e043c5384be7c620e7ca257b7ef2cd
Author: uncleGen <[email protected]>
Date: 2015-12-09T15:09:40Z
[SPARK-12031][CORE][BUG] Integer overflow when doing sampling
Author: uncleGen <[email protected]>
Closes #10023 from uncleGen/1.6-bugfix.
(cherry picked from commit a113216865fd45ea39ae8f104e784af2cf667dcf)
Signed-off-by: Sean Owen <[email protected]>
commit acd462420ab5565ba5bf098f399fb355da3d6139
Author: Holden Karau <[email protected]>
Date: 2015-12-09T16:45:13Z
[SPARK-10299][ML] word2vec should allow users to specify the window size
Currently word2vec has the window hard coded at 5, some users may want
different sizes (for example if using on n-gram input or similar). User request
comes from
http://stackoverflow.com/questions/32231975/spark-word2vec-window-size .
Author: Holden Karau <[email protected]>
Author: Holden Karau <[email protected]>
Closes #8513 from
holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size.
(cherry picked from commit 22b9a8740d51289434553d19b6b1ac34aecdc09a)
Signed-off-by: Sean Owen <[email protected]>
commit 05e441e121a86e0c105ad25010e4678f2f9e73e3
Author: Josh Rosen <[email protected]>
Date: 2015-12-09T19:39:59Z
[SPARK-12165][SPARK-12189] Fix bugs in eviction of storage memory by
execution
This patch fixes a bug in the eviction of storage memory by execution.
## The bug:
In general, execution should be able to evict storage memory when the total
storage memory usage is greater than `maxMemory *
spark.memory.storageFraction`. Due to a bug, however, Spark might wind up
evicting no storage memory in certain cases where the storage memory usage was
between `maxMemory * spark.memory.storageFraction` and `maxMemory`. For
example, here is a regression test which illustrates the bug:
```scala
val maxMemory = 1000L
val taskAttemptId = 0L
val (mm, ms) = makeThings(maxMemory)
// Since we used the default storage fraction (0.5), we should be able to
// allocate 500 bytes of storage memory which are immune to eviction by
// execution memory pressure.
// Acquire enough storage memory to exceed the storage region size
assert(mm.acquireStorageMemory(dummyBlock, 750L, evictedBlocks))
assertEvictBlocksToFreeSpaceNotCalled(ms)
assert(mm.executionMemoryUsed === 0L)
assert(mm.storageMemoryUsed === 750L)
// At this point, storage is using 250 more bytes of memory than it is
// guaranteed, so execution should be able to reclaim up to 250 bytes of
// storage memory.
// Therefore, execution should now be able to require up to 500 bytes of
// memory:
assert(mm.acquireExecutionMemory(500L, taskAttemptId, MemoryMode.ON_HEAP) === 500L) // <--- fails by only returning 250L
assert(mm.storageMemoryUsed === 500L)
assert(mm.executionMemoryUsed === 500L)
assertEvictBlocksToFreeSpaceCalled(ms, 250L)
```
The problem relates to the control flow / interaction between
`StorageMemoryPool.shrinkPoolToReclaimSpace()` and
`MemoryStore.ensureFreeSpace()`. While trying to allocate the 500 bytes of
execution memory, the `UnifiedMemoryManager` discovers that it will need to
reclaim 250 bytes of memory from storage, so it calls
`StorageMemoryPool.shrinkPoolToReclaimSpace(250L)`. This method, in turn, calls
`MemoryStore.ensureFreeSpace(250L)`. However, `ensureFreeSpace()` first checks
whether the requested space is less than `maxStorageMemory -
storageMemoryUsed`, which will be true if there is any free execution memory
because it turns out that `MemoryStore.maxStorageMemory = (maxMemory -
onHeapExecutionMemoryPool.memoryUsed)` when the `UnifiedMemoryManager` is used.
The control flow here is somewhat confusing (it grew to be messy /
confusing over time / as a result of the merging / refactoring of several
components). In the pre-Spark 1.6 code, `ensureFreeSpace` was called directly
by the `MemoryStore` itself, whereas in 1.6 it's involved in a confusing
control flow where `MemoryStore` calls `MemoryManager.acquireStorageMemory`,
which then calls back into `MemoryStore.ensureFreeSpace`, which, in turn, calls
`MemoryManager.freeStorageMemory`.
## The solution:
The solution implemented in this patch is to remove the confusing circular
control flow between `MemoryManager` and `MemoryStore`, making the storage
memory acquisition process much more linear / straightforward. The key changes:
- Remove a layer of inheritance which made the memory manager code harder
to understand (53841174760a24a0df3eb1562af1f33dbe340eb9).
- Move some bounds checks earlier in the call chain
(13ba7ada77f87ef1ec362aec35c89a924e6987cb).
- Refactor `ensureFreeSpace()` so that the part which evicts blocks can be
called independently from the part which checks whether there is enough free
space to avoid eviction (7c68ca09cb1b12f157400866983f753ac863380e).
- Realize that this lets us remove a layer of overloads from
`ensureFreeSpace` (eec4f6c87423d5e482b710e098486b3bbc4daf06).
- Realize that `ensureFreeSpace()` can simply be replaced with an
`evictBlocksToFreeSpace()` method which is called [after we've already figured
out](https://github.com/apache/spark/blob/2dc842aea82c8895125d46a00aa43dfb0d121de9/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L88)
how much memory needs to be reclaimed via eviction;
(2dc842aea82c8895125d46a00aa43dfb0d121de9).
Along the way, I fixed some problems with the mocks in
`MemoryManagerSuite`: the old mocks would
[unconditionally](https://github.com/apache/spark/blob/80a824d36eec9d9a9f092ee1741453851218ec73/core/src/test/scala/org/apache/spark/memory/MemoryManagerSuite.scala#L84)
report that a block had been evicted even if there was enough space in the
storage pool such that eviction would be avoided.
I also fixed a problem where `StorageMemoryPool._memoryUsed` might become
negative due to freed memory being double-counted when execution evicts storage.
The problem was that `StorageMemoryPool.shrinkPoolToFreeSpace` would [decrement
`_memoryUsed`](https://github.com/apache/spark/commit/7c68ca09cb1b12f157400866983f753ac863380e#diff-935c68a9803be144ed7bafdd2f756a0fL133)
even though `StorageMemoryPool.freeMemory` had already decremented it as each
evicted block was freed. See SPARK-12189 for details.
Author: Josh Rosen <[email protected]>
Author: Andrew Or <[email protected]>
Closes #10170 from JoshRosen/SPARK-12165.
(cherry picked from commit aec5ea000ebb8921f42f006b694ef26f5df67d83)
Signed-off-by: Andrew Or <[email protected]>
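The corrected accounting exercised by the regression test above can be modeled with a toy allocator (the class and its fields are illustrative, not Spark's actual `UnifiedMemoryManager`): execution may evict storage down to `maxMemory * spark.memory.storageFraction`, but no further.

```python
class ToyUnifiedMemory:
    """Toy model of the fixed eviction behaviour, not Spark's code."""
    def __init__(self, max_memory, storage_fraction=0.5):
        self.max_memory = max_memory
        # Storage below this floor is immune to eviction by execution.
        self.storage_floor = int(max_memory * storage_fraction)
        self.storage_used = 0
        self.execution_used = 0

    def acquire_storage(self, n):
        free = self.max_memory - self.storage_used - self.execution_used
        if n > free:
            return False
        self.storage_used += n
        return True

    def acquire_execution(self, n):
        free = self.max_memory - self.storage_used - self.execution_used
        if n > free:
            # Evict storage memory, but only the portion above the
            # protected storage fraction.
            reclaimable = max(self.storage_used - self.storage_floor, 0)
            evicted = min(n - free, reclaimable)
            self.storage_used -= evicted
            free += evicted
        granted = min(n, free)
        self.execution_used += granted
        return granted
```

Mirroring the regression test: with `max_memory = 1000` and 750 bytes of storage in use, an execution request for 500 bytes should evict exactly 250 bytes of storage and be granted in full.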
----