GitHub user mukulmurthy opened a pull request:
https://github.com/apache/spark/pull/21559
[SPARK-24525][SS] Provide an option to limit number of rows in a MemorySink
## What changes were proposed in this pull request?
Provide an option to limit the number of rows in a MemorySink. Currently,
MemorySink and MemorySinkV2 are unbounded in size, so using them on large data
(including implicitly, under the hood of display()) can cause the stream to run
out of memory. This change adds a maxMemorySinkRows option that caps how many
rows MemorySink and MemorySinkV2 can hold. By default, both remain unbounded.
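
To illustrate the idea, here is a minimal sketch of a row-capped in-memory sink. It is not the code in this patch: the class `BoundedRowSink`, its `max_rows` parameter, and the `add_batch` method are hypothetical names chosen for illustration, and the sketch is plain Python rather than Spark. It assumes that rows beyond the cap are dropped and that a warning is logged when truncation happens, which is what the commit messages below suggest.

```python
import logging
from typing import List, Optional, Sequence, Tuple

logger = logging.getLogger("bounded_sink_sketch")


class BoundedRowSink:
    """Hypothetical in-memory sink that stops accepting rows at a limit."""

    def __init__(self, max_rows: Optional[int] = None) -> None:
        # None means "no limit", mirroring the unbounded default described above.
        self.max_rows = max_rows
        self.rows: List[Tuple] = []

    def add_batch(self, batch: Sequence[Tuple]) -> int:
        """Append as many rows as still fit and return how many were kept."""
        if self.max_rows is None:
            accepted = list(batch)
        else:
            # Remaining capacity never goes below zero.
            remaining = max(self.max_rows - len(self.rows), 0)
            accepted = list(batch[:remaining])

        dropped = len(batch) - len(accepted)
        if dropped > 0:
            # Assumption: truncation is reported through a log message.
            logger.warning(
                "Sink is limited to %d rows; dropped %d rows from this batch",
                self.max_rows,
                dropped,
            )

        self.rows.extend(accepted)
        return len(accepted)
```

With `max_rows=3`, adding a batch of two rows and then a batch of three keeps only the first row of the second batch; later batches are dropped entirely once the sink is full.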
## How was this patch tested?
Added new unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mukulmurthy/oss-spark SPARK-24525
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21559.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21559
----
commit ac7eb2f3cf4cca8ee5d64f90f71c6c0d14931c52
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T00:38:38Z
Add in logic to determine the max rows a sink can have
commit 8dc89cca9129b25ad8f5f4cda856e5b594f53e52
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T18:55:32Z
Make MemorySink and MemorySinkV2 respect row and byte limits
commit 8ddf566259016e4ce727eabb3206fd65303c5580
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T19:20:44Z
Make tests compile
commit d82c7d5ee84b25e968f705aded2f2c04edc5c140
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T20:26:56Z
Make microbatch memory writer work with limits
commit 7fefe877b03fe4ad522275780a64425b58bf5bb0
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T20:27:03Z
Test MemorySinkV2 with limits
commit 58c5044ca2e62ca825df3a4e88c4b4f6d697461e
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T22:08:49Z
Add MemorySink test with limit
commit 392f05f4c1d008493220f59ff7a4d4b948fdfc4b
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T22:23:27Z
rename method
commit 9097dd52bf654d7de059a0a0eaca961bd424f3cd
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:36:08Z
Don't use byte limit, and log if we truncate rows
commit a28fb38053395c04a72b5d79f1f12a3aa5d49972
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:36:21Z
Update tests
commit f981cb818ffc95ddce2b59fcd64142615037b6a3
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:50:43Z
minor refactor
----