GitHub user mukulmurthy opened a pull request: https://github.com/apache/spark/pull/21559
[SPARK-24525][SS] Provide an option to limit number of rows in a MemorySink

## What changes were proposed in this pull request?

Provide an option to limit the number of rows in a MemorySink.

Currently, MemorySink and MemorySinkV2 have unbounded size. If they are used on big data (including under the hood during display()), they can cause the stream to run out of memory (OOM). This change adds a maxMemorySinkRows option to limit how many rows MemorySink and MemorySinkV2 can hold. By default, they are still unbounded.

## How was this patch tested?

Added new unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mukulmurthy/oss-spark SPARK-24525

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21559

----
commit ac7eb2f3cf4cca8ee5d64f90f71c6c0d14931c52
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T00:38:38Z

    Add in logic to determine the max rows a sink can have

commit 8dc89cca9129b25ad8f5f4cda856e5b594f53e52
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T18:55:32Z

    Make MemorySink and MemorySinkV2 respect row and byte limits

commit 8ddf566259016e4ce727eabb3206fd65303c5580
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T19:20:44Z

    Make tests compile

commit d82c7d5ee84b25e968f705aded2f2c04edc5c140
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T20:26:56Z

    Make microbatch memory writer work with limits

commit 7fefe877b03fe4ad522275780a64425b58bf5bb0
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T20:27:03Z

    Test MemorySinkV2 with limits

commit 58c5044ca2e62ca825df3a4e88c4b4f6d697461e
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T22:08:49Z

    Add MemorySink test with limit

commit 392f05f4c1d008493220f59ff7a4d4b948fdfc4b
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-12T22:23:27Z

    rename method

commit 9097dd52bf654d7de059a0a0eaca961bd424f3cd
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:36:08Z

    Don't use byte limit, and log if we truncate rows

commit a28fb38053395c04a72b5d79f1f12a3aa5d49972
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:36:21Z

    Update tests

commit f981cb818ffc95ddce2b59fcd64142615037b6a3
Author: Mukul Murthy <mukul.murthy@...>
Date: 2018-06-13T20:50:43Z

    minor refactor

----
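The behavior described in this pull request can be illustrated in plain Python. Per the description and the commit "Don't use byte limit, and log if we truncate rows", the sink has a configurable row cap, is unbounded by default, and logs when rows are dropped. This is a hypothetical sketch, not the code from the pull request, and it uses no Spark APIs. The class name `BoundedMemorySink`, the method `add_batch`, and the policy of keeping the earliest rows while dropping overflow from later batches are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("memory_sink_sketch")


class BoundedMemorySink:
    """Hypothetical in-memory sink that holds at most max_rows rows.

    max_rows=None means unbounded, mirroring the default described in
    the pull request. This is an illustrative sketch, not Spark's
    MemorySink implementation.
    """

    def __init__(self, max_rows=None):
        if max_rows is not None and max_rows < 0:
            raise ValueError("max_rows must be non-negative or None")
        self.max_rows = max_rows
        self.rows = []

    def remaining_capacity(self):
        # None signals "no limit"; otherwise never report a negative capacity.
        if self.max_rows is None:
            return None
        return max(self.max_rows - len(self.rows), 0)

    def add_batch(self, batch_id, batch_rows):
        """Store as many rows from this batch as the limit allows.

        Returns the number of rows actually kept. Rows beyond the limit
        are dropped and a warning is logged (assumed policy: keep the
        earliest rows, truncate the overflow).
        """
        batch_rows = list(batch_rows)
        capacity = self.remaining_capacity()
        if capacity is None or len(batch_rows) <= capacity:
            kept = batch_rows
        else:
            kept = batch_rows[:capacity]
            logger.warning(
                "Truncating batch %s to %d rows because of sink limit of %d rows",
                batch_id, capacity, self.max_rows)
        self.rows.extend(kept)
        return len(kept)
```

For example, with `max_rows=5`, a first batch of 3 rows is stored in full, a second batch of 10 rows is cut to 2 rows with a warning, and any later batch is dropped entirely. Keeping the earliest rows is only one possible policy; a ring buffer that retains the most recent rows would be an equally plausible alternative.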