GitHub user mukulmurthy opened a pull request:

    https://github.com/apache/spark/pull/21559

    [SPARK-24525][SS] Provide an option to limit number of rows in a MemorySink

    ## What changes were proposed in this pull request?
    
    Provide an option to limit the number of rows in a MemorySink. Currently, 
MemorySink and MemorySinkV2 are unbounded in size, so using them on large data 
(including implicitly, under the hood during display()) can cause the stream 
to run out of memory. This change adds a maxMemorySinkRows option that caps 
how many rows MemorySink and MemorySinkV2 can hold. By default, they remain 
unbounded.
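
For illustration only, the idea of a row-capped in-memory sink can be sketched in a few lines of Python. This is not the Spark implementation from this pull request: the class name `BoundedMemorySink`, the `max_rows` parameter, and the `add_batch` method are hypothetical names chosen for the sketch. It assumes that rows beyond the cap are dropped and that a warning is logged when a batch is truncated.

```python
import logging
from typing import Any, List, Optional

logger = logging.getLogger("bounded_memory_sink")


class BoundedMemorySink:
    """Toy in-memory sink holding at most `max_rows` rows (None means unbounded).

    Hypothetical illustration; not Spark's MemorySink.
    """

    def __init__(self, max_rows: Optional[int] = None) -> None:
        if max_rows is not None and max_rows < 0:
            raise ValueError("max_rows must be non-negative or None")
        self.max_rows = max_rows
        self.rows: List[Any] = []

    def add_batch(self, batch_id: int, batch: List[Any]) -> int:
        """Append as many rows of `batch` as still fit; return how many were kept."""
        if self.max_rows is None:
            kept = list(batch)
        else:
            # Capacity left before the cap is reached (never negative).
            remaining = max(self.max_rows - len(self.rows), 0)
            kept = list(batch[:remaining])
        if len(kept) < len(batch):
            # Assumed behavior for this sketch: drop the overflow and log it.
            logger.warning(
                "Batch %d truncated: kept %d of %d rows (limit %s)",
                batch_id, len(kept), len(batch), self.max_rows,
            )
        self.rows.extend(kept)
        return len(kept)
```

With `max_rows=3`, adding a batch of two rows and then a batch of three keeps only the first row of the second batch; with the default `max_rows=None`, every row is kept, mirroring the unbounded default described above.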
    
    ## How was this patch tested?
    
    Added new unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mukulmurthy/oss-spark SPARK-24525

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21559
    
----
commit ac7eb2f3cf4cca8ee5d64f90f71c6c0d14931c52
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T00:38:38Z

    Add in logic to determine the max rows a sink can have

commit 8dc89cca9129b25ad8f5f4cda856e5b594f53e52
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T18:55:32Z

    Make MemorySink and MemorySinkV2 respect row and byte limits

commit 8ddf566259016e4ce727eabb3206fd65303c5580
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T19:20:44Z

    Make tests compile

commit d82c7d5ee84b25e968f705aded2f2c04edc5c140
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T20:26:56Z

    Make microbatch memory writer work with limits

commit 7fefe877b03fe4ad522275780a64425b58bf5bb0
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T20:27:03Z

    Test MemorySinkV2 with limits

commit 58c5044ca2e62ca825df3a4e88c4b4f6d697461e
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T22:08:49Z

    Add MemorySink test with limit

commit 392f05f4c1d008493220f59ff7a4d4b948fdfc4b
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-12T22:23:27Z

    rename method

commit 9097dd52bf654d7de059a0a0eaca961bd424f3cd
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-13T20:36:08Z

    Don't use byte limit, and log if we truncate rows

commit a28fb38053395c04a72b5d79f1f12a3aa5d49972
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-13T20:36:21Z

    Update tests

commit f981cb818ffc95ddce2b59fcd64142615037b6a3
Author: Mukul Murthy <mukul.murthy@...>
Date:   2018-06-13T20:50:43Z

    minor refactor

----


