GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/2187

    [SPARK-3277] Fix external spilling with LZ4 assertion error

    The bulk of this PR consists of tests and documentation; the actual fix 
is a one-line change (see `BlockObjectWriter.scala`). We currently do not run 
the `External*` test suites with different compression codecs; doing so would 
have caught the bug reported in SPARK-3277. This PR extends the existing 
suites to test spilling with all compression codecs known to Spark, including 
`LZ4`.
    
    **The actual bug**
    
    In `DiskBlockObjectWriter`, we only report the shuffle bytes written 
before we close the streams. With `LZ4`, the bytes written reported in our 
metrics were always 0 because `flush()` did not appear to take effect. In 
general, a compression codec may write additional bytes to the file when we 
call `close()`, so we must also capture those bytes in our shuffle write 
metrics.
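
    As a minimal illustration of the point above (not the code in this PR), 
the following sketch uses the JDK's `GZIPOutputStream` as a stand-in for a 
compression codec and compares the file length after `flush()` with the length 
after `close()`. The names `CloseBytesSketch` and `lengthsAroundClose` are 
hypothetical and chosen only for this example.

```scala
import java.io.{File, FileOutputStream}
import java.util.zip.GZIPOutputStream

object CloseBytesSketch {
  /**
   * Writes `payload` through a compressing stream into a temporary file and
   * returns (file length after flush(), file length after close()).
   * GZIPOutputStream is an assumption standing in for a Spark codec stream.
   */
  def lengthsAroundClose(payload: Array[Byte]): (Long, Long) = {
    val file = File.createTempFile("spill-sketch", ".gz")
    try {
      val out = new GZIPOutputStream(new FileOutputStream(file))
      try {
        out.write(payload)
        // flush() only flushes the underlying stream here; data still held
        // by the compressor is not forced out to the file.
        out.flush()
        val beforeClose = file.length()
        // close() finishes the compressor and writes the remaining bytes.
        out.close()
        val afterClose = file.length()
        (beforeClose, afterClose)
      } finally {
        out.close() // closing an already-closed stream is a no-op
      }
    } finally {
      file.delete()
    }
  }
}
```

    For a small payload the second length is larger than the first, because 
the compressed data and the gzip trailer only reach the file on `close()`. A 
metric sampled before `close()` would miss those bytes.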
    
    Thanks @mridulm and @pwendell for help with debugging.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark fix-lz4-spilling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2187
    
----
commit 1bfa7438f8cb5ef00842e67f343dd02eb9679c8a
Author: Andrew Or <[email protected]>
Date:   2014-08-28T20:15:32Z

    Add more information to assert for better debugging

commit b264a84bcd848609f26ba80f23391c439369180e
Author: Andrew Or <[email protected]>
Date:   2014-08-28T20:59:07Z

    ExternalAppendOnlyMapSuite code style fixes (minor)

commit 4bbcf68cad63d6f787a445e2eb724fcd6e6acb7c
Author: Andrew Or <[email protected]>
Date:   2014-08-28T21:38:25Z

    Update tests to actually test all compression codecs

commit 089593f6d388980694f031641091b58e1d8dcfc7
Author: Andrew Or <[email protected]>
Date:   2014-08-28T21:57:59Z

    Actually fix SPARK-3277 (tests still fail)

commit a1ad53620d209837cc456e5789c7b73d7e1b8b80
Author: Andrew Or <[email protected]>
Date:   2014-08-28T22:42:11Z

    Fix tests
    
    We need to stop the SparkContexts before creating a new one.
    Otherwise the tests get into bad states.

commit 92e251bae0f354cfe8350497d2ca0bb2bdd8028b
Author: Patrick Wendell <[email protected]>
Date:   2014-08-28T20:54:01Z

    Better documentation for BlockObjectWriter.

commit 6b2e7d155457043b967e03743c0d22556d818a3e
Author: Andrew Or <[email protected]>
Date:   2014-08-28T22:45:26Z

    Fix compilation error

commit 1c4624ed0d351375d4ff3bcb6384a27fe2b98fd5
Author: Andrew Or <[email protected]>
Date:   2014-08-28T22:45:48Z

    Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling

----


