GitHub user pwendell opened a pull request:

    https://github.com/apache/spark/pull/1991

    SPARK-2881: Avoid collisions in Snappy staging directory.

    By default Snappy uses java.io.tmpdir for copying the Snappy native library.
    If two users run Spark jobs on the same machine, it can cause an exception
    when the second user tries to access or overwrite the Snappy file created by
    the first user. This will fail Spark jobs out-of-the-box if they are run on a
    machine shared by different users.
    
    Snappy does expose a mechanism to customize the temp directory via a system
    property. This system property is read in a static block inside the Snappy
    code.
    
    I've added a "best effort" fix for this where we try to set the system
    property in a static block before Snappy reads it. I've tested it and it
    does work, but it relies on static initialization order, which is brittle:
    if user code accesses the Snappy libraries first, the fix may not take effect.
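
    To illustrate the idea (this is a minimal sketch, not the code in this
    patch), a static initializer can set the property before anything touches
    Snappy. The class name, the helper method, and the choice of a fresh
    temporary directory are assumptions made for the example; only the
    property name comes from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnappyTempDirSketch {
    // Property name taken from the description; everything else is hypothetical.
    static final String SNAPPY_TEMPDIR_KEY = "org.xerial.snappy.tempdir";

    // Runs once, when this class is first initialized. It only helps if that
    // happens before Snappy's own static initializer reads the property.
    static {
        configureSnappyTempDir();
    }

    /** Points Snappy at a fresh private directory unless one is already set. */
    static String configureSnappyTempDir() {
        String existing = System.getProperty(SNAPPY_TEMPDIR_KEY);
        if (existing != null) {
            return existing; // respect an explicit user setting
        }
        try {
            // Created under java.io.tmpdir with a unique name per JVM.
            Path dir = Files.createTempDirectory("snappy-");
            System.setProperty(SNAPPY_TEMPDIR_KEY, dir.toString());
            return dir.toString();
        } catch (IOException e) {
            // Best effort: leave the property unset so Snappy uses its default.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(
            SNAPPY_TEMPDIR_KEY + "=" + System.getProperty(SNAPPY_TEMPDIR_KEY));
    }
}
```

    If some other class loads Snappy before this initializer runs, the
    property is set too late and the sketch has no effect, which is the
    brittleness described above.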
    
    An alternative work-around for users is to explicitly set
    
        org.xerial.snappy.tempdir
    
    themselves through Spark's java options. I also filed a bug upstream with
    Snappy-java to ask them for better behavior here:
    
    https://github.com/xerial/snappy-java/issues/84
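
    For the manual work-around mentioned above, the property would be passed
    to the JVM as a -D flag, for example through
    spark.executor.extraJavaOptions or spark.driver.extraJavaOptions. The
    sketch below only builds such a flag for a per-user directory; the class
    name, the helper, and the "snappy-<user>" naming are assumptions, and the
    directory is assumed to exist already and be writable by that user.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnappyJavaOption {
    /** Builds a -D flag pointing Snappy at a per-user directory under baseDir. */
    static String snappyTempDirOption(String baseDir, String user) {
        // Hypothetical layout: <baseDir>/snappy-<user>
        Path dir = Paths.get(baseDir, "snappy-" + user);
        return "-Dorg.xerial.snappy.tempdir=" + dir;
    }

    public static void main(String[] args) {
        String option = snappyTempDirOption(
            System.getProperty("java.io.tmpdir"),
            System.getProperty("user.name"));
        // Print the flag so it can be copied into the Spark Java options.
        System.out.println(option);
    }
}
```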
    
    I think this is worth merging because in many cases it will fix the issue,
    and at worst it's a no-op.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark snappy

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1991
    
----
commit 96e5e6a9e86f38b5b2f45f4832d6b5b572e82373
Author: Patrick Wendell
Date:   2014-08-17T02:24:18Z

    SPARK-2881: Avoid collisions in Snappy staging directory.
    

----

