GitHub user pwendell opened a pull request:
https://github.com/apache/spark/pull/1994
SPARK-2881: Avoid collisions in Snappy staging directory.
By default Snappy uses java.io.tmpdir for copying the Snappy native library.
If two users run Spark jobs on the same machine, this can cause an exception
when the second user tries to access or overwrite the Snappy file created by
the first user. This makes Spark jobs fail out of the box if they are run on a
machine shared by different users.
Snappy does expose a mechanism to customize the temp directory via a system
property. This system property is read in a static block inside of Snappy
code.
Snappy-java fixes this in newer versions. We can upgrade snappy-java in master,
but this proposes a "best effort" fix for 1.1 where we try to set the system
property in a static block before Snappy reads it. I've tested it and it does
work, but it relies on static initialization order, which is brittle: if user
code accesses Snappy libraries first, the fix might not take effect.
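To make the ordering dependency concrete, here is a minimal Java sketch of the idea, not the code in this patch. SnappyTempDirInit, FakeNativeLoader, and the per-user "snappy-<user>" directory name are all hypothetical; FakeNativeLoader only stands in for a library that reads org.xerial.snappy.tempdir once in its own static block.

```java
import java.io.File;

// Demo entry point (hypothetical); shows the ordering the fix depends on.
class SnappyTempDirDemo {
    public static void main(String[] args) {
        // Touch our initializer first so the property is set...
        SnappyTempDirInit.ensureInitialized();
        // ...and only then touch the class that reads it.
        System.out.println("native lib staged in: " + FakeNativeLoader.stagingDir());
    }
}

// Hypothetical helper: sets the property in a static block, once per JVM.
final class SnappyTempDirInit {
    static {
        // Respect a value the user already supplied, e.g. via a -D Java option.
        if (System.getProperty("org.xerial.snappy.tempdir") == null) {
            System.setProperty("org.xerial.snappy.tempdir", perUserDir());
        }
    }

    // Assumption for illustration: one staging directory per OS user.
    static String perUserDir() {
        String base = System.getProperty("java.io.tmpdir");
        String user = System.getProperty("user.name", "unknown");
        File dir = new File(base, "snappy-" + user);
        dir.mkdirs(); // best effort; the result is not checked in this sketch
        return dir.getAbsolutePath();
    }

    // Invoking any static method forces the static block above to run.
    static void ensureInitialized() { }
}

// Stand-in for a library that reads the property once, at class initialization.
final class FakeNativeLoader {
    private static final String STAGING_DIR;

    static {
        // Falls back to the shared temp directory if nobody set the property.
        STAGING_DIR = System.getProperty(
            "org.xerial.snappy.tempdir", System.getProperty("java.io.tmpdir"));
    }

    static String stagingDir() {
        return STAGING_DIR;
    }
}
```

If anything touches FakeNativeLoader before SnappyTempDirInit has been initialized, the loader has already captured the shared directory and setting the property afterwards changes nothing. That is the "at worst it's a no-op" case.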
An alternative workaround for users is to explicitly set
org.xerial.snappy.tempdir themselves through Spark's Java options. I also
filed a bug upstream with
Snappy-java to ask them for better behavior here:
https://github.com/xerial/snappy-java/issues/84
I think this is worth merging because in many cases it will fix the issue, and
at worst it's a no-op.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pwendell/spark snappy
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1994.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1994
----
commit 96e5e6a9e86f38b5b2f45f4832d6b5b572e82373
Author: Patrick Wendell <[email protected]>
Date: 2014-08-17T02:24:18Z
SPARK-2881: Avoid collisions in Snappy staging directory.
By default Snappy uses java.io.tmpdir for copying the Snappy native library.
If two users run Spark jobs on the same machine, this can cause an exception
when the second user tries to access or overwrite the Snappy file created by
the first user. This makes Spark jobs fail out of the box if they are run on a
machine shared by different users.
Snappy does expose a mechanism to customize the temp directory via a system
property. This system property is read in a static block inside of Snappy
code.
I've added a "best effort" fix for this where we try to set the system
property in a static block before Snappy reads it. I've tested it and it does
work, but it relies on static initialization order, which is brittle: if user
code accesses Snappy libraries first, the fix might not take effect.
An alternative workaround for users is to explicitly set
org.xerial.snappy.tempdir themselves through Spark's Java options. I also
filed a bug upstream with
Snappy-java to ask them for better behavior here:
https://github.com/xerial/snappy-java/issues/84
I think this is worth merging because in many cases it will fix the issue, and
at worst it's a no-op.
----