The full source for my example is available on GitHub:
<https://github.com/jean-philippe-martin/SparkRepro>.

I'm using Maven to depend on gcloud-java-nio
<https://mvnrepository.com/artifact/com.google.cloud/gcloud-java-nio/0.2.5>,
which provides a Java NIO FileSystem for Google Cloud Storage via "gs://"
URLs. My Spark project uses maven-shade-plugin to build one big jar that
bundles all the dependencies.
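The relevant parts of the pom look roughly like this (a sketch: the
gcloud-java-nio dependency is the real one, the shade plugin version is
illustrative):

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>gcloud-java-nio</artifactId>
      <version>0.2.5</version>
    </dependency>

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>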

The big jar correctly includes a
META-INF/services/java.nio.file.spi.FileSystemProvider file containing the
correct class name
(com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider). I
checked, and that class is also correctly included in the jar file.
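In other words, the services file consists of the single line naming the
provider class:

    com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider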

The program uses FileSystemProvider.installedProviders() to list the
filesystem providers it finds. "gs" should be listed (and it is when I run
the same function in a non-Spark context), but when running with Spark on
Dataproc, that provider is gone.
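Here's a minimal sketch of that check (the wrapper class and main method
are mine for illustration; installedProviders() is the actual call):

    import java.nio.file.spi.FileSystemProvider;

    public class ListProviders {
      public static void main(String[] args) {
        // Print the scheme of each installed provider; "gs" shows up
        // when I run this locally, but not on the Spark executors.
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
          System.out.println(provider.getScheme());
        }
      }
    }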

I'd like to know: *How can I use a custom filesystem in my Spark program?*
(I asked this earlier on Stack Overflow
<http://stackoverflow.com/questions/39500445/filesystem-provider-disappearing-in-spark>
but got no traction there.) For concreteness, the kind of access I want to
work is sketched below.
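The bucket and object names here are made up; with the "gs" provider
missing from installedProviders(), the path lookup fails:

    import java.net.URI;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class GsAccess {
      public static void main(String[] args) {
        // When no installed provider handles the "gs" scheme, this throws
        // java.nio.file.FileSystemNotFoundException: Provider "gs" not installed
        Path path = Paths.get(URI.create("gs://my-bucket/some/file.txt"));
        System.out.println(path);
      }
    }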
