On 21 Sep 2016, at 20:10, Jean-Philippe Martin wrote:
The full source for my example is available on
I'm using Maven to depend on an artifact
which provides a Java FileSystem for Google Cloud Storage, via "gs://" URLs.
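For illustration only, the dependency would look something like the sketch below; the google-cloud-nio coordinates are my assumption, since the artifact isn't named above:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-nio</artifactId>
      <!-- version omitted; pick the current one from Maven Central -->
      <version>...</version>
    </dependency>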
My Spark project uses the maven-shade-plugin to create one big jar with all of
its classes and dependencies in it.
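For reference, a minimal sketch of that shade configuration. The ServicesResourceTransformer is my assumption rather than something stated above, but it is the piece that merges META-INF/services files when several dependencies provide them; without it, one jar's service file can silently overwrite another's:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <transformers>
              <!-- merge service files instead of keeping only the first one seen -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>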
The big jar correctly includes a
META-INF/services/java.nio.file.spi.FileSystemProvider file, containing the
correct name for the provider class; I checked, and that class is also
correctly included in the jar file.
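For illustration, such a service file holds one fully qualified provider class name per line; the class below assumes the google-cloud-nio provider, so the actual file may name a different class:

    com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider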
The program uses FileSystemProvider.installedProviders() to list the filesystem
providers it finds. "gs" should be listed (and it is if I run the same function
in a non-Spark context), but when running with Spark on Dataproc, that provider
is not listed.
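The check in question can be as small as this sketch (the class name ListProviders is mine, not from the original program):

    import java.nio.file.spi.FileSystemProvider;

    public class ListProviders {
      public static void main(String[] args) {
        // Prints one scheme per installed provider, e.g. "file", "jar",
        // and "gs" if the NIO GCS provider is on the classpath.
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
          System.out.println(p.getScheme());
        }
      }
    }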
I'd like to know: How can I use a custom filesystem in my Spark program?
There's a bit of confusion setting in here: the FileSystem implementations
Spark uses are subclasses of org.apache.hadoop.fs.FileSystem; the
java.nio.file.FileSystem class of the same name is a different, unrelated API.
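To make the distinction concrete, a sketch of the Hadoop API that Spark actually goes through (the bucket and object names are hypothetical):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Spark resolves paths like "gs://..." through Hadoop's FileSystem API,
    // not through java.nio.file, so an NIO provider is never consulted.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("gs://some-bucket/"), conf);
    boolean exists = fs.exists(new Path("gs://some-bucket/some/object"));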
Grab the Google Cloud Storage connector (the gcs-connector artifact) and put it
on your classpath.
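If the connector isn't picked up automatically, here is a hedged sketch of wiring it in through the Hadoop configuration; the property names and class come from the gcs-connector artifact, and on Dataproc this is typically preconfigured for you:

    import org.apache.hadoop.conf.Configuration;

    // Map the "gs" scheme to the connector's Hadoop FileSystem implementation.
    Configuration conf = new Configuration();
    conf.set("fs.gs.impl",
        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    conf.set("fs.gs.project.id", "my-project");  // hypothetical project id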