The FileSystems API uses a ServiceLoader[1] to find Apache Beam FileSystem
implementations. The ServiceLoader works by finding "service" files on the
classpath (under META-INF/services) that list the classes registering Apache
Beam FileSystem implementations. The way you're building your executable jar
is likely dropping or incorrectly merging these service files. The most common
cause is that you're using the Maven shade plugin and haven't configured it to
use the services resource transformer (ServicesResourceTransformer)[2]. If you
are packaging your executable jar some other way, look up the documentation
for your tool and see how it can correctly handle service files.
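One way to check what actually ended up in your jar is to read the service
files yourself. Below is a minimal, JDK-only sketch: it asks the class loader
for every META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar
resource on the classpath and prints the class names listed in them. It
assumes the registrar interface is org.apache.beam.sdk.io.FileSystemRegistrar
and that the GCS registrar is
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar; the class
name ListBeamRegistrars is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Lists every class named in FileSystemRegistrar service files on the classpath. */
public class ListBeamRegistrars {
  // Assumption: Beam discovers file systems through this registrar interface.
  static final String SERVICE_FILE =
      "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";

  // Assumption: class name of the GCS registrar in the GCP extensions module.
  static final String GCS_REGISTRAR =
      "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar";

  static List<String> registrarClassNames(ClassLoader loader) throws IOException {
    List<String> names = new ArrayList<>();
    // A jar that merged service files correctly exposes one file listing all
    // registrars; an un-merged shaded jar keeps only one module's copy.
    Enumeration<URL> files = loader.getResources(SERVICE_FILE);
    while (files.hasMoreElements()) {
      URL url = files.nextElement();
      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
          // Service files allow '#' comments and blank lines; skip both.
          int hash = line.indexOf('#');
          String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
          if (!name.isEmpty()) {
            names.add(name);
          }
        }
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    List<String> names = registrarClassNames(ListBeamRegistrars.class.getClassLoader());
    System.out.println("Registrars found: " + names);
    if (!names.contains(GCS_REGISTRAR)) {
      System.out.println("No GCS registrar listed; gs:// paths are unlikely to reach GCS.");
    }
  }
}
```

Run it with your bundled jar on the classpath, e.g.
java -cp my-bundled-pipeline.jar:. ListBeamRegistrars (jar name
hypothetical). If the GCS registrar is missing from the output even though the
GCP extensions are a dependency, the service files were most likely
overwritten rather than merged when the jar was built.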

1: https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
2: https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
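This would also explain the odd path in your error message. The LocalFileSystem
frames in the stack trace suggest the gs:// path was handed to the local file
system; a local file API treats "gs://..." as a relative path. The sketch
below uses java.io.File to illustrate: on Unix, File collapses the repeated
slash ("gs://" becomes "gs:/") and getAbsolutePath() resolves the result
against the working directory. The class name GcsPathAsLocalFile is
hypothetical.

```java
import java.io.File;

/** Shows how a gs:// string is interpreted when treated as a local file path. */
public class GcsPathAsLocalFile {
  static String asLocalAbsolutePath(String path) {
    // On Unix, File normalizes duplicate separators ("gs://bucket" -> "gs:/bucket"),
    // and since the path does not start with '/', getAbsolutePath()
    // prefixes the current working directory (the user.dir property).
    return new File(path).getAbsolutePath();
  }

  public static void main(String[] args) {
    String input = "gs://my-gcs-bucket/schema/my-schema.avsc";
    System.out.println(asLocalAbsolutePath(input));
  }
}
```

Run from /some/local/filesystem/path/myproject on a Unix machine, this prints
the same path that appears in your FileNotFoundException.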

On Thu, Jun 21, 2018 at 12:06 PM Sameer Abhyankar <[email protected]> wrote:

> Hello!
>
> I am trying to package a Beam Dataflow pipeline as a self-executing jar
> using these instructions
> <https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar>.
> However, I am running into a weird issue when attempting to execute this
> jar.
>
> My pipeline needs to read a file (an Avro schema, .avsc) from GCS outside of
> a PCollection before it starts working with PCollections. To do that, I use
> the FileSystems API. This works perfectly fine when I execute the pipeline
> via mvn compile exec:java ...
>
> However, if I attempt to run this as a jar, it appears to treat the GCS
> path as local and fails with a FileNotFoundException.
>
> Exception in thread "main" java.io.FileNotFoundException:
> /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc
> (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.<init>(FileInputStream.java:138)
>   at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
>   at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
>   at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)
>
> (Note that the input path I pass is correct and has the double slash, e.g.
> --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc, but the error seems
> to strip it out.)
>
> Any pointers on what might be causing this?
>
> Thanks,
> - Sameer
>
