Hello!

I am trying to package a Beam Dataflow pipeline as a self-executing jar,
following these instructions:
<https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar>
However, I am running into a strange issue when I try to execute the
jar.

My pipeline needs to read a file (an Avro schema, .avsc) from GCS outside of
a PCollection, before it starts working with PCollections. To do that I use
the FileSystems API. This works perfectly fine when I execute the pipeline
via mvn compile exec:java ...

However, if I attempt to run this as a jar, it appears to treat the GCS
path as local and fails with a FileNotFoundException.

Exception in thread "main" java.io.FileNotFoundException: /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
    at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
    at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)

(Note that the input path I pass is correct and has the double slash,
e.g. --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc, but the path
shown in the error has it collapsed to a single slash.)
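
For what it's worth, the collapsed slash in the error looks like ordinary
java.io.File behaviour once the string is handled as a local path. It does
not look like anything specific to my input. Below is a minimal stdlib-only
sketch of that. The class and method names are made up for illustration and
are not part of Beam or of my pipeline, and it assumes a Unix-like system,
where java.io.File collapses repeated separators and resolves relative paths
against the working directory:

```java
import java.io.File;

// Hypothetical demo class, for illustration only (not Beam code).
class GcsPathAsLocalFile {

    // Returns the path string java.io.File keeps after normalizing `spec`.
    // File knows nothing about URI schemes, so "gs:" is just a directory name.
    static String asLocalPath(String spec) {
        return new File(spec).getPath();
    }

    // Returns the same path resolved against the current working directory,
    // which is the form that shows up in a FileNotFoundException message.
    static String asAbsoluteLocalPath(String spec) {
        return new File(spec).getAbsolutePath();
    }

    public static void main(String[] args) {
        String spec = "gs://my-gcs-bucket/schema/my-schema.avsc";
        System.out.println(asLocalPath(spec));
        System.out.println(asAbsoluteLocalPath(spec));
    }
}
```

So the single slash would just be a side effect of the path reaching the
local filesystem code. The real question is why the gs scheme is not
recognized when running from the jar.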

Any pointers on what might be causing this?

Thanks,
- Sameer
