I just wanted to thank you again. I split up my project into a Beam core library and my plugin. This got rid of a number of circular dependency issues and lib conflicts. I also gave the Dataflow PipelineOptions the list of files to stage.
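For anyone running into the same staging problem, that part looks roughly like this. It is only a sketch, not the actual plugin code: the class and method names are invented for the example, and the project and temp location are placeholders.

import java.util.List;

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StagingOptionsExample {

  // Build Dataflow options with an explicit list of jars to stage, instead of
  // letting the runner scan the classpath (which cannot see jars that live in a
  // separate plugin classloader).
  public static DataflowPipelineOptions createOptions(List<String> pluginJarPaths) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-gcp-project");           // placeholder project id
    options.setTempLocation("gs://my-bucket/tmp/"); // placeholder temp location
    options.setFilesToStage(pluginJarPaths);        // absolute paths of the jars to upload
    return options;
  }
}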
That has made things work, and much more quickly than I anticipated, I must admit. I'm in awe of how clean and intuitive the Beam API is (once you get the hang of it).
Thanks for everything!

https://github.com/mattcasters/kettle-beam-core
https://github.com/mattcasters/kettle-beam

Cheers,
Matt
---
Matt Casters <[email protected]>
Senior Solution Architect, Kettle Project Founder


On Thu, Nov 29, 2018 at 7:03 PM Matt Casters <[email protected]> wrote:

> Thanks a lot for the replies. The problem is not that the jar files aren't
> in the classloader, it's that something somewhere insists on using the
> parent classloader.
> I guess it makes sense: I noticed that when running from IDEA, Beam copied
> all the required runtime binaries into GCP Storage, so it must have some
> idea of what to pick up. I'm guessing it tries to pick up everything in
> the classpath.
>
> Throwing all the generated Maven jar files into the main classpath of
> Kettle is a bit messy in this case, so I'm going to look for an
> alternative, like a separate application running alongside to communicate
> with.
>
> I'll report back once I get a bit further along.
>
> Cheers,
> Matt
>
> On Thu, Nov 29, 2018 at 5:10 PM Juan Carlos Garcia <[email protected]> wrote:
>
>> If you are using Gradle for packaging, make sure your final jar (fat jar)
>> contains all the service files merged.
>>
>> With the Gradle shadowJar plugin, include the "mergeServiceFiles()"
>> instruction, like:
>>
>> apply plugin: 'com.github.johnrengelman.shadow'
>> shadowJar {
>>     mergeServiceFiles()
>>     zip64 true
>>     classifier = 'bundled'
>> }
>>
>> If you are using Maven, use the Shade plugin.
>>
>> On Thu, Nov 29, 2018 at 4:50 PM Robert Bradshaw <[email protected]> wrote:
>>
>>> Beam Java uses com.google.auto.service.AutoService which, at the end of
>>> the day, is shorthand for Java's standard ServiceLoader mechanisms
>>> (e.g. see [1]). I'm not an expert on the details of how this works,
>>> but you'll probably have to make sure these filesystem dependencies
>>> are in your custom classloader's jar.
>>>
>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystemRegistrar.java
>>>
>>> On Thu, Nov 29, 2018 at 3:57 PM Matt Casters <[email protected]> wrote:
>>> >
>>> > Hello Beam,
>>> >
>>> > I've been taking great steps forward in having Kettle generate Beam
>>> > pipelines, and they actually execute just fine in unit testing in
>>> > IntelliJ. The problem starts when I collect all the libraries needed
>>> > for Beam and the runners and throw them into the Kettle project as a
>>> > plugin.
>>> >
>>> > Caused by: java.lang.IllegalArgumentException: No filesystem found for scheme gs
>>> >     at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>>> >     at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>>> >     at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:213)
>>> >     at org.apache.beam.sdk.io.TextIO$TypedWrite.to(TextIO.java:700)
>>> >     at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:1028)
>>> >     at org.kettle.beam.core.transform.BeamOutputTransform.expand(BeamOutputTransform.java:87)
>>> >     ... 32 more
>>> >
>>> > This also happens for local file execution ("scheme file" in that case).
>>> >
>>> > So the questions are: how is Beam bootstrapped? How does Beam
>>> > determine which libraries to use, and what is the recommended way to
>>> > package things up properly?
>>> > The Beam plugin is running in a separate URLClassloader, so I think
>>> > something is going awry there.
>>> >
>>> > Thanks a lot for any answers or tips you might have!
>>> >
>>> > Matt
>>> > ---
>>> > Matt Casters <[email protected]>
>>> > Senior Solution Architect, Kettle Project Founder
>>
>> --
>> JC
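P.S. The AutoService registration Robert mentions above resolves through Java's ServiceLoader, so a quick sanity check for the "No filesystem found for scheme gs" error is to list which FileSystemRegistrar implementations a given classloader can actually see. A rough sketch, with class and method names invented for the example; pass in whatever URLClassLoader your plugin uses:

import java.util.ServiceLoader;

import org.apache.beam.sdk.io.FileSystemRegistrar;

public class FileSystemRegistrarCheck {

  // Print the FileSystemRegistrar implementations visible to the given classloader.
  // If the GCS registrar (from Beam's google-cloud-platform-core extension jar) does
  // not show up here, Beam cannot resolve the "gs" scheme from that classloader.
  public static void listRegistrars(ClassLoader classLoader) {
    ServiceLoader<FileSystemRegistrar> registrars =
        ServiceLoader.load(FileSystemRegistrar.class, classLoader);
    for (FileSystemRegistrar registrar : registrars) {
      System.out.println(registrar.getClass().getName());
    }
  }
}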
