I just wanted to thank you again.  I split my project into a Beam core
library and my plugin.  This got rid of a number of circular dependency
issues and library conflicts.
I also gave the Dataflow PipelineOptions the list of files to stage.
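For anyone finding this later, here is a minimal sketch of how such a list of files to stage could be put together. It only uses the JDK; the class name StagingFiles and the idea of scanning a single plugin folder are assumptions for illustration, not what kettle-beam actually does. The call that hands the list to Dataflow is left as a comment so the block runs without Beam on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the jar files of one folder so they can be
// handed to the runner as the list of files to stage.
class StagingFiles {

  // Returns the absolute paths of all *.jar files directly inside libDir, sorted.
  static List<String> jarsToStage(Path libDir) throws IOException {
    try (Stream<Path> entries = Files.list(libDir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".jar"))
          .map(p -> p.toAbsolutePath().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Folder to scan; defaults to the current directory (assumption).
    List<String> files = jarsToStage(Paths.get(args.length > 0 ? args[0] : "."));
    files.forEach(System.out::println);
    // With Beam on the classpath the list would then be passed on, e.g.:
    // options.as(DataflowPipelineOptions.class).setFilesToStage(files);
  }
}
```

Passing an explicit list like this means the runner does not have to guess what to upload from the classpath.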

That made things work, and much more quickly than I anticipated, I must
admit.  I'm in awe of how clean and intuitive the Beam API is (once you
get the hang of it).
Thanks for everything!

https://github.com/mattcasters/kettle-beam-core
https://github.com/mattcasters/kettle-beam

Cheers,

Matt
---
Matt Casters <[email protected]>
Senior Solution Architect, Kettle Project Founder


On Thu, Nov 29, 2018 at 19:03, Matt Casters <[email protected]> wrote:

> Thanks a lot for the replies. The problem is not that the jar files aren't
> in the classloader, it's that something somewhere insists on using the
> parent classloader.
> I guess it makes sense, since I noticed that when running from IntelliJ
> IDEA, Beam copied all required runtime binaries into GCP Storage, so it
> must have an idea of what to pick up.
> I'm guessing it tries to pick up everything in the classpath.
>
> Throwing all the generated Maven jar files into the main classpath of
> Kettle is a bit messy in this case. I'm going to look for an alternative,
> like a separate application alongside Kettle to communicate with.
>
> I'll report back once I get a bit further along.
>
> Cheers,
> Matt
>
> On Thu, Nov 29, 2018 at 17:10, Juan Carlos Garcia <
> [email protected]> wrote:
>
>> If you are using Gradle for packaging, make sure your final jar (fat jar)
>> contains all the service files merged.
>>
>> With the Gradle Shadow plugin, include the "mergeServiceFiles()"
>> instruction, like:
>>
>> apply plugin: 'com.github.johnrengelman.shadow'
>> shadowJar {
>>     mergeServiceFiles()
>>
>>     zip64 true
>>     classifier = 'bundled'
>> }
>>
>> If you are using Maven then use the Shade plugin.
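To make it concrete what merging service files means: each jar can carry its own META-INF/services file with the same name, and a fat jar that keeps only one of them loses the providers listed in the others. The sketch below shows the merge on plain lists of lines, using only the JDK. The class name ServiceFileMerge is hypothetical and the provider names in main are just illustrative inputs; this is not how the Shadow or Shade plugins are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of merging same-named META-INF/services files.
class ServiceFileMerge {

  // Merges the lines of several service files, dropping comments, blank
  // lines and duplicate provider names while keeping first-seen order.
  static List<String> merge(List<List<String>> serviceFiles) {
    Set<String> providers = new LinkedHashSet<>();
    for (List<String> lines : serviceFiles) {
      for (String line : lines) {
        int hash = line.indexOf('#');
        String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (!name.isEmpty()) {
          providers.add(name);
        }
      }
    }
    return new ArrayList<>(providers);
  }

  public static void main(String[] args) {
    // Illustrative contents of the same service file found in two jars.
    List<String> fromCoreJar =
        Arrays.asList("# core", "org.apache.beam.sdk.io.LocalFileSystemRegistrar");
    List<String> fromGcpJar =
        Arrays.asList("org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar", "");
    merge(Arrays.asList(fromCoreJar, fromGcpJar)).forEach(System.out::println);
  }
}
```

If only one of the two inputs survived packaging, the registrar from the other jar would never be discovered, which would match a "No filesystem found for scheme" error.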
>>
>> On Thu, Nov 29, 2018 at 4:50 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> Beam Java uses com.google.auto.service.AutoService which, at the end of
>>> the day, is shorthand for Java's standard ServiceLoader mechanisms
>>> (e.g. see [1]). I'm not an expert on the details of how this works,
>>> but you'll probably have to make sure these filesystem dependencies
>>> are in your custom classloader's jar.
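One thing that may be relevant here: the single-argument ServiceLoader.load(Class) in the JDK looks providers up through the current thread's context classloader, not through the classloader of the calling class. A common workaround for plugin setups is to switch the context classloader around the plugin code. The sketch below uses only the JDK; the class and method names are hypothetical, and whether this is the lookup path that fails in this particular case is an assumption.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

// Hypothetical helper for running code with a specific context classloader.
class ContextClassLoaders {

  // Runs the action with the given loader as the thread context classloader
  // and restores the previous loader afterwards, even if the action throws.
  static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
    Thread thread = Thread.currentThread();
    ClassLoader previous = thread.getContextClassLoader();
    thread.setContextClassLoader(loader);
    try {
      return action.call();
    } finally {
      thread.setContextClassLoader(previous);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the plugin's URLClassLoader; a real one would list the plugin jars.
    try (URLClassLoader pluginLoader =
        new URLClassLoader(new URL[0], ContextClassLoaders.class.getClassLoader())) {
      ClassLoader seen =
          callWith(pluginLoader, () -> Thread.currentThread().getContextClassLoader());
      System.out.println("plugin loader active inside call: " + (seen == pluginLoader));
    }
  }
}
```

The pipeline construction and run would go inside the action, so that any ServiceLoader lookups made during that time can see the plugin's jars.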
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystemRegistrar.java
>>> On Thu, Nov 29, 2018 at 3:57 PM Matt Casters <[email protected]>
>>> wrote:
>>> >
>>> > Hello Beam,
>>> >
>>> > I've been taking great steps forward in having Kettle generate Beam
>>> > pipelines and they actually execute just fine in unit testing in IntelliJ.
>>> > The problem starts when I collect all the libraries needed for Beam
>>> > and the Runners and throw them into the Kettle project as a plugin.
>>> >
>>> > Caused by: java.lang.IllegalArgumentException: No filesystem found for scheme gs
>>> > at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>>> > at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>>> > at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:213)
>>> > at org.apache.beam.sdk.io.TextIO$TypedWrite.to(TextIO.java:700)
>>> > at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:1028)
>>> > at org.kettle.beam.core.transform.BeamOutputTransform.expand(BeamOutputTransform.java:87)
>>> > ... 32 more
>>> >
>>> > This also happens for local file execution ("scheme file" in that
>>> > case).
>>> >
>>> > So the questions are: how is Beam bootstrapped? How does Beam
>>> > determine which libraries to use, and what is the recommended way of
>>> > packaging things up properly?
>>> > The Beam plugin is running in a separate URLClassLoader, so I think
>>> > something is going awry there.
>>> >
>>> > Thanks a lot for any answers or tips you might have!
>>> >
>>> > Matt
>>> > ---
>>> > Matt Casters <[email protected]>
>>> > Senior Solution Architect, Kettle Project Founder
>>> >
>>> >
>>>
>>
>>
>> --
>>
>> JC
>>
>>