Re: Getting "java.lang.NoSuchFieldError: TRUNCATE_SIZED_RESTRICTION" in Flink cluster

2020-10-02 Thread Tomo Suzuki
I suspect your dependencies have a conflict. I develop the Linkage Checker
enforcer rule to identify incompatible dependencies. Do you want to give it
a try?
https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/Linkage-Checker-Enforcer-Rule
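
A small probe along these lines can also help narrow it down on the cluster
itself (an illustrative sketch only; it assumes the PTransformTranslation class
from the stack trace below and Beam's ReleaseInfo utility from
beam-sdks-java-core are on the job's classpath):

    import org.apache.beam.runners.core.construction.PTransformTranslation;
    import org.apache.beam.sdk.util.ReleaseInfo;

    public class BeamClasspathProbe {
      public static void main(String[] args) {
        // Which jar the runner-side construction classes (where the
        // NoSuchFieldError is thrown) are actually loaded from.
        System.out.println("PTransformTranslation loaded from: "
            + PTransformTranslation.class.getProtectionDomain().getCodeSource().getLocation());
        // Which Beam SDK version is visible at runtime; a mismatch with the
        // version the SDF-based IO connector was built against would explain
        // the missing field.
        System.out.println("Beam SDK version: " + ReleaseInfo.getReleaseInfo().getVersion());
      }
    }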

Regards,
Tomo

On Fri, Oct 2, 2020 at 21:34 Praveen K Viswanathan 
wrote:

> Hi - We have a Beam pipeline that reads and writes using an SDF-based IO
> connector and works fine on a local machine with the Direct Runner or the
> Flink Runner. However, when we build an image of that pipeline along with
> Flink and deploy it in a cluster, we get the exception below.
>
> ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler-
>> Unhandled exception.
>> org.apache.flink.client.program.ProgramInvocationException: The program
>> caused an error:
>>
>> Classpath:
>> [file:/opt/flink/flink-web-upload/Booster-bundled-1.0-SNAPSHOT.jar]
>> System.out: (none)
>> System.err: (none)
>> at
>> org.apache.flink.client.program.OptimizerPlanEnvironment.generateException(OptimizerPlanEnvironment.java:149)
>> at
>> org.apache.flink.client.program.OptimizerPlanEnvironment.getPipeline(OptimizerPlanEnvironment.java:89)
>> at
>> org.apache.flink.client.program.PackagedProgramUtils.getPipelineFromProgram(PackagedProgramUtils.java:101)
>> at
>> org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:56)
>> at
>> org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toJobGraph(JarHandlerUtils.java:128)
>> at
>> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$getJobGraphAsync$6(JarRunHandler.java:138)
>> at
>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.NoSuchFieldError: TRUNCATE_SIZED_RESTRICTION
>> at
>> org.apache.beam.runners.core.construction.PTransformTranslation.(PTransformTranslation.java:199)
>> at
>> org.apache.beam.runners.core.construction.PTransformMatchers$6.matches(PTransformMatchers.java:264)
>> at
>> org.apache.beam.sdk.Pipeline$2.enterCompositeTransform(Pipeline.java:272)
>> at
>> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:653)
>> at
>> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>> at
>> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>> at
>> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>> at
>> org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:463)
>> at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:262)
>> at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:212)
>> at
>> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:115)
>> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:82)
>> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>> at com.org.cx.signals.Booster.main(Booster.java:278)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
>> at
>> org.apache.flink.client.program.OptimizerPlanEnvironment.getPipeline(OptimizerPlanEnvironment.java:79)
>> ... 8 more
>
>
> In our pom.xml we have created a profile for flink-runner as shown below.
>
>> <profile>
>>   <id>flink-runner</id>
>>   <dependencies>
>>     <dependency>
>>       <groupId>org.apache.beam</groupId>
>>       <artifactId>beam-runners-flink-1.10</artifactId>
>>       <version>2.21.0</version>
>>     </dependency>
>>   </dependencies>
>> </profile>
>
>
> And the Docker image uses the Flink version below:
>
> FROM flink:1.10.0-scala_2.12
>
>
> Both our pipeline and the SDF-based IO connector are on Beam version 2.23.0.
> We would appreciate any guidance on what is causing this exception.
>
> --
> Thanks,
> Praveen K Viswanathan
>
-- 
Regards,
Tomo


Re: Upload third party runtime dependencies for expanding transform like KafkaIO.Read in Python Portable Runner

2020-10-02 Thread Kobe Feng
Thanks Robert, yes, I'm also thinking of running my own expansion service and
am trying it, so:

[grpc-default-executor-3] INFO
io.x.kafka.security.token.RefreshableTokenLoginModule - starting
renewal task and exposing its jmx metrics
[grpc-default-executor-3] INFO
io..kafka.security.token.TokenRenewalTask - IAF Token renewal started
[grpc-default-executor-3] INFO
org.apache.kafka.common.security.authenticator.AbstractLogin - Successfully
logged in.
[grpc-default-executor-3] INFO
io..kafka.security.token.TokenRenewalTask - proposed next checkpoint
time Sat Oct 03 11:38:29 UTC 2020, now is Sat Oct 03 02:02:24 UTC 2020, min
expiration Sat Oct 03 14:02:30 UTC 2020

I very much agree with your last statement!

Happy weekend!


On Fri, Oct 2, 2020 at 6:31 PM Robert Bradshaw  wrote:

> If you make sure that these extra jars are in your path when you
> execute your pipeline, they should get picked up when invoking the
> expansion service (though this may not be the case long term).
>
> The cleanest way would be to provide your own expansion service. If
> you build a jar that consists of Beam's IO expansion service plus any
> necessary dependencies, you should be able to do
>
> ReadFromKafka(
>     [ordinary params],
>     expansion_service=BeamJarExpansionService('path/to/your/jar'))
>
> to use this "custom" expansion service. See
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py
> An alternative is to pass a pipeline option
>
> --beam_services={"sdks:java:io:expansion-service:shadowJar":
> "path/to/your/jar"}
>
> which will override the default. (You can pass "host:port" rather than
> a path as well if you manually start the expansion service.)
>
> Exactly how to specify at a top level a set of extra dependencies to
> be applied to a particular subset of other-language transforms is
> still an open problem. Alternatively we could try to make expansion
> services themselves trivially easy to build, customize, and use.
>
> Hopefully that helps.
>
> - Robert
>
>
>
>
> On Fri, Oct 2, 2020 at 5:57 PM Kobe Feng  wrote:
> >
> > Thanks Robert, yes, our Kafka requires a JAAS configuration
> (sasl.jaas.config) at the client side for security checks, and the
> corresponding LoginModule requires additional classes:
> >
> ==
> > Caused by: javax.security.auth.login.LoginException: unable to find
> LoginModule class: io.${}.kafka.security.iaf.IAFLoginModule
> > at
> javax.security.auth.login.LoginContext.invoke(LoginContext.java:794)
> > at
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
> > at
> javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
> > at
> javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
> > at
> javax.security.auth.login.LoginContext.login(LoginContext.java:587)
> > at
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:52)
> > at
> org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:53)
> > at
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:76)
> > at
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:103)
> > ... 42 more
> >
> > at
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:177)
> > at
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:157)
> >
> > On Fri, Oct 2, 2020 at 5:14 PM Robert Bradshaw 
> wrote:
> >>
> >> Could you clarify a bit exactly what you're trying to do? When using
> KafkaIO, the provided jar should have all the necessary dependencies to
> construct and execute the kafka read/write. Is there some reason you need
> to inject additional dependencies into the environment provided by kafka?
> >>
> >> On Fri, Oct 2, 2020 at 3:20 PM Kobe Feng  wrote:
> >>>
> >>> Just a follow-up since no one has replied yet.
> >>> My understanding is that for any expanded transforms Beam wants the
> >>> environment to be self-described.
> >>> So I updated the boot script and Dockerfile for the Java harness
> >>> environment and used --sdk_harness_container_image_overrides in the
> >>> portable runner, but I don't see the updated image being loaded (it still
> >>> uses the default). From glancing at the code I guess only the Dataflow
> >>> runner supports it, but I believe it's the correct approach; I just need
> >>> to dig deeper into the code when I get back to this, and then I will
> >>> update this thread too.
> >>>
> >>> Kobe
> >>> On Wed, Sep 30, 2020 at 1:26 PM Kobe Feng 
> wrote:
> 
>  Hi everyone,
>  Is there any recommended way to 

Getting "java.lang.NoSuchFieldError: TRUNCATE_SIZED_RESTRICTION" in Flink cluster

2020-10-02 Thread Praveen K Viswanathan
Hi - We have a Beam pipeline that reads and writes using an SDF-based IO
connector and works fine on a local machine with the Direct Runner or the
Flink Runner. However, when we build an image of that pipeline along with
Flink and deploy it in a cluster, we get the exception below.

ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler-
> Unhandled exception.
> org.apache.flink.client.program.ProgramInvocationException: The program
> caused an error:
>
> Classpath:
> [file:/opt/flink/flink-web-upload/Booster-bundled-1.0-SNAPSHOT.jar]
> System.out: (none)
> System.err: (none)
> at
> org.apache.flink.client.program.OptimizerPlanEnvironment.generateException(OptimizerPlanEnvironment.java:149)
> at
> org.apache.flink.client.program.OptimizerPlanEnvironment.getPipeline(OptimizerPlanEnvironment.java:89)
> at
> org.apache.flink.client.program.PackagedProgramUtils.getPipelineFromProgram(PackagedProgramUtils.java:101)
> at
> org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:56)
> at
> org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toJobGraph(JarHandlerUtils.java:128)
> at
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$getJobGraphAsync$6(JarRunHandler.java:138)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoSuchFieldError: TRUNCATE_SIZED_RESTRICTION
> at
> org.apache.beam.runners.core.construction.PTransformTranslation.(PTransformTranslation.java:199)
> at
> org.apache.beam.runners.core.construction.PTransformMatchers$6.matches(PTransformMatchers.java:264)
> at
> org.apache.beam.sdk.Pipeline$2.enterCompositeTransform(Pipeline.java:272)
> at
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:653)
> at
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
> at
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
> at
> org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:463)
> at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:262)
> at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:212)
> at
> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:115)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:82)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
> at com.org.cx.signals.Booster.main(Booster.java:278)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
> at
> org.apache.flink.client.program.OptimizerPlanEnvironment.getPipeline(OptimizerPlanEnvironment.java:79)
> ... 8 more


In our pom.xml we have created a profile for flink-runner as shown below.


> <profile>
>   <id>flink-runner</id>
>   <dependencies>
>     <dependency>
>       <groupId>org.apache.beam</groupId>
>       <artifactId>beam-runners-flink-1.10</artifactId>
>       <version>2.21.0</version>
>     </dependency>
>   </dependencies>
> </profile>


And the Docker image uses the Flink version below:

FROM flink:1.10.0-scala_2.12


Both our pipeline and the SDF-based IO connector are on Beam version 2.23.0.
We would appreciate any guidance on what is causing this exception.

-- 
Thanks,
Praveen K Viswanathan


Re: Upload third party runtime dependencies for expanding transform like KafkaIO.Read in Python Portable Runner

2020-10-02 Thread Robert Bradshaw
If you make sure that these extra jars are in your path when you
execute your pipeline, they should get picked up when invoking the
expansion service (though this may not be the case long term).

The cleanest way would be to provide your own expansion service. If
you build a jar that consists of Beam's IO expansion service plus any
necessary dependencies, you should be able to do

ReadFromKafka(
    [ordinary params],
    expansion_service=BeamJarExpansionService('path/to/your/jar'))

to use this "custom" expansion service. See
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py
An alternative is to pass a pipeline option

--beam_services={"sdks:java:io:expansion-service:shadowJar":
"path/to/your/jar"}

which will override the default. (You can pass "host:port" rather than
a path as well if you manually start the expansion service.)
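
As an illustrative sketch (not part of the original reply), such a jar can be
little more than a main class that forwards to Beam's own expansion service
entry point and is shaded together with KafkaIO and the extra runtime
dependencies. This assumes the org.apache.beam.sdk.expansion.service.ExpansionService
class from beam-sdks-java-expansion-service and its main(port) behavior:

    // Hypothetical main class for a custom expansion-service jar. Shade this
    // class together with beam-sdks-java-io-kafka, beam-sdks-java-expansion-service
    // and the extra login-module jars; it simply delegates to Beam's own expansion
    // service, which serves expansion requests on the port given as the first argument.
    public class CustomExpansionServiceMain {
      public static void main(String[] args) throws Exception {
        org.apache.beam.sdk.expansion.service.ExpansionService.main(args);
      }
    }

Starting it with something like "java -jar custom-expansion-service.jar 8097"
and then pointing the Python pipeline at it (expansion_service='localhost:8097',
or the --beam_services override above) should keep the additional classes on
the expansion service's classpath.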

Exactly how to specify at a top level a set of extra dependencies to
be applied to a particular subset of other-language transforms is
still an open problem. Alternatively we could try to make expansion
services themselves trivially easy to build, customize, and use.

Hopefully that helps.

- Robert




On Fri, Oct 2, 2020 at 5:57 PM Kobe Feng  wrote:
>
> Thanks Robert, yes, our Kafka requires a JAAS configuration (sasl.jaas.config)
> at the client side for security checks, and the corresponding LoginModule
> requires additional classes:
> ==
> Caused by: javax.security.auth.login.LoginException: unable to find 
> LoginModule class: io.${}.kafka.security.iaf.IAFLoginModule
> at 
> javax.security.auth.login.LoginContext.invoke(LoginContext.java:794)
> at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
> at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
> at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
> at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
> at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:52)
> at 
> org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:53)
> at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:76)
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:103)
> ... 42 more
>
> at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:177)
> at 
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:157)
>
> On Fri, Oct 2, 2020 at 5:14 PM Robert Bradshaw  wrote:
>>
>> Could you clarify a bit exactly what you're trying to do? When using 
>> KafkaIO, the provided jar should have all the necessary dependencies to 
>> construct and execute the kafka read/write. Is there some reason you need to 
>> inject additional dependencies into the environment provided by kafka?
>>
>> On Fri, Oct 2, 2020 at 3:20 PM Kobe Feng  wrote:
>>>
> >>> Just a follow-up since no one has replied yet.
> >>> My understanding is that for any expanded transforms Beam wants the
> >>> environment to be self-described.
> >>> So I updated the boot script and Dockerfile for the Java harness
> >>> environment and used --sdk_harness_container_image_overrides in the
> >>> portable runner, but I don't see the updated image being loaded (it still
> >>> uses the default). From glancing at the code I guess only the Dataflow
> >>> runner supports it, but I believe it's the correct approach; I just need
> >>> to dig deeper into the code when I get back to this, and then I will
> >>> update this thread too.
>>>
>>> Kobe
>>> On Wed, Sep 30, 2020 at 1:26 PM Kobe Feng  wrote:

 Hi everyone,
 Is there any recommended way to upload a third-party jar (runtime scope)
 for an expanded transform like KafkaIO.Read when using the Python portable
 runner? Thank you!

 I tried --experiments=jar_packages=abc.jar,d.jar but only found those
 artifacts in the Python harness (via its provision info); the Java harness
 just uses the default environment for its dependencies after the transform
 is expanded by the gRPC expansion service (the expansion jar) for reading
 Kafka messages.

 I also noticed that the above option will be removed in the future, so I
 tried --files_to_stage, but that option only exists in the Java SDK
 pipeline options.

 --
 Yours Sincerely
 Kobe Feng
>>>
>>>
>>>
>>> --
>>> Yours Sincerely
>>> Kobe Feng
>
>
>
> --
> Yours Sincerely
> Kobe Feng


Re: Upload third party runtime dependencies for expanding transform like KafkaIO.Read in Python Portable Runner

2020-10-02 Thread Kobe Feng
Thanks Robert, yes, our Kafka requires a JAAS configuration (sasl.jaas.config)
at the client side for security checks, and the corresponding LoginModule
requires additional classes:
==
Caused by: javax.security.auth.login.LoginException: unable to find
LoginModule class: io.${}.kafka.security.iaf.IAFLoginModule
at
javax.security.auth.login.LoginContext.invoke(LoginContext.java:794)
at
javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at
javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at
javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at
javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at
org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:52)
at
org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:53)
at
org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:76)
at
org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:103)
... 42 more

at
org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:177)
at
org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:157)

On Fri, Oct 2, 2020 at 5:14 PM Robert Bradshaw  wrote:

> Could you clarify a bit exactly what you're trying to do? When using
> KafkaIO, the provided jar should have all the necessary dependencies to
> construct and execute the kafka read/write. Is there some reason you need
> to inject additional dependencies into the environment provided by kafka?
>
> On Fri, Oct 2, 2020 at 3:20 PM Kobe Feng  wrote:
>
>> Just a follow-up since no one has replied yet.
>> My understanding is that for any expanded transforms Beam wants the
>> environment to be self-described.
>> So I updated the boot script and Dockerfile for the Java harness environment
>> and used --sdk_harness_container_image_overrides in the portable runner, but
>> I don't see the updated image being loaded (it still uses the default). From
>> glancing at the code I guess only the Dataflow runner supports it, but I
>> believe it's the correct approach; I just need to dig deeper into the code
>> when I get back to this, and then I will update this thread too.
>>
>> Kobe
>> On Wed, Sep 30, 2020 at 1:26 PM Kobe Feng  wrote:
>>
>>> Hi everyone,
>>> Is there any recommended way to upload a third-party jar (runtime scope)
>>> for an expanded transform like KafkaIO.Read when using the Python portable
>>> runner? Thank you!
>>>
>>> I tried --experiments=jar_packages=abc.jar,d.jar but only found those
>>> artifacts in the Python harness (via its provision info); the Java harness
>>> just uses the default environment for its dependencies after the transform
>>> is expanded by the gRPC expansion service (the expansion jar) for reading
>>> Kafka messages.
>>>
>>> I also noticed that the above option will be removed in the future, so I
>>> tried --files_to_stage, but that option only exists in the Java SDK
>>> pipeline options.
>>>
>>> --
>>> Yours Sincerely
>>> Kobe Feng
>>>
>>
>>
>> --
>> Yours Sincerely
>> Kobe Feng
>>
>

-- 
Yours Sincerely
Kobe Feng


Re: Upload third party runtime dependencies for expanding transform like KafkaIO.Read in Python Portable Runner

2020-10-02 Thread Kobe Feng
Just a follow-up since no one has replied yet.
My understanding is that for any expanded transforms Beam wants the
environment to be self-described.
So I updated the boot script and Dockerfile for the Java harness environment
and used --sdk_harness_container_image_overrides in the portable runner, but
I don't see the updated image being loaded (it still uses the default). From
glancing at the code I guess only the Dataflow runner supports it, but I
believe it's the correct approach; I just need to dig deeper into the code
when I get back to this, and then I will update this thread too.

Kobe
On Wed, Sep 30, 2020 at 1:26 PM Kobe Feng  wrote:

> Hi everyone,
> Is there any recommended way to upload a third-party jar (runtime scope)
> for an expanded transform like KafkaIO.Read when using the Python portable
> runner? Thank you!
>
> I tried --experiments=jar_packages=abc.jar,d.jar but only found those
> artifacts in the Python harness (via its provision info); the Java harness
> just uses the default environment for its dependencies after the transform
> is expanded by the gRPC expansion service (the expansion jar) for reading
> Kafka messages.
>
> I also noticed that the above option will be removed in the future, so I
> tried --files_to_stage, but that option only exists in the Java SDK
> pipeline options.
>
> --
> Yours Sincerely
> Kobe Feng
>


-- 
Yours Sincerely
Kobe Feng


Re: SqsIO exception when moving to AWS2 SDK

2020-10-02 Thread tclemons
The app itself is developed in Clojure, but here's the gist of how it's getting 
configured:

    AwsCredentialsProvider credProvider =
        EnvironmentVariableCredentialsProvider.create();

    pipeline.apply(
        SqsIO.read()
            .withQueueUrl(url)
            .withSqsClientProvider(credProvider, region, endpoint));

Oct 1, 2020, 08:48 by aromanenko@gmail.com:

> Could you send a code snippet of your pipeline with SqsIO v2 Read transform 
> configuration?
>
>> On 30 Sep 2020, at 22:56, tclem...@tutanota.com wrote:
>>
>> I've been attempting to migrate to the AWS2 SDK (version 2.24.0) on Apache 
>> Spark 2.4.7.  However,
>> when switching over to the new API and running it I keep getting the 
>> following exceptions:
>>
>> 2020-09-30 20:38:32,428 ERROR streaming.StreamingContext: Error starting the 
>> context, marking it as stopped
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
>> stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 
>> (TID 6, 10.42.180.18, executor 0): java.lang.NullPointerException
>>  at 
>> org.apache.beam.sdk.io.aws2.sqs.SqsUnboundedSource.lambda$new$e39e7721$1(SqsUnboundedSource.java:41)
>>  at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers$MemoizingSupplier.get(Suppliers.java:129)
>>  at 
>> org.apache.beam.sdk.io.aws2.sqs.SqsUnboundedSource.getSqs(SqsUnboundedSource.java:74)
>>  at 
>> org.apache.beam.sdk.io.aws2.sqs.SqsUnboundedReader.pull(SqsUnboundedReader.java:165)
>>  at 
>> org.apache.beam.sdk.io.aws2.sqs.SqsUnboundedReader.advance(SqsUnboundedReader.java:108)
>>  at 
>> org.apache.beam.runners.spark.io.MicrobatchSource$Reader.advanceWithBackoff(MicrobatchSource.java:246)
>>  at 
>> org.apache.beam.runners.spark.io.MicrobatchSource$Reader.start(MicrobatchSource.java:224)
>>  at 
>> org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:168)
>>  at 
>> org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:107)
>>  at 
>> org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:181)
>>  at 
>> org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:180)
>>  at 
>> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:57)
>>  
>> Examining the source of SqsUnboundedSource reveals a lambda where it's 
>> trying to chain a few references:
>>  read.sqsClientProvider().getSqsClient()
>>
>> Which is odd as I explicitly set the client provider on the read transform.  
>> This was working well enough with the old SqsIO API to connect and process 
>> messages off the queue.
>>
>> Any thoughts on why this might be happening?  Or avenues to pursue in 
>> debugging this?
>>
>> Thanks.
>>



Re: Support streaming side-inputs in the Spark runner

2020-10-02 Thread tclemons
For clarification, is it just streaming side inputs that present an issue for
SparkRunner, or are there other areas that need work?  We've started work on a
Beam-based project that includes both streaming and batch-oriented work, and a
Spark cluster was our choice due to the perception that it could handle both
types of applications.

However, that would have to be reevaluated if SparkRunner isn't up for
streaming deployments.  And it seems that SparkStructuredStreamingRunner still
needs some time before it's a fully-featured solution.  I guess I'm trying to
get a sense of whether these runners are still being actively developed or
whether they were donated by a third party and are now suffering from bit-rot.

Oct 1, 2020, 10:54 by lc...@google.com:

> I would suggest trying FlinkRunner as it is a much more complete streaming 
> implementation.
> SparkRunner has several key things that are missing that won't allow your 
> pipeline to function correctly.
> If you're really invested in getting SparkRunner working though feel free to 
> contribute the necessary implementations for watermark holds and broadcast 
> state necessary for side inputs.
>
> On Tue, Sep 29, 2020 at 9:06 AM Rajagopal, Viswanathan <rajagop...@dnb.com>
> wrote:
>
>>
>> Hi Team,
>>
>> I have a streaming pipeline (built using Apache Beam with the Spark Runner)
>> which consumes events tagged with timestamps from an unbounded source
>> (Kinesis Stream), batches them into FixedWindows of 5 mins each and then
>> writes all events in a window into a single / multiple files based on shards.
>>
>> We are trying to achieve the following through Apache Beam constructs:
>>
>> 1. Create a PCollectionView from the unbounded source and pass it as
>> a side-input to our main pipeline.
>>
>> 2. Have a hook method that is invoked per window and enables us to
>> do some operational activities per window.
>>
>> 3. Stop the stream processor (graceful stop) from an external system.
>>
>> Approaches that we tried for 1):
>>
>> - Creating a PCollectionView from the unbounded source and passing it
>> as a side-input to our main pipeline.
>>
>> - The input PCollection goes through a FixedWindow transform.
>>
>> - Created a custom CombineFn that combines all inputs for a
>> window and produces a single-value PCollection.
>>
>> - The output of the Window transform goes to the CombineFn (custom
>> fn), which creates a PCollectionView (using
>> Combine.Globally().asSingletonView()) as this output would be passed as a
>> side-input for our main pipeline.
>>
>>   - Getting the following exception (while running with the streaming
>> option set to true):
>>
>>   - java.lang.IllegalStateException: No TransformEvaluator
>> registered for UNBOUNDED transform View.CreatePCollectionView
>>
>> Noticed that SparkRunner doesn't support streaming side-inputs in the
>> Spark runner:
>> https://www.javatips.net/api/beam-master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
>> (View.CreatePCollectionView.class not added to the EVALUATORS map)
>> https://issues.apache.org/jira/browse/BEAM-2112
>> https://issues.apache.org/jira/browse/BEAM-1564
>>
>> So we would like to understand the status of this BEAM-1564 ticket.
>>
>> Approaches that we tried for 2):
>>
>> Tried to implement the operational activities in extractOutput() of the
>> CombineFn, as extractOutput() is called once per window. We haven't tested
>> this as it is blocked by issue 1).
>>
>> Is there any other recommended approach to implement this feature?
>>
>> Looking for recommended approaches to implement feature 3).
>>
>> Many Thanks,
>>
>> Viswa.
>>



Re: UnalignedFixedWindows - how to join to streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Reuven Lax
Have you considered using Session windows? The window would start at the
timestamp of the article, and the Session gap duration would be the
(event-time) timeout after which you stop waiting for assets to join that
article.
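
A minimal sketch of that suggestion (illustrative only; the key type, gap
duration and class name are assumptions, not from the thread):

    import org.apache.beam.sdk.transforms.windowing.Sessions;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class SessionJoinSketch {
      // Window a keyed stream into sessions so that an article and its assets,
      // which share a key and arrive within `gap` of each other in event time,
      // land in the same merged window before the CoGroupByKey.
      public static <V> PCollection<KV<String, V>> sessionize(
          PCollection<KV<String, V>> keyed, Duration gap) {
        return keyed.apply(
            "SessionWindow", Window.<KV<String, V>>into(Sessions.withGapDuration(gap)));
      }
    }

Both the articles stream and the assets stream would be windowed this way with
the same gap before the CoGroupByKey, so the merged session absorbs the few
milliseconds of skew between article and asset timestamps.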

On Fri, Oct 2, 2020 at 3:05 AM Kaymak, Tobias 
wrote:

> Hello,
>
> In chapter 4 of the Streaming Systems book by Tyler, Slava and Reuven
> there is an example 4-6 on page 111 about custom windowing that deals with
> UnalignedFixedWindows:
>
> https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html
>
> Unfortunately that example is abbreviated and the full source code is not
> published in this repo:
> https://github.com/takidau/streamingbook
>
> I am joining two Kafka Streams and I am currently windowing them by fixed
> time intervals. However the elements in stream one ("articles") are
> published first, then the assets for those articles are being published in
> the "assets" topic. Articles event timestamps are therefore slightly before
> those of assets.
>
> Now when doing a CoGroupByKey this can lead to a situation where an
> article is not being processed together with its assets, as
>
> - the article has a timestamp of 2020-10-02T00:30:29.997Z
> - the assets have a timestamp of 2020-10-02T00:30:30.001Z
>
> This is a must in my pipeline as I am relying on them to be processed
> together - otherwise I am publishing an article without its assets.
>
> My idea was therefore to apply UnalignedFixedWindows instead of fixed
> ones to the streams to circumvent this. What I am currently missing is the
> mergeWindows() implementation or the full source code to understand it. I
> am currently facing a java.lang.IllegalStateException
>
> TimestampCombiner moved element from 2020-10-02T09:32:36.079Z to earlier
> time 2020-10-02T09:32:03.365Z for window
> [2020-10-02T09:31:03.366Z..2020-10-02T09:32:03.366Z)
>
> Which gives me the impression that I am doing something wrong or have not
> fully understood the custom windowing topic.
>
> Am I on the wrong track here?
>
>
>
>


Re: Support streaming side-inputs in the Spark runner

2020-10-02 Thread Luke Cwik
Support for watermark holds is missing for both Spark streaming
implementations (DStream and structured streaming) so watermark based
triggers don't produce the correct output.

Excluding the direct runner, Flink is the OSS runner with the most people
working on it adding features and fixing bugs in it.
Spark batch is in a good state but streaming development is still ongoing
and also has a small group of folks.


On Fri, Oct 2, 2020 at 10:16 AM  wrote:

> For clarification, is it just streaming side inputs that present an issue
> for SparkRunner, or are there other areas that need work?  We've started
> work on a Beam-based project that includes both streaming and
> batch-oriented work, and a Spark cluster was our choice due to the perception
> that it could handle both types of applications.
>
> However, that would have to be reevaluated if SparkRunner isn't up for
> streaming deployments.  And it seems that SparkStructuredStreamingRunner
> still needs some time before it's a fully-featured solution.  I guess I'm
> trying to get a sense of whether these runners are still being actively
> developed or whether they were donated by a third party and are now
> suffering from bit-rot.
>
> Oct 1, 2020, 10:54 by lc...@google.com:
>
> I would suggest trying FlinkRunner as it is a much more complete streaming
> implementation.
> SparkRunner has several key things that are missing that won't allow your
> pipeline to function correctly.
> If you're really invested in getting SparkRunner working though feel free
> to contribute the necessary implementations for watermark holds and
> broadcast state necessary for side inputs.
>
> On Tue, Sep 29, 2020 at 9:06 AM Rajagopal, Viswanathan 
> wrote:
>
> Hi Team,
>
>
>
> I have a streaming pipeline (built using Apache Beam with the Spark
> Runner) which consumes events tagged with timestamps from an unbounded source
> (Kinesis Stream), batches them into FixedWindows of 5 mins each and then
> writes all events in a window into a single / multiple files based on shards.
>
> We are trying to achieve the following through Apache Beam constructs:
>
> 1. Create a PCollectionView from the unbounded source and pass it as
> a side-input to our main pipeline.
>
> 2. Have a hook method that is invoked per window and enables us to
> do some operational activities per window.
>
> 3. Stop the stream processor (graceful stop) from an external
> system.
>
>
>
> Approaches that we tried for 1):
>
> - Creating a PCollectionView from the unbounded source and passing it as
> a side-input to our main pipeline.
>
> - The input PCollection goes through a FixedWindow transform.
>
> - Created a custom CombineFn that combines all inputs for a
> window and produces a single-value PCollection.
>
> - The output of the Window transform goes to the CombineFn (custom fn),
> which creates a PCollectionView (using
> Combine.Globally().asSingletonView()) as this output would be passed as a
> side-input for our main pipeline.
>
>   - Getting the following exception (while running with the streaming option
> set to true):
>
>   - java.lang.IllegalStateException: No TransformEvaluator
> registered for UNBOUNDED transform View.CreatePCollectionView
>
> - Noticed that SparkRunner doesn't support streaming side-inputs
>   in the Spark runner:
>   https://www.javatips.net/api/beam-master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
>   (View.CreatePCollectionView.class not added to the EVALUATORS map)
>   - https://issues.apache.org/jira/browse/BEAM-2112
>   - https://issues.apache.org/jira/browse/BEAM-1564
>
> So we would like to understand the status of this BEAM-1564 ticket.
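
For reference, a minimal Java sketch of the side-input pattern described in 1)
above (illustrative only: the input type is generic and the custom CombineFn
from the thread is replaced by a simple count):

    import org.apache.beam.sdk.transforms.Combine;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.joda.time.Duration;

    public class SideInputSketch {
      // Window the unbounded input into 5-minute fixed windows, combine it
      // globally per window, and expose the result as a singleton view to be
      // used as a side input in the main pipeline. On SparkRunner in streaming
      // mode this is the step that fails with "No TransformEvaluator registered
      // for UNBOUNDED transform View.CreatePCollectionView".
      public static <T> PCollectionView<Long> countPerWindow(PCollection<T> events) {
        return events
            .apply(Window.<T>into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply(Combine.globally(Count.<T>combineFn()).asSingletonView());
      }
    }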
>
>
>
> Approaches that we tried for 2):
>
> Tried to implement the operational activities in extractOutput() of the
> CombineFn, as extractOutput() is called once per window. We haven't tested
> this as it is blocked by issue 1).
>
> Is there any other recommended approach to implement this feature?
>
>
>
> Looking for recommended approaches to implement feature 3).
>
>
>
> Many Thanks,
>
> Viswa.
>
>
>
>
>
>
>
>
>
>
>


Re: UnalignedFixedWindows - how to join to streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
Well. Of course this is not fixing the core problem.
What I can do is extend the FixedWindows class and make sure that for my
real recorded "system latency" the values still get put into the previous
window. Or is there a smarter way to deal with this?

On Fri, Oct 2, 2020 at 4:11 PM Kaymak, Tobias 
wrote:

> This is what I came up with:
>
> https://gist.github.com/tkaymak/1f5eccf8633c18ab7f46f8ad01527630
>
> The first run looks okay (in my use case size and offset are the same),
> but I will need to add tests to prove my understanding of this.
>
> On Fri, Oct 2, 2020 at 12:05 PM Kaymak, Tobias 
> wrote:
>
>> Hello,
>>
>> In chapter 4 of the Streaming Systems book by Tyler, Slava and Reuven
>> there is an example 4-6 on page 111 about custom windowing that deals with
>> UnalignedFixedWindows:
>>
>> https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html
>>
>> Unfortunately that example is abbreviated and the full source code is not
>> published in this repo:
>> https://github.com/takidau/streamingbook
>>
>> I am joining two Kafka Streams and I am currently windowing them by fixed
>> time intervals. However the elements in stream one ("articles") are
>> published first, then the assets for those articles are being published in
>> the "assets" topic. Articles event timestamps are therefore slightly before
>> those of assets.
>>
>> Now when doing a CoGroupByKey this can lead to a situation where an
>> article is not being processed together with its assets, as
>>
>> - the article has a timestamp of 2020-10-02T00:30:29.997Z
>> - the assets have a timestamp of 2020-10-02T00:30:30.001Z
>>
>> This is a must in my pipeline as I am relying on them to be processed
>> together - otherwise I am publishing an article without its assets.
>>
>> My idea was therefore to apply UnalignedFixedWindows instead of fixed
>> ones to the streams to circumvent this. What I am currently missing is the
>> mergeWindows() implementation or the full source code to understand it.
>> I am currently facing a java.lang.IllegalStateException
>>
>> TimestampCombiner moved element from 2020-10-02T09:32:36.079Z to earlier
>> time 2020-10-02T09:32:03.365Z for window
>> [2020-10-02T09:31:03.366Z..2020-10-02T09:32:03.366Z)
>>
>> Which gives me the impression that I am doing something wrong or have not
>> fully understood the custom windowing topic.
>>
>> Am I on the wrong track here?
>>
>>
>>
>>


Scio 0.9.5 released

2020-10-02 Thread Neville Li
Hi all,

We just released Scio 0.9.5. This release upgrades Beam to the latest
2.24.0 and includes several improvements and bug fixes, including Parquet
Avro dynamic destinations, Scalable Bloom Filter and many others. This will
also likely be the last 0.9.x release before we start working on the next
0.10 branch.

Join #scio in our channel (invite here) for
questions and discussions.

Cheers
Neville

https://github.com/spotify/scio/releases/tag/v0.9.5

*"Colovaria"*

There are no breaking changes in this release, but some were introduced
with v0.9.0:

See the v0.9.0 Migration Guide for detailed instructions.

Improvements

   - Add custom GenericJson pretty print (#3367)
   - scio-parquet to support dynamic destinations for windowed scollections (#3356)
   - Support $LATEST replacement for Query (#3357)
   - Mutable ScalableBloomFilter (#3339)
   - Add specialized TupleCoders (#3350)
   - Add nullCoder on Record and Disjunction coders (#3349)

Bug Fixes

   - Support null-key records in smb writes (#3359)
   - Fix serialization struggles in SMB transform API (#3342)
   - Grammar / spelling fixes in migration guides (#3358)
   - Remove unused macro import (#3353)
   - Remove unused BaseSeqLikeCoder implicit (#3344)
   - Filter out potentially included env directories (#3322)
   - Simplify LowPriorityCoders (#3320)
   - Remove unused and not useful Coder implicit trait (#3319)
   - Make javaBeanCoder lower prio (#3318)

Dependency Updates

   - Update Beam to 2.24.0 (#3325)
   - Update scalafmt-core to 2.7.3 (#3364)
   - Update elasticsearch-rest-client, ... to 7.9.2 (#3347)
   - Update hadoop libs to 2.8.5 (#3337)
   - Update sbt-scalafix to 0.9.21 (#3335)
   - Update sbt-mdoc to 2.2.9 (#3327)
   - Update sbt-avro to 3.1.0 (#3323)
   - Update mysql-socket-factory to 1.1.0 (#3321)
   - Update scala-collection-compat to 2.2.0 (#3312)
   - Update sbt-mdoc to 2.2.8 (#3313)


Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Luke Cwik
I have seen NoClassDefFoundErrors even when the class is there, if there is
an issue loading the class (usually related to JNI failing to load or a
static block failing). Try to find the first linkage error
(ExceptionInInitializerError / UnsatisfiedLinkError / ...) in the logs, as it
typically has more details as to why loading failed.
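
A tiny self-contained illustration of the static-block case (not from the
thread): the first use of a class whose static initializer throws surfaces as
ExceptionInInitializerError, and every later use of the same class then fails
with NoClassDefFoundError even though the class file is on the classpath.

    public class StaticInitFailureDemo {
      static class Flaky {
        // Simulated failing static initializer (e.g. a JNI load or config lookup).
        static {
          if (true) {
            throw new RuntimeException("simulated init failure");
          }
        }
      }

      public static void main(String[] args) {
        try {
          new Flaky();
        } catch (Throwable t) {
          System.out.println("first use:  " + t); // ExceptionInInitializerError
        }
        try {
          new Flaky();
        } catch (Throwable t) {
          System.out.println("second use: " + t); // NoClassDefFoundError
        }
      }
    }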

On Fri, Oct 2, 2020 at 8:30 AM Tomo Suzuki  wrote:

> I suspected that io.grpc:grpc-netty-shaded:jar:1.27.2 was incorrectly
> shaded, but the JAR file contains the
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2 class that
> is reported as missing. Strange.
>
> suztomo-macbookpro44% jar tf grpc-netty-shaded-1.27.2.jar |grep
> IntObjectHashMap
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2.class
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$KeySet.class
>
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$MapIterator.class
>
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$MapEntry.class
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2$1.class
>
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$PrimitiveIterator.class
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap.class
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$1.class
>
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$KeySet$1.class
>
> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$EntrySet.class
>
> On Fri, Oct 2, 2020 at 6:37 AM Kaymak, Tobias 
> wrote:
>
>> No, that was not the case. I'm still seeing this message when canceling a
>> pipeline. Sorry for the spam.
>>
>> On Fri, Oct 2, 2020 at 12:22 PM Kaymak, Tobias 
>> wrote:
>>
>>> I think this was caused by having the flink-runner defined twice in my
>>> pom. Oo
>>> (one time as defined with scope runtime, and one time without)
>>>
>>>
>>> On Fri, Oct 2, 2020 at 9:38 AM Kaymak, Tobias 
>>> wrote:
>>>
 Sorry that I forgot to include the versions; currently I'm on Beam
 2.23.0 / Flink 1.10.2. I have a test dependency for Cassandra (archinnov)
 which should *not* be available at runtime; it refers to Netty and is
 included in this tree, but the other two places where I find Netty are
 Flink and the beam-sdks-java-io-google-cloud-platform ->
 io.grpc:grpc-netty 1.27.2.

 Stupid question: How can I check which version Flink 1.10.2 is
 expecting at runtime?

 output of mvn -Pflink-runner dependency:tree

 --- maven-dependency-plugin:2.8:tree (default-cli) ---
 ch.ricardo.di:di-beam:jar:2.11.0
 +- org.apache.beam:beam-sdks-java-core:jar:2.23.0:compile
 |  +- org.apache.beam:beam-model-pipeline:jar:2.23.0:compile
 |  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.3:compile
 |  |  +- commons-logging:commons-logging:jar:1.2:compile
 |  |  +- org.apache.logging.log4j:log4j-api:jar:2.6.2:compile
 |  |  \- org.conscrypt:conscrypt-openjdk-uber:jar:1.3.0:compile
 |  +- org.apache.beam:beam-model-job-management:jar:2.23.0:compile
 |  +- org.apache.beam:beam-vendor-bytebuddy-1_10_8:jar:0.1:compile
 |  +- org.apache.beam:beam-vendor-grpc-1_26_0:jar:0.3:compile
 |  +- org.apache.beam:beam-vendor-guava-26_0-jre:jar:0.1:compile
 |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
 |  +- com.fasterxml.jackson.core:jackson-core:jar:2.10.2:compile
 |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.2:compile
 |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.2:compile
 |  +- org.apache.avro:avro:jar:1.8.2:compile
 |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
 |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
 |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
 |  +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
 |  \- org.tukaani:xz:jar:1.8:compile
 +-
 org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.23.0:compile
 |  +-
 org.apache.beam:beam-sdks-java-expansion-service:jar:2.23.0:compile
 |  |  \- org.apache.beam:beam-model-fn-execution:jar:2.23.0:compile
 |  +-
 org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.23.0:compile
 |  |  +- com.google.cloud.bigdataoss:gcsio:jar:2.1.3:compile
 |  |  \-
 com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev20200311-1.30.9:compile
 |  +- com.google.cloud.bigdataoss:util:jar:2.1.3:compile
 |  |  +-
 com.google.api-client:google-api-client-java6:jar:1.30.9:compile
 |  |  +-
 com.google.api-client:google-api-client-jackson2:jar:1.30.9:compile
 |  |  +-
 com.google.oauth-client:google-oauth-client-java6:jar:1.30.6:compile
 |  |  +- com.google.flogger:google-extensions:jar:0.5.1:compile
 |  |  |  \- com.google.flogger:flogger:jar:0.5.1:compile
 |  |  \- com.google.flogger:flogger-system-backend:jar:0.5.1:runtime
 |  |

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Tomo Suzuki
I suspected that io.grpc:grpc-netty-shaded:jar:1.27.2 was incorrectly
shaded, but the JAR file contains the
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2 class that
is reported as missing. Strange.

suztomo-macbookpro44% jar tf grpc-netty-shaded-1.27.2.jar |grep
IntObjectHashMap
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$KeySet.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$MapIterator.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$MapEntry.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2$1.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$PrimitiveIterator.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$1.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$KeySet$1.class
io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$EntrySet.class
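
A complementary check (an illustrative sketch, not from the thread) is to probe
the same class from inside the running Flink job, which shows whether the
current classloader can see it at all and, if so, which jar it is served from:

    public class ShadedClassProbe {
      public static void main(String[] args) {
        String name = "io.grpc.netty.shaded.io.netty.util.collection.IntObjectHashMap$2";
        try {
          Class<?> clazz = Class.forName(name);
          // Prints the jar (or directory) the class was actually loaded from.
          System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
        } catch (Throwable t) {
          // ClassNotFoundException / NoClassDefFoundError details end up here.
          System.out.println("Could not load " + name + ": " + t);
        }
      }
    }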

On Fri, Oct 2, 2020 at 6:37 AM Kaymak, Tobias 
wrote:

> No, that was not the case. I'm still seeing this message when canceling a
> pipeline. Sorry for the spam.
>
> On Fri, Oct 2, 2020 at 12:22 PM Kaymak, Tobias 
> wrote:
>
>> I think this was caused by having the flink-runner defined twice in my
>> pom. Oo
>> (one time as defined with scope runtime, and one time without)
>>
>>
>> On Fri, Oct 2, 2020 at 9:38 AM Kaymak, Tobias 
>> wrote:
>>
>>> Sorry that I forgot to include the versions; currently I'm on Beam
>>> 2.23.0 / Flink 1.10.2. I have a test dependency for Cassandra (archinnov)
>>> which should *not* be available at runtime; it refers to Netty and is
>>> included in this tree, but the other two places where I find Netty are
>>> Flink and the beam-sdks-java-io-google-cloud-platform ->
>>> io.grpc:grpc-netty 1.27.2.
>>>
>>> Stupid question: How can I check which version Flink 1.10.2 is expecting
>>> at runtime?
>>>
>>> output of mvn -Pflink-runner dependency:tree
>>>
>>> --- maven-dependency-plugin:2.8:tree (default-cli) ---
>>> ch.ricardo.di:di-beam:jar:2.11.0
>>> +- org.apache.beam:beam-sdks-java-core:jar:2.23.0:compile
>>> |  +- org.apache.beam:beam-model-pipeline:jar:2.23.0:compile
>>> |  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.3:compile
>>> |  |  +- commons-logging:commons-logging:jar:1.2:compile
>>> |  |  +- org.apache.logging.log4j:log4j-api:jar:2.6.2:compile
>>> |  |  \- org.conscrypt:conscrypt-openjdk-uber:jar:1.3.0:compile
>>> |  +- org.apache.beam:beam-model-job-management:jar:2.23.0:compile
>>> |  +- org.apache.beam:beam-vendor-bytebuddy-1_10_8:jar:0.1:compile
>>> |  +- org.apache.beam:beam-vendor-grpc-1_26_0:jar:0.3:compile
>>> |  +- org.apache.beam:beam-vendor-guava-26_0-jre:jar:0.1:compile
>>> |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
>>> |  +- com.fasterxml.jackson.core:jackson-core:jar:2.10.2:compile
>>> |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.2:compile
>>> |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.2:compile
>>> |  +- org.apache.avro:avro:jar:1.8.2:compile
>>> |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
>>> |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
>>> |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
>>> |  +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
>>> |  \- org.tukaani:xz:jar:1.8:compile
>>> +-
>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.23.0:compile
>>> |  +- org.apache.beam:beam-sdks-java-expansion-service:jar:2.23.0:compile
>>> |  |  \- org.apache.beam:beam-model-fn-execution:jar:2.23.0:compile
>>> |  +-
>>> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.23.0:compile
>>> |  |  +- com.google.cloud.bigdataoss:gcsio:jar:2.1.3:compile
>>> |  |  \-
>>> com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev20200311-1.30.9:compile
>>> |  +- com.google.cloud.bigdataoss:util:jar:2.1.3:compile
>>> |  |  +- com.google.api-client:google-api-client-java6:jar:1.30.9:compile
>>> |  |  +-
>>> com.google.api-client:google-api-client-jackson2:jar:1.30.9:compile
>>> |  |  +-
>>> com.google.oauth-client:google-oauth-client-java6:jar:1.30.6:compile
>>> |  |  +- com.google.flogger:google-extensions:jar:0.5.1:compile
>>> |  |  |  \- com.google.flogger:flogger:jar:0.5.1:compile
>>> |  |  \- com.google.flogger:flogger-system-backend:jar:0.5.1:runtime
>>> |  | \- org.checkerframework:checker-compat-qual:jar:2.5.3:runtime
>>> |  +- com.google.api:gax:jar:1.54.0:compile
>>> |  |  \- org.threeten:threetenbp:jar:1.4.0:compile
>>> |  +- com.google.api:gax-grpc:jar:1.54.0:compile
>>> |  |  \- io.grpc:grpc-protobuf:jar:1.27.2:compile
>>> |  | \- io.grpc:grpc-protobuf-lite:jar:1.27.2:compile
>>> |  +-
>>> com.google.apis:google-api-services-healthcare:jar:v1beta1-rev20200525-1.30.9:compile
>>> |  +- 

Re: UnalignedFixedWindows - how to join to streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
This is what I came up with:

https://gist.github.com/tkaymak/1f5eccf8633c18ab7f46f8ad01527630

The first run looks okay (in my use case size and offset are the same), but
I will need to add tests to prove my understanding of this.

On Fri, Oct 2, 2020 at 12:05 PM Kaymak, Tobias 
wrote:

> Hello,
>
> In chapter 4 of the Streaming Systems book by Tyler, Slava and Reuven
> there is an example 4-6 on page 111 about custom windowing that deals with
> UnalignedFixedWindows:
>
> https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html
>
> Unfortunately that example is abbreviated and the full source code is not
> published in this repo:
> https://github.com/takidau/streamingbook
>
> I am joining two Kafka Streams and I am currently windowing them by fixed
> time intervals. However the elements in stream one ("articles") are
> published first, then the assets for those articles are being published in
> the "assets" topic. Articles event timestamps are therefore slightly before
> those of assets.
>
> Now when doing a CoGroupByKey this can lead to a situation where an
> article is not being processed together with its assets, as
>
> - the article has a timestamp of 2020-10-02T00:30:29.997Z
> - the assets have a timestamp of 2020-10-02T00:30:30.001Z
>
> This is a must in my pipeline as I am relying on them to be processed
> together - otherwise I am publishing an article without its assets.
>
> My idea was therefore to apply UnalignedFixedWindows instead of fixed
> ones to the streams to circumvent this. What I am currently missing is the
> mergeWindows() implementation or the full source code to understand it. I
> am currently facing a java.lang.IllegalStateException
>
> TimestampCombiner moved element from 2020-10-02T09:32:36.079Z to earlier
> time 2020-10-02T09:32:03.365Z for window
> [2020-10-02T09:31:03.366Z..2020-10-02T09:32:03.366Z)
>
> Which gives me the impression that I am doing something wrong or have not
> fully understood the custom windowing topic.
>
> Am I on the wrong track here?
>
>
>
>


Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
No, that was not the case. I'm still seeing this message when canceling a
pipeline. Sorry for the spam.

On Fri, Oct 2, 2020 at 12:22 PM Kaymak, Tobias 
wrote:

> I think this was caused by having the flink-runner defined twice in my
> pom. Oo
> (one time as defined with scope runtime, and one time without)
>
>
> On Fri, Oct 2, 2020 at 9:38 AM Kaymak, Tobias 
> wrote:
>
>> Sorry that I forgot to include the versions; currently I'm on Beam 2.23.0
>> / Flink 1.10.2. I have a test dependency for Cassandra (archinnov) which
>> should *not* be available at runtime; it refers to Netty and is included in
>> this tree, but the other two places where I find Netty are Flink and the
>> beam-sdks-java-io-google-cloud-platform -> io.grpc:grpc-netty 1.27.2.
>>
>> Stupid question: How can I check which version Flink 1.10.2 is expecting
>> at runtime?
>>
>> output of mvn -Pflink-runner dependency:tree
>>
>> --- maven-dependency-plugin:2.8:tree (default-cli) ---
>> ch.ricardo.di:di-beam:jar:2.11.0
>> +- org.apache.beam:beam-sdks-java-core:jar:2.23.0:compile
>> |  +- org.apache.beam:beam-model-pipeline:jar:2.23.0:compile
>> |  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.3:compile
>> |  |  +- commons-logging:commons-logging:jar:1.2:compile
>> |  |  +- org.apache.logging.log4j:log4j-api:jar:2.6.2:compile
>> |  |  \- org.conscrypt:conscrypt-openjdk-uber:jar:1.3.0:compile
>> |  +- org.apache.beam:beam-model-job-management:jar:2.23.0:compile
>> |  +- org.apache.beam:beam-vendor-bytebuddy-1_10_8:jar:0.1:compile
>> |  +- org.apache.beam:beam-vendor-grpc-1_26_0:jar:0.3:compile
>> |  +- org.apache.beam:beam-vendor-guava-26_0-jre:jar:0.1:compile
>> |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
>> |  +- com.fasterxml.jackson.core:jackson-core:jar:2.10.2:compile
>> |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.2:compile
>> |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.2:compile
>> |  +- org.apache.avro:avro:jar:1.8.2:compile
>> |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
>> |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
>> |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
>> |  +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
>> |  \- org.tukaani:xz:jar:1.8:compile
>> +-
>> org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.23.0:compile
>> |  +- org.apache.beam:beam-sdks-java-expansion-service:jar:2.23.0:compile
>> |  |  \- org.apache.beam:beam-model-fn-execution:jar:2.23.0:compile
>> |  +-
>> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.23.0:compile
>> |  |  +- com.google.cloud.bigdataoss:gcsio:jar:2.1.3:compile
>> |  |  \-
>> com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev20200311-1.30.9:compile
>> |  +- com.google.cloud.bigdataoss:util:jar:2.1.3:compile
>> |  |  +- com.google.api-client:google-api-client-java6:jar:1.30.9:compile
>> |  |  +-
>> com.google.api-client:google-api-client-jackson2:jar:1.30.9:compile
>> |  |  +-
>> com.google.oauth-client:google-oauth-client-java6:jar:1.30.6:compile
>> |  |  +- com.google.flogger:google-extensions:jar:0.5.1:compile
>> |  |  |  \- com.google.flogger:flogger:jar:0.5.1:compile
>> |  |  \- com.google.flogger:flogger-system-backend:jar:0.5.1:runtime
>> |  | \- org.checkerframework:checker-compat-qual:jar:2.5.3:runtime
>> |  +- com.google.api:gax:jar:1.54.0:compile
>> |  |  \- org.threeten:threetenbp:jar:1.4.0:compile
>> |  +- com.google.api:gax-grpc:jar:1.54.0:compile
>> |  |  \- io.grpc:grpc-protobuf:jar:1.27.2:compile
>> |  | \- io.grpc:grpc-protobuf-lite:jar:1.27.2:compile
>> |  +-
>> com.google.apis:google-api-services-healthcare:jar:v1beta1-rev20200525-1.30.9:compile
>> |  +- com.google.auth:google-auth-library-credentials:jar:0.19.0:compile
>> |  +- com.google.auth:google-auth-library-oauth2-http:jar:0.19.0:compile
>> |  +-
>> com.google.cloud:google-cloud-bigquerystorage:jar:0.125.0-beta:compile
>> |  |  +-
>> com.google.api.grpc:proto-google-cloud-bigquerystorage-v1alpha2:jar:0.90.0:compile
>> |  |  +-
>> com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta2:jar:0.90.0:compile
>> |  |  \-
>> com.google.api.grpc:proto-google-cloud-bigquerystorage-v1:jar:0.90.0:compile
>> |  +- com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0:compile
>> |  |  +- com.google.cloud:google-cloud-bigtable:jar:1.9.1:compile
>> |  |  +- com.google.api.grpc:grpc-google-common-protos:jar:1.17.0:compile
>> |  |  +-
>> com.google.api.grpc:grpc-google-cloud-bigtable-v2:jar:1.9.1:compile
>> |  |  +-
>> com.google.api.grpc:proto-google-cloud-bigtable-admin-v2:jar:1.9.1:compile
>> |  |  +-
>> com.google.api.grpc:grpc-google-cloud-bigtable-admin-v2:jar:1.9.1:compile
>> |  |  +- com.google.api.grpc:proto-google-iam-v1:jar:0.13.0:compile
>> |  |  +- io.opencensus:opencensus-contrib-grpc-util:jar:0.24.0:compile
>> |  |  +- io.dropwizard.metrics:metrics-core:jar:3.2.6:compile
>> |  |  \- 

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
I think this was caused by having the flink-runner dependency defined twice
in my pom (once with scope runtime, and once without).
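A minimal before/after sketch of such a duplicate declaration, assuming the
beam-runners-flink-1.10 artifact for Beam 2.23.0 (the exact coordinates and
profile setup in my pom may differ):

<!-- Before: the runner declared twice, once with runtime scope and once without -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink-1.10</artifactId>
  <version>2.23.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink-1.10</artifactId>
  <version>2.23.0</version>
  <scope>runtime</scope>
</dependency>

<!-- After: declared exactly once, with runtime scope, e.g. inside the flink-runner profile -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink-1.10</artifactId>
  <version>2.23.0</version>
  <scope>runtime</scope>
</dependency>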


UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
Hello,

In chapter 4 of the Streaming Systems book by Tyler, Slava, and Reuven there
is example 4-6 on page 111 about custom windowing, which deals with
UnalignedFixedWindows:
https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html

Unfortunately that example is abbreviated and the full source code is not
published in this repo:
https://github.com/takidau/streamingbook

I am joining two Kafka streams and currently windowing them by fixed time
intervals. However, the elements in stream one ("articles") are published
first, and the assets for those articles are then published to the "assets"
topic. Article event timestamps are therefore slightly earlier than those of
the assets.

Now when doing a CoGroupByKey this can lead to a situation where an article
is not processed together with its assets, because for example

- the article has a timestamp of 2020-10-02T00:30:29.997Z, while
- the assets have a timestamp of 2020-10-02T00:30:30.001Z,

so the two fall on opposite sides of a window boundary. Processing them
together is a must in my pipeline - otherwise I would publish an article
without its assets.
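To make the boundary problem concrete, here is a minimal sketch - assuming
30-second aligned windows purely for illustration, since the actual window
size is not stated here - of how FixedWindows assigns those two timestamps to
different windows:

import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class WindowBoundaryDemo {
  public static void main(String[] args) {
    // Aligned fixed windows of 30 seconds (size assumed for illustration).
    FixedWindows windows = FixedWindows.of(Duration.standardSeconds(30));

    // The article and its assets straddle the 00:30:30.000 boundary ...
    IntervalWindow articleWindow =
        windows.assignWindow(Instant.parse("2020-10-02T00:30:29.997Z"));
    IntervalWindow assetsWindow =
        windows.assignWindow(Instant.parse("2020-10-02T00:30:30.001Z"));

    // ... so they land in two different windows, and a CoGroupByKey over
    // these windows never sees them together.
    System.out.println(articleWindow); // window starting 00:30:00.000, ending 00:30:30.000
    System.out.println(assetsWindow);  // window starting 00:30:30.000, ending 00:31:00.000
  }
}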

My idea was therefore to apply UnalignedFixedWindows to the streams instead
of aligned fixed ones to circumvent this. What I am missing is the
mergeWindows() implementation (or the full source code) to understand it. I
am currently facing a java.lang.IllegalStateException:

TimestampCombiner moved element from 2020-10-02T09:32:36.079Z to earlier
time 2020-10-02T09:32:03.365Z for window
[2020-10-02T09:31:03.366Z..2020-10-02T09:32:03.366Z)

This gives me the impression that I am doing something wrong or have not
fully understood custom windowing.

Am I on the wrong track here?
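
For reference, a rough sketch of how such a WindowFn might be shaped - this
is a guess at the structure, not the book's actual example 4-6. It is a
non-merging WindowFn that keeps a fixed size but shifts each key's window
grid by a key-derived offset, which is also why no mergeWindows()
implementation is needed:

import java.util.Collection;
import java.util.Collections;
import java.util.Objects;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn;
import org.apache.beam.sdk.transforms.windowing.WindowFn;
import org.apache.beam.sdk.transforms.windowing.WindowMappingFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;
import org.joda.time.Instant;

/** Fixed-size windows whose start is shifted per key (sketch, not the book's code). */
public class UnalignedFixedWindows<K, V>
    extends NonMergingWindowFn<KV<K, V>, IntervalWindow> {

  private final Duration size;

  private UnalignedFixedWindows(Duration size) {
    this.size = size;
  }

  public static <K, V> UnalignedFixedWindows<K, V> of(Duration size) {
    return new UnalignedFixedWindows<>(size);
  }

  @Override
  public Collection<IntervalWindow> assignWindows(AssignContext c) {
    long sizeMillis = size.getMillis();
    // Stable per-key shift of the window grid, derived from the key's hash.
    long shift = (Objects.hashCode(c.element().getKey()) & Integer.MAX_VALUE) % sizeMillis;
    long ts = c.timestamp().getMillis();
    // Same arithmetic as an aligned fixed window, but offset by the shift.
    long start = ts - Math.floorMod(ts - shift, sizeMillis);
    return Collections.singletonList(new IntervalWindow(new Instant(start), size));
  }

  @Override
  public boolean isCompatible(WindowFn<?, ?> other) {
    return other instanceof UnalignedFixedWindows
        && size.equals(((UnalignedFixedWindows<?, ?>) other).size);
  }

  @Override
  public Coder<IntervalWindow> windowCoder() {
    return IntervalWindow.getCoder();
  }

  @Override
  public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() {
    // Not needed as long as this WindowFn is never used to window a side input.
    throw new UnsupportedOperationException(
        "UnalignedFixedWindows is not supported for side inputs");
  }
}

Both streams would need to be windowed with equal-sized instances of this so
the windowing is compatible for CoGroupByKey, and a per-key shift alone still
does not guarantee that an article and its assets share a window when their
timestamps straddle one of the shifted boundaries.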


Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
Sorry that I forgot to include the versions - currently I'm on Beam 2.23.0 /
Flink 1.10.2. I have a test dependency for Cassandra (archinnov) which should
*not* be available at runtime; it refers to Netty and is included in this
tree. The other two places where I find Netty are Flink itself and
beam-sdks-java-io-google-cloud-platform -> io.grpc:grpc-netty 1.27.2.

Stupid question: how can I check which Netty version Flink 1.10.2 expects at
runtime?
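
One way to at least see where Netty enters on the application side is to
filter the full tree shown below to Netty artifacts; the -Dincludes option of
the same maven-dependency-plugin takes a groupId[:artifactId[:type[:version]]]
pattern:

mvn -Pflink-runner dependency:tree -Dincludes=io.netty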

output of mvn -Pflink-runner dependency:tree

--- maven-dependency-plugin:2.8:tree (default-cli) ---
ch.ricardo.di:di-beam:jar:2.11.0
+- org.apache.beam:beam-sdks-java-core:jar:2.23.0:compile
|  +- org.apache.beam:beam-model-pipeline:jar:2.23.0:compile
|  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.3:compile
|  |  +- commons-logging:commons-logging:jar:1.2:compile
|  |  +- org.apache.logging.log4j:log4j-api:jar:2.6.2:compile
|  |  \- org.conscrypt:conscrypt-openjdk-uber:jar:1.3.0:compile
|  +- org.apache.beam:beam-model-job-management:jar:2.23.0:compile
|  +- org.apache.beam:beam-vendor-bytebuddy-1_10_8:jar:0.1:compile
|  +- org.apache.beam:beam-vendor-grpc-1_26_0:jar:0.3:compile
|  +- org.apache.beam:beam-vendor-guava-26_0-jre:jar:0.1:compile
|  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
|  +- com.fasterxml.jackson.core:jackson-core:jar:2.10.2:compile
|  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.2:compile
|  +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.2:compile
|  +- org.apache.avro:avro:jar:1.8.2:compile
|  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
|  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
|  |  \- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
|  +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
|  \- org.tukaani:xz:jar:1.8:compile
+-
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.23.0:compile
|  +- org.apache.beam:beam-sdks-java-expansion-service:jar:2.23.0:compile
|  |  \- org.apache.beam:beam-model-fn-execution:jar:2.23.0:compile
|  +-
org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.23.0:compile
|  |  +- com.google.cloud.bigdataoss:gcsio:jar:2.1.3:compile
|  |  \-
com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev20200311-1.30.9:compile
|  +- com.google.cloud.bigdataoss:util:jar:2.1.3:compile
|  |  +- com.google.api-client:google-api-client-java6:jar:1.30.9:compile
|  |  +- com.google.api-client:google-api-client-jackson2:jar:1.30.9:compile
|  |  +-
com.google.oauth-client:google-oauth-client-java6:jar:1.30.6:compile
|  |  +- com.google.flogger:google-extensions:jar:0.5.1:compile
|  |  |  \- com.google.flogger:flogger:jar:0.5.1:compile
|  |  \- com.google.flogger:flogger-system-backend:jar:0.5.1:runtime
|  | \- org.checkerframework:checker-compat-qual:jar:2.5.3:runtime
|  +- com.google.api:gax:jar:1.54.0:compile
|  |  \- org.threeten:threetenbp:jar:1.4.0:compile
|  +- com.google.api:gax-grpc:jar:1.54.0:compile
|  |  \- io.grpc:grpc-protobuf:jar:1.27.2:compile
|  | \- io.grpc:grpc-protobuf-lite:jar:1.27.2:compile
|  +-
com.google.apis:google-api-services-healthcare:jar:v1beta1-rev20200525-1.30.9:compile
|  +- com.google.auth:google-auth-library-credentials:jar:0.19.0:compile
|  +- com.google.auth:google-auth-library-oauth2-http:jar:0.19.0:compile
|  +- com.google.cloud:google-cloud-bigquerystorage:jar:0.125.0-beta:compile
|  |  +-
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1alpha2:jar:0.90.0:compile
|  |  +-
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta2:jar:0.90.0:compile
|  |  \-
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1:jar:0.90.0:compile
|  +- com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0:compile
|  |  +- com.google.cloud:google-cloud-bigtable:jar:1.9.1:compile
|  |  +- com.google.api.grpc:grpc-google-common-protos:jar:1.17.0:compile
|  |  +- com.google.api.grpc:grpc-google-cloud-bigtable-v2:jar:1.9.1:compile
|  |  +-
com.google.api.grpc:proto-google-cloud-bigtable-admin-v2:jar:1.9.1:compile
|  |  +-
com.google.api.grpc:grpc-google-cloud-bigtable-admin-v2:jar:1.9.1:compile
|  |  +- com.google.api.grpc:proto-google-iam-v1:jar:0.13.0:compile
|  |  +- io.opencensus:opencensus-contrib-grpc-util:jar:0.24.0:compile
|  |  +- io.dropwizard.metrics:metrics-core:jar:3.2.6:compile
|  |  \- commons-codec:commons-codec:jar:1.13:compile
|  +- com.google.cloud:google-cloud-core:jar:1.92.2:compile
|  +- com.google.cloud:google-cloud-core-grpc:jar:1.92.2:compile
|  +- com.google.cloud.datastore:datastore-v1-proto-client:jar:1.6.3:compile
|  |  \-
com.google.http-client:google-http-client-protobuf:jar:1.33.0:compile
|  +- com.google.cloud:google-cloud-spanner:jar:1.49.1:compile
|  |  +-
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:1.49.1:compile
|  |  \-
com.google.api.grpc:proto-google-cloud-spanner-v1:jar:1.49.1:compile
|  +- com.google.http-client:google-http-client-jackson2:jar:1.34.0:compile
|  +-