Re: Thoughts from a first time contributor

2022-02-08 Thread Robert Burke
Glad to hear the readme was useful!

I lean towards keeping it in GitHub, but agree there should be more
cross-linking with the Wiki.

Reason being that README content shows up on the Go package doc site,
e.g. https://pkg.go.dev/github.com/apache/beam/sdks/v2/go

Granted, we don't make the best possible use of this at present.

On Tue, Feb 8, 2022, 2:23 PM Danny McCormick 
wrote:

> Thanks!
>
> > I'm curious if you found this lengthy contributing guide daunting,
> especially for non-java development. (I know I would.)
>
> To be honest, that wasn't a huge issue for me - I think there's certainly
> room for making it better, but I actually appreciated the level of detail,
> and didn't really have much of an issue getting going with development once
> I found the guide. Some of that is almost certainly related to my
> background - I have a bunch of experience ramping up in unfamiliar repos
> from my time with GitHub (internal and external repos) and am probably
> undeterred by longer guides as a result (a longer guide actually is often a
> plus in my mind). I can definitely imagine that being a blocker to someone
> who has less experience ramping up on open source repos.
>
> You didn't really ask for this, but if I were to try to give more focused
> feedback on the guide (and related resources) specifically, I'd say:
>
> 1. The `local-env-setup.sh` script worked perfectly and was a big boost -
> continuing to maintain that well will make getting started a lot easier for
> new contributors.
> 2. It took me a long time to find the docs that are in the confluence
> wiki. From https://beam.apache.org/contribute/ you kinda just need to
> know that the side panel links to that and that there are a lot more docs
> available there than are rendering in that side panel (e.g. Go Tips with
> instructions on building/testing would have been really helpful, but I had
> no clue it existed for a bit - the same goes for Java/Python tips I think).
> I think our cross-linking could be better on that initial landing page to
> expose more of those confluence docs (or at least the relevant ones).
> 3. The instructions on joining the slack channel didn't work for me here -
> https://beam.apache.org/community/contact-us/
> 4. The Go SDK has some (pretty helpful) instructions in its root README -
> https://github.com/apache/beam/tree/master/sdks/go - Java and Python
> don't. I lean towards moving the Go docs into the wiki and leaving behind a
> link, but if not we probably need to do something to make that more
> discoverable.
>
> Again though, I generally found the guide helpful if imperfect and the
> hard pieces were not central to getting started (except maybe the
> confluence discoverability bit).
>
> Thanks,
> Danny
>
>
> On Tue, Feb 8, 2022 at 4:56 PM Robert Bradshaw 
> wrote:
>
>> Thanks for contributing, and welcome!
>>
>> Good feedback. One question inline below.
>>
>> On Mon, Jan 31, 2022 at 12:17 PM Danny McCormick
>>  wrote:
>> >
>> >  Hey folks, my name is Danny - I recently completed my first Beam
>> PR[1] (a small extension to the Go Dataflow runner) and am planning on
>> becoming a more regular part of the community. As such, I wanted to use my
>> fresh newbie eyes and share some of what was nice and where there was
>> friction about getting started.
>> >
>> > Disclaimer: this is coming from the perspective of someone who is
>> pretty used to open source development, but has minimal experience with the
>> Apache way, Beam, and the languages my change came in. I'm hoping my
>> experience is helpful to those of you who have been around for a while and
>> haven't seen things as a newcomer in a long time, but it may not be
>> reflective of the experience of others.
>> >
>> > Things that were really nice:
>> >
>> > - The community has been really welcoming and encouraging of
>> contributions, something I saw in my first code review, my first pr, and
>> even the tone of the docs. Special thanks to @lostluck and @jrmccluskey for
>> making my first interactions welcoming and prompt. That experience can be
>> the difference between one time and repeat contributors.
>> >
>> > - Getting started writing my first pipeline, and then ramping up to
>> more complex concepts was surprisingly easy - in particular, the docs,
>> examples, and Katas made for a reasonably smooth process. It wasn't always
>> clear how to go from that to more complex transforms and there's of course
>> room for more clarity, but I appreciate the work that's gone into the
>> getting started experience.
>> >
>> > - Overall, the code base is pretty easily understood/reasoned about,
>> and the high quality of code made it pretty easy to make my first change.
>> I'm pretty impressed at how simple/well composed this system is even as it
>> approaches a tricky problem space (hopefully I'm saying the same thing
>> after I make some bigger changes :))
>> >
>> > Friction Points:
>> >
>> > - It was harder than expected for me to figure out what made Beam
>> 

Re: Beam 2.36.0 + Flink 1.13 appears to be broken

2022-02-08 Thread Cristian Constantinescu
Hey Tomo,

Thanks for the tip! It turns out my deployment project (the one that
creates the fat jar) and my pipelines project (the one with actual code)
had mismatching Beam versions.

User error, sorry about that.
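
(In case it helps anyone else: running "mvn dependency:tree
-Dincludes=org.apache.beam" in both projects is a quick way to spot this
kind of version mismatch, assuming a Maven build.)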

Thanks for your help,
Cristian

On Tue, Feb 8, 2022 at 3:32 PM Tomo Suzuki  wrote:

> IncompatibleClassChangeError usually occurs when a class comes from
> an older version of a JAR file.
>
> Do you know which JAR file provides
> "org.apache.beam.model.pipeline.v1.RunnerApi$StandardResourceHints$Enum"
> when the exception happens?
>
> I often use "getProtectionDomain()"
> https://stackoverflow.com/a/56000383/975074 to find the JAR file from a
> class.
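>
> A minimal sketch of that trick, using the class from the stack trace
> (note getCodeSource() can return null for JDK core classes):
>
> // Throws ClassNotFoundException if the class is not on the classpath.
> Class<?> clazz = Class.forName(
>     "org.apache.beam.model.pipeline.v1.RunnerApi$StandardResourceHints$Enum");
> // Prints the JAR (or directory) the class was actually loaded from:
> System.out.println(
>     clazz.getProtectionDomain().getCodeSource().getLocation());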
>
>
> On Tue, Feb 8, 2022 at 3:18 PM Cristian Constantinescu 
> wrote:
>
>> Hi everyone,
>>
>> I am very excited with the 2.36 release, especially the stopReadOffset
>> addition to the KafkaSourceDescriptors. With it, I can read sections of a
>> topic and create state, effectively having a bounded Kafka source, before
>> reading new items that need processing.
>>
>> Unfortunately, running the pipeline from the Flink CLI produces the
>> following error:
>>
>> Pretty printing Flink args:
>> --detached
>> --class=namespace.pipeline.App
>> /opt/bns/etf/data-platform/user_code/pipelines/cerberus-etl-pipelines.jar
>> --configFilePath=/path/to/config.properties
>> --runner=FlinkRunner
>> --streaming
>> --checkpointingInterval=3
>> --stateBackend=filesystem
>> --stateBackendStoragePath=file:///path/to/state
>> --numberOfExecutionRetries=2
>> --fasterCopy
>> --debugThrowExceptions
>> java.lang.IncompatibleClassChangeError: Class
>> org.apache.beam.model.pipeline.v1.RunnerApi$StandardResourceHints$Enum does
>> not implement the requested interface
>> org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ProtocolMessageEnum
>> at
>> org.apache.beam.sdk.transforms.resourcehints.ResourceHints.getUrn(ResourceHints.java:50)
>> at
>> org.apache.beam.sdk.transforms.resourcehints.ResourceHints.<clinit>(ResourceHints.java:54)
>> at org.apache.beam.sdk.Pipeline.<init>(Pipeline.java:523)
>> at org.apache.beam.sdk.Pipeline.create(Pipeline.java:157)
>> at lines containing Pipeline.create(options) <--- my code
>> at namespace.pipeline.App.main(App.java:42) <-- my code
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
>> Source)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
>> Source)
>> at java.base/java.lang.reflect.Method.invoke(Unknown Source)
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>> at
>> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>> at
>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
>> at
>> org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
>> at
>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
>> at
>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>> at
>> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>> at
>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>
>> Any advice would be appreciated.
>>
>> Thank you,
>> Cristian
>>
>
>
> --
> Regards,
> Tomo
>


Beam 2.36.0 + Flink 1.13 appears to be broken

2022-02-08 Thread Cristian Constantinescu
Hi everyone,

I am very excited with the 2.36 release, especially the stopReadOffset
addition to the KafkaSourceDescriptors. With it, I can read sections of a
topic and create state, effectively having a bounded Kafka source, before
reading new items that need processing.
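
For the curious, the descriptors I create look roughly like this (sketch
from memory -- topic, partition, offsets and broker address are
placeholders):

PCollection<KafkaRecord<byte[], byte[]>> records = pipeline
    .apply(Create.of(KafkaSourceDescriptor.of(
        new TopicPartition("my-topic", 0),
        0L,     // startReadOffset
        null,   // startReadTime
        50000L, // stopReadOffset: bounds the initial state-building read
        null,   // stopReadTime
        ImmutableList.of("broker:9092"))))
    .apply(KafkaIO.<byte[], byte[]>readSourceDescriptors()
        .withBootstrapServers("broker:9092")
        .withKeyDeserializer(ByteArrayDeserializer.class)
        .withValueDeserializer(ByteArrayDeserializer.class));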

Unfortunately, running the pipeline from the Flink CLI produces the
following error:

Pretty printing Flink args:
--detached
--class=namespace.pipeline.App
/opt/bns/etf/data-platform/user_code/pipelines/cerberus-etl-pipelines.jar
--configFilePath=/path/to/config.properties
--runner=FlinkRunner
--streaming
--checkpointingInterval=3
--stateBackend=filesystem
--stateBackendStoragePath=file:///path/to/state
--numberOfExecutionRetries=2
--fasterCopy
--debugThrowExceptions
java.lang.IncompatibleClassChangeError: Class
org.apache.beam.model.pipeline.v1.RunnerApi$StandardResourceHints$Enum does
not implement the requested interface
org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ProtocolMessageEnum
at
org.apache.beam.sdk.transforms.resourcehints.ResourceHints.getUrn(ResourceHints.java:50)
at
org.apache.beam.sdk.transforms.resourcehints.ResourceHints.<clinit>(ResourceHints.java:54)
at org.apache.beam.sdk.Pipeline.<init>(Pipeline.java:523)
at org.apache.beam.sdk.Pipeline.create(Pipeline.java:157)
at lines containing Pipeline.create(options) <--- my code
at namespace.pipeline.App.main(App.java:42) <-- my code
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at
org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at
org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Any advice would be appreciated.

Thank you,
Cristian


Beam Website Feedback

2022-02-08 Thread Kirill Love
Adapt documentation:

https://beam.apache.org/documentation/transforms/python/elementwise/pardo/#example-1-pardo-with-a-simple-dofn
 


## Can be removed

> [BEAM-7885]  DoFn.setup() 
> doesn’t run for streaming jobs running in the DirectRunner.


The issue has been resolved.


## Can be changed

> self.window = beam.window.GlobalWindow()

New path is:

> beam.transforms.window.GlobalWindow()

Sincerely


Kirill Ism




P1 issues report (70)

2022-02-08 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13830: XVR Direct/Spark/Flink 
tests are timing out (created 2022-02-04)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes 
Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13769: 
beam_PreCommit_Python_Cron failing on test_create_uses_coder_for_pickling 
(created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13763: Rotate credentials for 
'io-datastores' Kubernetes cluster (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13694: 
beam_PostCommit_Java_Hadoop_Versions failing with ClassDefNotFoundError 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request 
Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi 
environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13376: Missing error for 
nonexistent column family BigTable (created 2021-12-03)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)

Flaky test issue report (50)

2022-02-08 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: 
apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine
 is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, 
Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 
2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)

Re: KafkaIO.write and Avro

2022-02-08 Thread Matt Casters
Thanks a lot Moritz.  Your suggestion worked immediately.

You sort of get on the wrong track since my favorite IDE suggests:

.withValueSerializer((Class<? extends Serializer<GenericRecord>>)
KafkaAvroSerializer.class)

... which simply doesn't even compile for me.

 incompatible types:
java.lang.Class<KafkaAvroSerializer> cannot
be converted to java.lang.Class<? extends Serializer<GenericRecord>>

It sort of puts you on the wrong footing, hence my question.
If you don't mind, I'll simply create a PR to amend the Javadoc for KafkaIO.

https://issues.apache.org/jira/browse/BEAM-13854

Easier to figure out was AvroCoder.of(schema), but it might make sense to
document that in the same context as well.
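
For anyone searching later, that ends up looking something like this
(records being a PCollection<GenericRecord> and schema an Avro Schema):

records.setCoder(AvroCoder.of(schema));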

Thanks again!

Cheers,
Matt


On Tue, Feb 8, 2022 at 4:09 PM Moritz Mack  wrote:

> Just having a quick look, it looks like the respective interface in
> KafkaIO should rather look like this to support KafkaAvroSerializer, which
> is a Serializer<Object>:
>
>
>
> public Write<K, V> withValueSerializer(Class<? extends Serializer<? super V>> valueSerializer)
>
>
>
> Thoughts?
>
> Cheers, Moritz
>
>
>
> *From: *Moritz Mack 
> *Date: *Tuesday, 8. February 2022 at 15:55
> *To: *dev@beam.apache.org , matt.cast...@neo4j.com <
> matt.cast...@neo4j.com>
> *Subject: *Re: KafkaIO.write and Avro
>
>
> Hi Matt,
>
>
>
> Unfortunately, the types don’t play well when using KafkaAvroSerializer.
> It currently requires a cast :/
>
> The following will work:
>
> write.withValueSerializer((Class)KafkaAvroSerializer.class)
>
>
>
> This seems to be the cause of repeated confusion, so probably worth
> improving the user experience here!
>
>
>
> Cheers,
>
> Moritz
>
>
>
> *From: *Matt Casters 
> *Date: *Tuesday, 8. February 2022 at 14:17
> *To: *Beam Development List 
> *Subject: *KafkaIO.write and Avro
>
>
> Dear Beams,
>
>
>
> When sending Avro values to Kafka, say GenericRecord, you typically
> specify option value.serializer as
> "io.confluent.kafka.serializers.KafkaAvroSerializer".  This along with a
> bunch of other options for authentication and so on verifies the schema
> stored in the Avro record with a schema registry.   Unfortunately, I
> couldn't figure out how to pass this serializer class to KafkaIO.write() as
> it's not acceptable to the withValueSerializer() method.
>
>
>
> For KafkaIO.read() we made a specific provision in the form of
> class ConfluentSchemaRegistryDeserializer but it doesn't look like we
> covered the producer side of Avro values yet.
>
>
>
> I'd be happy to dive into the code to add proper support for a Confluent
> schema registry in KafkaIO.write() but I was just wondering if there was
> something I might have overlooked.  It's hard to find samples or
> documentation on producing Avro messages with Beam.
>
>
>
> Thanks in advance,
>
> Matt
>

-- 
Neo4j Chief Solutions Architect
✉ matt.cast...@neo4j.com


Re: KafkaIO.write and Avro

2022-02-08 Thread Moritz Mack
Just having a quick look, it looks like the respective interface in KafkaIO 
should rather look like this to support KafkaAvroSerializer, which is a 
Serializer<Object>:

public Write<K, V> withValueSerializer(Class<? extends Serializer<? super V>>
valueSerializer)
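
A call site could then pass the class directly, since KafkaAvroSerializer
implements Serializer<Object> and Object is a supertype of any V (sketch,
not compiled):

write.withValueSerializer(KafkaAvroSerializer.class);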

Thoughts?
Cheers, Moritz

From: Moritz Mack 
Date: Tuesday, 8. February 2022 at 15:55
To: dev@beam.apache.org , matt.cast...@neo4j.com 

Subject: Re: KafkaIO.write and Avro

Hi Matt,

Unfortunately, the types don’t play well when using KafkaAvroSerializer. It 
currently requires a cast :/
The following will work:
write.withValueSerializer((Class)KafkaAvroSerializer.class)

This seems to be the cause of repeated confusion, so probably worth improving 
the user experience here!

Cheers,
Moritz

From: Matt Casters 
Date: Tuesday, 8. February 2022 at 14:17
To: Beam Development List 
Subject: KafkaIO.write and Avro

Dear Beams,

When sending Avro values to Kafka, say GenericRecord, you typically specify 
option value.serializer as 
"io.confluent.kafka.serializers.KafkaAvroSerializer".  This along with a bunch 
of other options for authentication and so on verifies the schema stored in the 
Avro record with a schema registry.   Unfortunately, I couldn't figure out how 
to pass this serializer class to KafkaIO.write() as it's not acceptable to the 
withValueSerializer() method.

For KafkaIO.read() we made a specific provision in the form of class 
ConfluentSchemaRegistryDeserializer but it doesn't look like we covered the 
producer side of Avro values yet.

I'd be happy to dive into the code to add proper support for a Confluent schema 
registry in KafkaIO.write() but I was just wondering if there was something I 
might have overlooked.  It's hard to find samples or documentation on producing 
Avro messages with Beam.

Thanks in advance,

Matt






Re: Developing on an M1 Mac

2022-02-08 Thread Robert Burke
Go has supported ARM64 on Darwin since 1.16, which is the minimum version
of Go we currently support.

See https://go.dev/blog/ports

There are definitely some hardcoded paths we'd need to adjust to build boot
containers though.

Go 1.18 improves things, and since it has the initial run of Go Generics,
we'll likely move to support it pretty quickly.

On Tue, Feb 8, 2022, 6:18 AM Jarek Potiuk  wrote:

> Just for your information: thanks to that change, I will soon be adding
> ARM support for Apache Airflow - including building and publishing the
> images and running our tests (using self-hosted runners).
> As soon as I get it I will be able to share the code/experiences with you.
>
> J
>
> On Tue, Feb 8, 2022 at 2:50 PM Ismaël Mejía  wrote:
>
>> For awareness: with the just-released Beam 2.36.0, Beam works out of the
>> box for development on a Mac M1.
>>
>> I tried Java and Python pipelines successfully, running locally on both
>> the Flink and Spark runners.
>> I found one issue using zstd and created [1], which was merged today;
>> with this, the sdks:core tests and Spark runner tests fully pass.
>>
>> I would say 2.36.0 is the first good-enough release for someone
>> working on a Mac M1 or an ARM64 processor.
>>
>> There are still some missing steps to have full ARM64 support [apart from
>> testing it :)]
>>
>> 1. In theory we could run docker x86 images on ARM but those would be
>> emulated, and so way slower, so it is probably better to support 'native'
>> CPUs via multi-architecture docker images [2].
>> BEAM-11704 Support Beam docker images on ARM64
>>
>> I could create the runner images from master; for the SDK containers
>> there are some issues with hardcoded paths [3] and virtualenv that
>> will probably be solved once we move to venv, and we will need to
>> upgrade our release process to include multiarch images (for user
>> friendliness).
>>
>> Also, golang only officially supports ARM64 starting with version
>> 1.18.0, so we need to move up to that version.
>>
>> Anyway, Beam is in waaay better shape for ARM64 now than a year ago when
>> I created the initial JIRAs.
>>
>> Ismaël
>>
>> [1] https://github.com/apache/beam/pull/16755
>> [2] https://issues.apache.org/jira/browse/BEAM-11704
>> [3]
>> https://github.com/apache/beam/blob/d1b8e569fd651975f08823a3db49dbee56d491b5/sdks/python/container/Dockerfile#L79
>>
>>
>>
>>> Could not find protoc-3.14.0-osx-aarch_64.exe
>> (com.google.protobuf:protoc:3.14.0).
>>  Searched in the following locations:
>>
>> https://jcenter.bintray.com/com/google/protobuf/protoc/3.14.0/protoc-3.14.0-osx-aarch_64.exe
>>
>>
>>
>>
>>
>> On Wed, Jan 12, 2022 at 9:53 PM Luke Cwik  wrote:
>> >
>> > The docker container running in an x86 based cloud machine should work
>> pretty well. This is what Apache Beam's Jenkins setup effectively does.
>> >
>> > No experience with developing on an ARM based CPU.
>> >
>> > On Wed, Jan 12, 2022 at 9:28 AM Jarek Potiuk  wrote:
>> >>
>> >> Comment from the side - If you use Docker - experience from Airflow -
>> >> until we will get ARM images, docker experience is next to unusable
>> >> (docker filesystem slowness + emulation).
>> >>
>> >> J.
>> >>
>> >> On Wed, Jan 12, 2022 at 6:21 PM Daniel Collins 
>> wrote:
>> >> >
>> >> > I regularly develop on a non-m1 mac using intellij, which mostly
>> works out of the box. Are you running into any particular issues building
>> or just looking for advice?
>> >> >
>> >> > -Daniel
>> >> >
>> >> > On Wed, Jan 12, 2022 at 12:16 PM Matt Rudary <
>> matt.rud...@twosigma.com> wrote:
>> >> >>
>> >> >> Does anyone do Beam development on an M1 Mac? Any tips to getting
>> things up and running?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Alternatively, does anyone have a good “workstation in the cloud”
>> setup?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks
>> >> >>
>> >> >> Matt
>>
>


Re: KafkaIO.write and Avro

2022-02-08 Thread Moritz Mack
Hi Matt,

Unfortunately, the types don’t play well when using KafkaAvroSerializer. It 
currently requires a cast :/
The following will work:
write.withValueSerializer((Class)KafkaAvroSerializer.class)

This seems to be the cause of repeated confusion, so probably worth improving 
the user experience here!
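
In context the workaround looks roughly like this (broker, topic and
registry URL are placeholders):

KafkaIO.<String, GenericRecord>write()
    .withBootstrapServers("broker:9092")
    .withTopic("avro-topic")
    .withKeySerializer(StringSerializer.class)
    // Raw cast: KafkaAvroSerializer implements Serializer<Object>, so it
    // doesn't match Class<? extends Serializer<GenericRecord>> directly.
    .withValueSerializer((Class) KafkaAvroSerializer.class)
    .withProducerConfigUpdates(
        ImmutableMap.of("schema.registry.url", "http://registry:8081"));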

Cheers,
Moritz

From: Matt Casters 
Date: Tuesday, 8. February 2022 at 14:17
To: Beam Development List 
Subject: KafkaIO.write and Avro

Dear Beams,

When sending Avro values to Kafka, say GenericRecord, you typically specify 
option value.serializer as 
"io.confluent.kafka.serializers.KafkaAvroSerializer".  This along with a bunch 
of other options for authentication and so on verifies the schema stored in the 
Avro record with a schema registry.   Unfortunately, I couldn't figure out how 
to pass this serializer class to KafkaIO.write() as it's not acceptable to the 
withValueSerializer() method.

For KafkaIO.read() we made a specific provision in the form of class 
ConfluentSchemaRegistryDeserializer but it doesn't look like we covered the 
producer side of Avro values yet.

I'd be happy to dive into the code to add proper support for a Confluent schema 
registry in KafkaIO.write() but I was just wondering if there was something I 
might have overlooked.  It's hard to find samples or documentation on producing 
Avro messages with Beam.

Thanks in advance,

Matt






Re: Developing on an M1 Mac

2022-02-08 Thread Jarek Potiuk
Just for your information: thanks to that change, I will soon be adding
ARM support for Apache Airflow - including building and publishing the
images and running our tests (using self-hosted runners).
As soon as I get it I will be able to share the code/experiences with you.

J

On Tue, Feb 8, 2022 at 2:50 PM Ismaël Mejía  wrote:

> For awareness: with the just-released Beam 2.36.0, Beam works out of the
> box for development on a Mac M1.
>
> I tried Java and Python pipelines successfully, running locally on both
> the Flink and Spark runners.
> I found one issue using zstd and created [1], which was merged today;
> with this, the sdks:core tests and Spark runner tests fully pass.
>
> I would say 2.36.0 is the first good-enough release for someone
> working on a Mac M1 or an ARM64 processor.
>
> There are still some missing steps to have full ARM64 support [apart from
> testing it :)]
>
> 1. In theory we could run docker x86 images on ARM but those would be
> emulated, and so way slower, so it is probably better to support 'native'
> CPUs via multi-architecture docker images [2].
> BEAM-11704 Support Beam docker images on ARM64
>
> I could create the runner images from master; for the SDK containers
> there are some issues with hardcoded paths [3] and virtualenv that
> will probably be solved once we move to venv, and we will need to
> upgrade our release process to include multiarch images (for user
> friendliness).
>
> Also, golang only officially supports ARM64 starting with version
> 1.18.0, so we need to move up to that version.
>
> Anyway, Beam is in waaay better shape for ARM64 now than a year ago when
> I created the initial JIRAs.
>
> Ismaël
>
> [1] https://github.com/apache/beam/pull/16755
> [2] https://issues.apache.org/jira/browse/BEAM-11704
> [3]
> https://github.com/apache/beam/blob/d1b8e569fd651975f08823a3db49dbee56d491b5/sdks/python/container/Dockerfile#L79
>
>
>
>> Could not find protoc-3.14.0-osx-aarch_64.exe
> (com.google.protobuf:protoc:3.14.0).
>  Searched in the following locations:
>
> https://jcenter.bintray.com/com/google/protobuf/protoc/3.14.0/protoc-3.14.0-osx-aarch_64.exe
>
>
>
>
>
> On Wed, Jan 12, 2022 at 9:53 PM Luke Cwik  wrote:
> >
> > The docker container running in an x86 based cloud machine should work
> pretty well. This is what Apache Beam's Jenkins setup effectively does.
> >
> > No experience with developing on an ARM based CPU.
> >
> > On Wed, Jan 12, 2022 at 9:28 AM Jarek Potiuk  wrote:
> >>
> >> Comment from the side - If you use Docker - experience from Airflow -
> >> until we will get ARM images, docker experience is next to unusable
> >> (docker filesystem slowness + emulation).
> >>
> >> J.
> >>
> >> On Wed, Jan 12, 2022 at 6:21 PM Daniel Collins 
> wrote:
> >> >
> >> > I regularly develop on a non-m1 mac using intellij, which mostly
> works out of the box. Are you running into any particular issues building
> or just looking for advice?
> >> >
> >> > -Daniel
> >> >
> >> > On Wed, Jan 12, 2022 at 12:16 PM Matt Rudary <
> matt.rud...@twosigma.com> wrote:
> >> >>
> >> >> Does anyone do Beam development on an M1 Mac? Any tips to getting
> things up and running?
> >> >>
> >> >>
> >> >>
> >> >> Alternatively, does anyone have a good “workstation in the cloud”
> setup?
> >> >>
> >> >>
> >> >>
> >> >> Thanks
> >> >>
> >> >> Matt
>


Re: Developing on an M1 Mac

2022-02-08 Thread Ismaël Mejía
For awareness: with the just-released Beam 2.36.0, Beam works out of the
box for development on a Mac M1.

I tried Java and Python pipelines successfully, running locally on both
the Flink and Spark runners.
I found one issue using zstd and created [1], which was merged today;
with this, the sdks:core tests and Spark runner tests fully pass.

I would say 2.36.0 is the first good-enough release for someone
working on a Mac M1 or an ARM64 processor.

There are still some missing steps to have full ARM64 support [apart from testing it :)]

1. In theory we could run docker x86 images on ARM but those would be
emulated, and so way slower, so it is probably better to support 'native'
CPUs via multi-architecture docker images [2].
BEAM-11704 Support Beam docker images on ARM64

I could create the runner images from master; for the SDK containers
there are some issues with hardcoded paths [3] and virtualenv that
will probably be solved once we move to venv, and we will need to
upgrade our release process to include multiarch images (for user
friendliness).
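
(Presumably something along the lines of "docker buildx build --platform
linux/amd64,linux/arm64 --push" in the release scripts, though I have not
verified the details.)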

Also, golang only officially supports ARM64 starting with version
1.18.0, so we need to move up to that version.

Anyway, Beam is in waaay better shape for ARM64 now than a year ago when
I created the initial JIRAs.

Ismaël

[1] https://github.com/apache/beam/pull/16755
[2] https://issues.apache.org/jira/browse/BEAM-11704
[3] 
https://github.com/apache/beam/blob/d1b8e569fd651975f08823a3db49dbee56d491b5/sdks/python/container/Dockerfile#L79



   > Could not find protoc-3.14.0-osx-aarch_64.exe
(com.google.protobuf:protoc:3.14.0).
 Searched in the following locations:
 
https://jcenter.bintray.com/com/google/protobuf/protoc/3.14.0/protoc-3.14.0-osx-aarch_64.exe





On Wed, Jan 12, 2022 at 9:53 PM Luke Cwik  wrote:
>
> The docker container running in an x86 based cloud machine should work pretty 
> well. This is what Apache Beam's Jenkins setup effectively does.
>
> No experience with developing on an ARM based CPU.
>
> On Wed, Jan 12, 2022 at 9:28 AM Jarek Potiuk  wrote:
>>
>> Comment from the side - If you use Docker - experience from Airflow -
>> until we will get ARM images, docker experience is next to unusable
>> (docker filesystem slowness + emulation).
>>
>> J.
>>
>> On Wed, Jan 12, 2022 at 6:21 PM Daniel Collins  wrote:
>> >
>> > I regularly develop on a non-m1 mac using intellij, which mostly works out 
>> > of the box. Are you running into any particular issues building or just 
>> > looking for advice?
>> >
>> > -Daniel
>> >
>> > On Wed, Jan 12, 2022 at 12:16 PM Matt Rudary  
>> > wrote:
>> >>
>> >> Does anyone do Beam development on an M1 Mac? Any tips to getting things 
>> >> up and running?
>> >>
>> >>
>> >>
>> >> Alternatively, does anyone have a good “workstation in the cloud” setup?
>> >>
>> >>
>> >>
>> >> Thanks
>> >>
>> >> Matt


Re: [ANNOUNCE] Apache Beam 2.36.0 Release

2022-02-08 Thread Ismaël Mejía
Great work Emily and everyone!

I am glad to see that, with the dependency updates, this is the first
Beam release that works correctly out of the box on ARM64. I tried
some helloworld examples on a Mac M1 with both Java and Python and it
works OK.

Ismaël




On Tue, Feb 8, 2022 at 9:49 AM Jarek Potiuk  wrote:
>
> Thanks a lot for that Emily!
>
> It's been a release we were waiting for at Apache Airflow.
> I believe it will unblock a number of "modernizations" in our pipeline -
> Python 3.10 and ARM support were depending on it quite a bit (mostly through
> a numpy transitive dependency limitation). Great to see this one out!
>
> J.
>
> On Tue, Feb 8, 2022 at 3:39 AM Emily Ye  wrote:
>>
>> The Apache Beam team is pleased to announce the release of version 2.36.0.
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes bug fixes, features, and improvements detailed
>> on the Beam blog: https://beam.apache.org/blog/beam-2.36.0/
>>
>> Thank you to everyone who contributed to this release, and we hope you
>> enjoy using Beam 2.36.0
>>
>> - Emily, on behalf of the Apache Beam community.


KafkaIO.write and Avro

2022-02-08 Thread Matt Casters
Dear Beams,

When sending Avro values to Kafka, say GenericRecord, you typically specify
option value.serializer as
"io.confluent.kafka.serializers.KafkaAvroSerializer".  This along with a
bunch of other options for authentication and so on verifies the schema
stored in the Avro record with a schema registry.   Unfortunately, I
couldn't figure out how to pass this serializer class to KafkaIO.write() as
it's not acceptable to the withValueSerializer() method.

For KafkaIO.read() we made a specific provision in the form of
class ConfluentSchemaRegistryDeserializer but it doesn't look like we
covered the producer side of Avro values yet.
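
For comparison, the read side already allows something like the following,
if I have the provider's name right (registry URL and subject are
placeholders):

KafkaIO.<String, GenericRecord>read()
    .withBootstrapServers("broker:9092")
    .withTopic("avro-topic")
    .withKeyDeserializer(StringDeserializer.class)
    // Value deserializer wired to a Confluent schema registry:
    .withValueDeserializer(
        ConfluentSchemaRegistryDeserializerProvider.of(
            "http://registry:8081", "avro-topic-value"));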

I'd be happy to dive into the code to add proper support for a Confluent
schema registry in KafkaIO.write() but I was just wondering if there was
something I might have overlooked.  It's hard to find samples or
documentation on producing Avro messages with Beam.

Thanks in advance,

Matt


Re: [ANNOUNCE] Apache Beam 2.36.0 Release

2022-02-08 Thread Jarek Potiuk
Thanks a lot for that Emily!

It's been a release we were waiting for at Apache Airflow.
I believe it will unblock a number of "modernizations" in our pipeline -
Python 3.10 and ARM support were depending on it quite a bit (mostly through
a numpy transitive dependency limitation). Great to see this one out!

J.

On Tue, Feb 8, 2022 at 3:39 AM Emily Ye  wrote:

> The Apache Beam team is pleased to announce the release of version 2.36.0.
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
> https://beam.apache.org/get-started/downloads/
>
> This release includes bug fixes, features, and improvements detailed
> on the Beam blog: https://beam.apache.org/blog/beam-2.36.0/
>
> Thank you to everyone who contributed to this release, and we hope you
> enjoy using Beam 2.36.0
>
> - Emily, on behalf of the Apache Beam community.
>