Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-02 Thread Chamikara Jayalath
I think we should make NullableCoder a standard coder for Beam [1] and use a standard NullableCoder(KeyCoder) for Kafka keys (where KeyCoder might be the standard ByteArrayCoder, for example). I think we already have compatible Java and Python NullableCoder implementations, so implementing this
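To make the nullable-key idea concrete, here is a minimal standard-library sketch of wrapping a byte-array key encoding with a one-byte presence marker. It is an illustration only, not Beam's NullableCoder: the function names and the marker values are assumptions chosen for this example.

```python
from typing import Optional

# Assumed marker bytes for this sketch: one byte says whether a key follows.
NULL_MARKER = b"\x00"
PRESENT_MARKER = b"\x01"


def encode_nullable_key(key: Optional[bytes]) -> bytes:
    """Encode an optional byte-array key with a one-byte presence prefix."""
    if key is None:
        return NULL_MARKER
    return PRESENT_MARKER + key


def decode_nullable_key(data: bytes) -> Optional[bytes]:
    """Invert encode_nullable_key, distinguishing None from an empty key."""
    if not data:
        raise ValueError("empty input: expected a presence marker")
    marker, payload = data[:1], data[1:]
    if marker == NULL_MARKER:
        if payload:
            raise ValueError("unexpected payload after null marker")
        return None
    if marker == PRESENT_MARKER:
        return payload
    raise ValueError(f"unknown presence marker: {marker!r}")
```

The point of the prefix is that a missing key and an empty key stay distinguishable after a round trip, which a bare byte-array encoding cannot guarantee.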

Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-02 Thread Ke Wu
Very good point. We are actually talking about the same high-level approach, where the Task Manager Pod runs two containers: the task manager container and the worker pool service container. I believe the disconnect probably lies in how a job is being deployed/started.

Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-02 Thread Ahmet Altay
/cc folks who commented on the issue: @Robin Qiu @Chamikara Jayalath @Alexey Romanenko @Daniel Collins
On Tue, Jun 1, 2021 at 2:03 PM Weiwen Xu wrote:
> Hello,
>
> I'm working on [this issue](https://issues.apache.org/jira/browse/BEAM-12008) with Boyuan. She was
> very helpful in

Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-02 Thread Kyle Weaver
> Therefore, if we bring up the external worker pool container together with
> the runner container, which is one of the supported approaches by Flink Runner
> on K8s
Exactly which approach are you talking about in the doc? I feel like there could be some misunderstanding here. Here is the

Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-02 Thread Ke Wu
I do agree that it usually takes longer for the runner to try to connect than for the external worker to become available; I suppose that is probably why the external service pool works the way it currently does. However, I am not 100% confident that it won’t happen in practice, because the design does

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
> One thing that's been on the back burner for a long time is making CoderProperties into a CoderTester like Guava's EqualityTester.
Reuven's point still applies here though. This issue is not due to a bug in SchemaCoder, it's a problem with the Row we gave SchemaCoder to encode. I'm assuming a

Re: Allyship workshops for open source contributors

2021-06-02 Thread Aizhamal Nurmamat kyzy
> If we have a good number of people who express interest in this thread, I
> will set up training for the Airflow community.
I meant Beam ^^' I am organizing it for the Airflow community as well.

Allyship workshops for open source contributors

2021-06-02 Thread Aizhamal Nurmamat kyzy
Hi Beamers, Would this community be interested in taking the Allyship Training? It requires a 90min commitment for remote session learning. If we have a good number of people who express interest in this thread, I will set up training for the Airflow community. If we don't have the critical mass,

Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-02 Thread Kyle Weaver
As far as I'm aware there's nothing strictly guaranteeing the worker pool has been started. But in practice it takes a while for the job to start up - the pipeline needs to be constructed, sent to the job server, translated, and then the runner needs to start the job, etc. before the external

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Kenneth Knowles
Mutability checking might catch that. I meant to suggest not putting the check in the pipeline, but offering a testing discipline that will catch such issues. One thing that's been on the back burner for a long time is making CoderProperties into a CoderTester like Guava's EqualityTester. Then it

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
Could the DirectRunner just do an equality check whenever it does an encode/decode? It sounds like it's already effectively performing a CoderProperties.coderDecodeEncodeEqual for every element, just omitting the equality check. On Wed, Jun 2, 2021 at 12:04 PM Reuven Lax wrote: > There is no
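The check being discussed can be sketched outside Beam as "encode, decode, then compare with the original". The toy coder below is an assumption made up for this illustration (it is not SchemaCoder or CoderProperties): it encodes values positionally and restores field names from a fixed schema, so an in-memory row with stale field names is silently "fixed" by a round trip unless the result is compared with the input.

```python
import json
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def decode_encode_equal(
    encode: Callable[[T], bytes], decode: Callable[[bytes], T], value: T
) -> bool:
    """Return True if the value survives an encode/decode round trip unchanged."""
    return decode(encode(value)) == value


# Hypothetical schema for the toy coder; the field names are illustrative only.
SCHEMA_FIELDS = ["user_id", "score"]


def encode_row(row: Dict[str, int]) -> bytes:
    # Positional encoding: only the values are written, names are ignored.
    return json.dumps(list(row.values())).encode("utf-8")


def decode_row(data: bytes) -> Dict[str, int]:
    # Names are restored from the schema, not from the original object.
    return dict(zip(SCHEMA_FIELDS, json.loads(data.decode("utf-8"))))


consistent_row = {"user_id": 7, "score": 3}
stale_row = {"id": 7, "score": 3}  # in-memory names disagree with the schema

print(decode_encode_equal(encode_row, decode_row, consistent_row))
print(decode_encode_equal(encode_row, decode_row, stale_row))
```

The coder itself is behaving as designed in both calls; the second one fails the comparison only because the object handed to it was inconsistent with its encoded form, which is the situation described in this thread.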

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Reuven Lax
There is no bug in the Coder itself, so that wouldn't catch it. We could insert CoderProperties.coderDecodeEncodeEqual in a subsequent ParDo, but if the Direct runner already does an encode/decode before that ParDo, then that would have fixed the problem before we could see it. On Wed, Jun 2,

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Kenneth Knowles
Would it be caught by CoderProperties? Kenn On Wed, Jun 2, 2021 at 8:16 AM Reuven Lax wrote: > I don't think this bug is schema specific - we created a Java object that > is inconsistent with its encoded form, which could happen to any transform. > > This does seem to be a gap in DirectRunner

Flaky test issue report (40)

2021-06-02 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake) These are P1 issues because they have a major negative impact on the community and make it hard to

P1 issues report (43)

2021-06-02 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake). See

Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-02 Thread Ke Wu
Hi Kyle, Thanks for reviewing https://github.com/apache/beam/pull/14923. I would like to follow up on the deadline & waitForReady on ExternalEnvironment here. In Kubernetes, if my understanding is correct, there is no ordering support when
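The general idea of waiting for a service with a deadline, instead of failing on the first refused connection, can be sketched with the standard library alone. This is not the gRPC waitForReady option and not the code in the pull request; the function name, parameters, and polling strategy are assumptions for illustration.

```python
import socket
import time


def wait_for_endpoint(
    host: str, port: int, deadline_s: float, poll_interval_s: float = 0.1
) -> bool:
    """Return True once a TCP connection succeeds, False if the deadline passes."""
    stop_at = time.monotonic() + deadline_s
    while True:
        remaining = stop_at - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Cap each attempt so a single hanging connect cannot exceed the deadline.
            with socket.create_connection((host, port), timeout=min(remaining, 1.0)):
                return True
        except OSError:
            # Endpoint not up yet: back off briefly, but never sleep past the deadline.
            time.sleep(min(poll_interval_s, max(stop_at - time.monotonic(), 0.0)))
```

A caller that starts alongside the worker pool container could invoke something like `wait_for_endpoint("localhost", port, deadline_s=60)` before sending the first request, so that container start order no longer matters within the chosen deadline.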

Re: KafkaIO SSL issue

2021-06-02 Thread Ilya Kozyrev
Hi Brian, We’re using a consumerFactoryFn that reads certs from GCP and copies them to the local FS on each Dataflow worker. The exception is raised after consumerFactoryFn, when Kafka tries to read the certs from the local FS using KeyStore.load(InputStream stream, char[] password). This is the code we are using in

Re: Unsuscribe

2021-06-02 Thread Pasan Kamburugamuwa
Thank you On Wed, Jun 2, 2021 at 8:58 PM Evan Galpin wrote: > Hi there, > > You can unsubscribe by sending an empty email to > dev-unsubscr...@beam.apache.org and likewise > user-unsubscr...@beam.apache.org > > Thanks, > Evan > > On Wed, Jun 2, 2021 at 11:17 Pasan Kamburugamuwa < >

Re: Unsuscribe

2021-06-02 Thread Evan Galpin
Hi there, You can unsubscribe by sending an empty email to dev-unsubscr...@beam.apache.org and likewise user-unsubscr...@beam.apache.org Thanks, Evan On Wed, Jun 2, 2021 at 11:17 Pasan Kamburugamuwa < pasankamburugamu...@gmail.com> wrote: > Hi, > Please unsubscribe me > > Thank you >

portable runner - spark streaming support

2021-06-02 Thread Moshe Hoadley
Hi, I would like to run a Beam pipeline written with the Python SDK on Spark. I need to read from Kafka, so I need streaming functionality. Does the Spark portable runner support streaming? If not, is there a roadmap for it? Thanks, Moshe

Unsuscribe

2021-06-02 Thread Pasan Kamburugamuwa
Hi, Please unsubscribe me Thank you

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Reuven Lax
I don't think this bug is schema specific - we created a Java object that is inconsistent with its encoded form, which could happen to any transform. This does seem to be a gap in DirectRunner testing though. It also makes it hard to test using PAssert, as I believe that puts everything in a side

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
+dev
> I bet the DirectRunner is encoding and decoding in between, which fixes the object.
Do we need better testing of schema-aware (and potentially other built-in) transforms in the face of fusion to root out issues like this? Brian
On Wed, Jun 2, 2021 at 5:13 AM Matthew Ouyang wrote:
> I

Re: Contributor permission for beam jira

2021-06-02 Thread Alexey Romanenko
Hi Johan, Done, welcome to Beam! — Alexey > On 1 Jun 2021, at 10:37, Johan Sternby wrote: > > Hi > I'm Johan Sternby working at Axis Communications. I would like to be added as > a contributor to the beam jira issue tracker in order to assign myself to a > ticket. > My jira Id is hoshimura

Re: New member(Assign access)

2021-06-02 Thread Alexey Romanenko
Hi Kanthi, Done. Welcome to Beam! — Alexey > On 1 Jun 2021, at 12:49, Kanthi Subramanian wrote: > > Hi, > I would like to start contributing to the project, I was looking at the > SpannerIO task, please add my username ‘subkanthi’ so I can start assigning > tasks. > > Thanks, > Kanthi.

Re: Contributor permission for Beam Jira tickets

2021-06-02 Thread Alexey Romanenko
Hi Varunkumar, I added you to Contributors list. Welcome to Beam! — Alexey > On 31 May 2021, at 07:35, Varunkumar Nagarajan wrote: > > Hi, > > This is Varunkumar Nagarajan from Arcesium India. I would like to add support > for S3 bucket keys. Can someone please add me as a contributor for