question regarding checkpoint timeout

2022-03-07 Thread Antonio Si
Hi Beam community, We have a simple pipeline which consumes data from a kafka source topic, runs some transformations, and outputs to a kafka output topic. When we have a high traffic, we observe that sometimes the checkpoint fails due to timeout (>10 min). We are using flinkrunner 1.12. We have

Re:

2021-11-22 Thread Antonio Si
riction)).shuffle().FlatMap(someSdfInstance.process) > > On Mon, Nov 22, 2021 at 5:28 PM Antonio Si wrote: > > > > Thanks Robert. Do you mind pointing me to the code that shuffled and > assigned to the workers? > > I trace through the code where it loops through the topic par

Re:

2021-11-22 Thread Antonio Si
; In a nutshell, a full set of (topic, partionIndex) pairs gets shuffled > and assigned to the workers randomly, each of which gets processed by > a SplittableDoFn that emits the actual data on that topic/partition. > > On Mon, Nov 22, 2021 at 12:07 PM Antonio Si wrote: > > >

[no subject]

2021-11-22 Thread Antonio Si
Hello Beam community, I am experimenting with the new SDF KafkaIO in a bit more detail. I have a quick question. How are the topics and partitions got assigned to each Task Manager? Can someone point me to the code? Thanks in advance. Antonio.

Question regarding error raised in KafkaWriter processElement() and finishBundle()

2021-03-04 Thread Antonio Si
Hi all, I have a question regarding error being thrown in KafkaWriter.processElement(). Let's say my pipeline eventually reached KafkaWriter.processElement() twice. The first time is successful and the second time, for some reason is not successful and set numSendFailures to 1. After that

Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-07 Thread Antonio Si
Hi Jan, I create this jira: https://issues.apache.org/jira/browse/BEAM-11583 Thanks. Antonio. On 2021/01/07 08:43:34, Jan Lukavský wrote: > Hi Antonio, > > can you please create one? > > Thanks, > >  Jan > > On 1/6/21 10:31 PM, Antonio Si wrote: > >

Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-06 Thread Antonio Si
ave tests for 1.? > > > > Kenn > > > > On Tue, Jan 5, 2021 at 4:05 PM Boyuan Zhang wrote: > > > >> https://github.com/apache/beam/pull/13240 seems suspicious to me. > >> > >> +Maximilian Michels Any insights here? > >> > >

Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-05 Thread Antonio Si
Hi, I would like to followup with this question to see if there is a solution/workaround for this issue. Thanks. Antonio. On 2020/12/19 18:33:48, Antonio Si wrote: > Hi, > > We were using Beam v2.23 and recently, we are testing upgrade to Beam v2.26. > For Beam v2.26, we

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2021-01-05 Thread Antonio Si
an the total number of partitions? Say 5 times > higher? Would that have impact on throughput in your case? > > Jan > > On 12/23/20 7:03 PM, Antonio Si wrote: > > Hi Jan, > > > > The performance data that I reported was run with parallelism = 8. We also > >

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-23 Thread Antonio Si
ou use for your sources > >  b) how many partitions there is in your topic(s) > > Thanks, > >  Jan > > On 12/22/20 10:07 PM, Antonio Si wrote: > > Hi Boyuan, > > > > Let me clarify, I have tried with and without using > > --experiments=beam_

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-22 Thread Antonio Si
ader when resuming. So a single UnboundedSource could be mapped to > > multiple readers because of different instances of CheckpointMarl. That's > > also the reason why we use CheckpointMark as the restriction. > > > > Please let me know if I misunderstand your suggestio

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-21 Thread Antonio Si
ce regression you are noticing? Is it slower to output the same > amount of records? > > On Fri, Dec 11, 2020 at 1:31 PM Antonio Si wrote: > > > Hi Boyuan, > > > > This is Antonio. I reported the KafkaIO.read() performance issue on the > > slack channel a few day

Compatibility between Beam v2.23 and Beam v2.26

2020-12-21 Thread Antonio Si
Hi, We were using Beam v2.23 and recently, we are testing upgrade to Beam v2.26. For Beam v2.26, we are passing --experiments=use_deprecated_read and --fasterCopy=true. We run into this exception when we resume our pipeline: Caused by: java.io.InvalidClassException:

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-11 Thread Antonio Si
Hi Boyuan, This is Antonio. I reported the KafkaIO.read() performance issue on the slack channel a few days ago. I am not sure if this is helpful, but I have been doing some debugging on the SDK KafkaIO performance issue for our pipeline and I would like to provide some observations. It

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-23 Thread Antonio Si
Hi all, Our team recently did a similar experiment and came to a similar observation as what Teodor did. The Beam slack channel points me to email thread discussion. It seems like there is a jira issue created and Teodor had a PR. May I ask what is the decision on that and if the PR is