Hi Beam community,
We have a simple pipeline which consumes data from a kafka source topic,
runs some transformations, and outputs to a kafka output topic. When we
have a high traffic, we observe that sometimes the checkpoint fails due to
timeout (>10 min). We are using flinkrunner 1.12. We have
riction)).shuffle().FlatMap(someSdfInstance.process)
>
> On Mon, Nov 22, 2021 at 5:28 PM Antonio Si wrote:
> >
> > Thanks Robert. Do you mind pointing me to the code that shuffled and
> assigned to the workers?
> > I trace through the code where it loops through the topic par
; In a nutshell, a full set of (topic, partionIndex) pairs gets shuffled
> and assigned to the workers randomly, each of which gets processed by
> a SplittableDoFn that emits the actual data on that topic/partition.
>
> On Mon, Nov 22, 2021 at 12:07 PM Antonio Si wrote:
> >
>
Hello Beam community,
I am experimenting with the new SDF KafkaIO in a bit more detail. I have a
quick question. How are the topics and partitions got assigned to each Task
Manager?
Can someone point me to the code?
Thanks in advance.
Antonio.
Hi all,
I have a question regarding error being thrown in KafkaWriter.processElement().
Let's say my pipeline eventually reached KafkaWriter.processElement() twice.
The first time is successful and the second time, for some reason is not
successful and set numSendFailures to 1.
After that
Hi Jan,
I create this jira: https://issues.apache.org/jira/browse/BEAM-11583
Thanks.
Antonio.
On 2021/01/07 08:43:34, Jan Lukavský wrote:
> Hi Antonio,
>
> can you please create one?
>
> Thanks,
>
> Jan
>
> On 1/6/21 10:31 PM, Antonio Si wrote:
> >
ave tests for 1.?
> >
>
> Kenn
>
>
> > On Tue, Jan 5, 2021 at 4:05 PM Boyuan Zhang wrote:
> >
> >> https://github.com/apache/beam/pull/13240 seems suspicious to me.
> >>
> >> +Maximilian Michels Any insights here?
> >>
> >
Hi,
I would like to followup with this question to see if there is a
solution/workaround for this issue.
Thanks.
Antonio.
On 2020/12/19 18:33:48, Antonio Si wrote:
> Hi,
>
> We were using Beam v2.23 and recently, we are testing upgrade to Beam v2.26.
> For Beam v2.26, we
an the total number of partitions? Say 5 times
> higher? Would that have impact on throughput in your case?
>
> Jan
>
> On 12/23/20 7:03 PM, Antonio Si wrote:
> > Hi Jan,
> >
> > The performance data that I reported was run with parallelism = 8. We also
> >
ou use for your sources
>
> b) how many partitions there is in your topic(s)
>
> Thanks,
>
> Jan
>
> On 12/22/20 10:07 PM, Antonio Si wrote:
> > Hi Boyuan,
> >
> > Let me clarify, I have tried with and without using
> > --experiments=beam_
ader when resuming. So a single UnboundedSource could be mapped to
> > multiple readers because of different instances of CheckpointMarl. That's
> > also the reason why we use CheckpointMark as the restriction.
> >
> > Please let me know if I misunderstand your suggestio
ce regression you are noticing? Is it slower to output the same
> amount of records?
>
> On Fri, Dec 11, 2020 at 1:31 PM Antonio Si wrote:
>
> > Hi Boyuan,
> >
> > This is Antonio. I reported the KafkaIO.read() performance issue on the
> > slack channel a few day
Hi,
We were using Beam v2.23 and recently, we are testing upgrade to Beam v2.26.
For Beam v2.26, we are passing --experiments=use_deprecated_read and
--fasterCopy=true.
We run into this exception when we resume our pipeline:
Caused by: java.io.InvalidClassException:
Hi Boyuan,
This is Antonio. I reported the KafkaIO.read() performance issue on the slack
channel a few days ago.
I am not sure if this is helpful, but I have been doing some debugging on the
SDK KafkaIO performance issue for our pipeline and I would like to provide some
observations.
It
Hi all,
Our team recently did a similar experiment and came to a similar observation as
what Teodor did.
The Beam slack channel points me to email thread discussion.
It seems like there is a jira issue created and Teodor had a PR. May I ask what
is the decision on that and if the PR is
15 matches
Mail list logo