Hi experts,
My application uses Apache Beam with Flink as the runner. My
source and sink are Kafka topics, and I am using the KafkaIO connector
provided by Apache Beam to consume and publish.
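For concreteness, a minimal Kafka-to-Kafka pipeline with KafkaIO looks roughly like this. This is only a sketch: the bootstrap servers and topic names are illustrative assumptions, not taken from this thread.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaToKafka {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromKafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("kafka:9092")   // assumption: broker address
                .withTopic("input-topic")             // assumption: topic name
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                   // drop Kafka metadata, keep KV pairs
        .apply("WriteToKafka",
            KafkaIO.<String, String>write()
                .withBootstrapServers("kafka:9092")   // assumption: broker address
                .withTopic("output-topic")            // assumption: topic name
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}
```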
I am reading through Beam's Javadoc.
For bounded data, each bundle becomes a file:
https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L356
Kenn
On Mon, Mar 2, 2020 at 6:18 PM Kyle Weaver wrote:
As Luke and Robert indicated, unsetting num shards _may_ cause the runner
to optimize it automatically.
For example, the Flink [1] and Dataflow [2] runners override num shards.
However, in the Spark runner, I don't see any such override. So I have two
questions:
1. Does the Spark runner override
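For context, the num-shards setting under discussion is the one on Beam's file-based sinks. A minimal sketch of the two cases (file paths and input are illustrative):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class NumShardsExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // With num shards left unset, the runner is free to pick the sharding
    // itself -- this is what lets Flink and Dataflow override it.
    p.apply("CreateA", Create.of("a", "b", "c"))
        .apply("RunnerChosenSharding", TextIO.write().to("/tmp/out/runner-chosen"));

    // Explicitly fixing the shard count disables that runner optimization.
    p.apply("CreateB", Create.of("a", "b", "c"))
        .apply("FixedSharding", TextIO.write().to("/tmp/out/fixed").withNumShards(3));

    p.run().waitUntilFinish();
  }
}
```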
Hi Tobias,
You are right: for this use case, where Kafka commits the offset and you
do not use any other stateful sources/operators, you will get exactly-once.
Should you have stateful operators, that may no longer hold true. Only
a checkpoint constructs a consistent view of the pipeline.
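To make that concrete: on the Flink runner, checkpointing is enabled through the pipeline options. A minimal sketch, with an illustrative interval (the 60-second value is an assumption, not a recommendation from this thread):

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CheckpointedPipeline {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);

    // Without checkpointing, Flink cannot restore stateful operators to a
    // consistent view after a failure; enabling it is what backs the
    // exactly-once behavior discussed above. Interval is illustrative.
    options.setCheckpointingInterval(60_000L); // every 60 seconds
    options.setRunner(FlinkRunner.class);

    Pipeline p = Pipeline.create(options);
    // ... build the Kafka-to-Kafka pipeline here ...
    p.run().waitUntilFinish();
  }
}
```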
SplittableDoFn has experimental support within Dataflow, so the way you may
be using it could be correct but unsupported.
Can you provide snippets/details of your splittable dofn implementation?
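For reference (this is not your implementation, just a baseline to compare against), a minimal SplittableDoFn skeleton in the Java SDK looks roughly like this; the offset range and output format are illustrative:

```java
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

// Hypothetical example: emits ten numbered outputs per input element.
class ReadRangeFn extends DoFn<String, String> {
  @GetInitialRestriction
  public OffsetRange getInitialRestriction(@Element String element) {
    // The whole work for one element, expressed as a claimable range.
    return new OffsetRange(0, 10);
  }

  @ProcessElement
  public void processElement(
      @Element String element,
      RestrictionTracker<OffsetRange, Long> tracker,
      OutputReceiver<String> out) {
    // Claim each position before doing its work so the runner can split
    // the remaining range between claims.
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); i++) {
      out.output(element + "-" + i);
    }
  }
}
```

Since OffsetRange ships with a default tracker, no @NewTracker method is needed in this minimal form.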
On Mon, Mar 2, 2020 at 11:50 AM Kjetil Halvorsen <
kjetil.halvor...@cognite.com> wrote:
Hi,
I am looking for pointers on a Dataflow runner error message: "Workflow
failed. Causes: Step s22 has conflicting bucketing functions,"
This happens at the very start of job execution, and I am unable to
find any pointer as to where in the code/job definition the conflict
originates.
Hi Tobi,
That makes sense to me. My argument came from wanting
"exactly-once" semantics for a pipeline; in that regard, the stop
functionality does not help. But I think having the option to gracefully
shut down a pipeline is beneficial for other use cases like the ones
you described.
Good morning Max and thanks for clarifying!
I generated the 2.19.0 JAR in the second test from Beam's default demo
code. There were no further adjustments on my side, but as far as I can see
there are some open points in JIRA for 1.9.2, so for now I think we
can focus on 1.9.1 as a target.