You can apply the same DoFn / Transform instance multiple times in the
graph, or you can follow standard development practice and factor the common
code into a method that two different DoFns invoke.
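A minimal sketch of the second approach (class and method names are
illustrative, not from this thread):

import org.apache.beam.sdk.transforms.DoFn;

// Shared logic factored into a helper that two DoFns both invoke.
class Common {
  static String normalize(String input) {
    return input.trim().toLowerCase();
  }
}

class ParseFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(@Element String line, OutputReceiver<String> out) {
    out.output(Common.normalize(line));
  }
}

class AuditFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(@Element String line, OutputReceiver<String> out) {
    // Same helper, different surrounding logic.
    out.output("audit:" + Common.normalize(line));
  }
}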
On Tue, Jun 23, 2020 at 4:28 PM Praveen K Viswanathan <
harish.prav...@gmail.com> wrote:
Hi Beamsters!
We have scheduled the last webinar of Beam Learning Month! Robert Crowe and
Reza Rokni from Google will share how Beam is used to process large-scale
data for ML pipelines. If the topic interests you, please register via this
link:
Hi Luke - Thanks for the explanation. The limitation due to directed-graph
processing and the option of external storage clear up most of the questions
I had with respect to designing this pipeline. I do have one more scenario
to clarify on this thread.
If I had a certain piece of logic that I had
Beam is really about parallelizing the processing. Using a single DoFn that
does everything is fine as long as the DoFn can process elements in
parallel (e.g. the upstream source produces lots of elements). Composing
multiple DoFns is great for reuse and testing, but it isn't strictly
necessary. Also,
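(To make that concrete, a hedged sketch: the two pipeline shapes below
parallelize identically, since each element is processed independently;
the source path and step names are invented.)

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ComposeExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // One DoFn doing all the work, or the same work split into small,
    // individually testable steps; Beam runs both element by element.
    p.apply("Read", TextIO.read().from("gs://my-bucket/input*"))
     .apply("Trim", MapElements.into(TypeDescriptors.strings())
         .via((String s) -> s.trim()))
     .apply("Upper", MapElements.into(TypeDescriptors.strings())
         .via((String s) -> s.toUpperCase()));
    p.run();
  }
}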
Hi Alexey,
Thanks a lot for the information! I will give it a try.
Regarding the checkpoint intervals, I think the Flink community suggested
something between 3 and 5 minutes; I am not sure yet whether the checkpoint
interval can be set in milliseconds. Currently, our Beam pipeline is
stateless; there is no
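For reference, the Flink runner exposes the checkpoint interval in
milliseconds via FlinkPipelineOptions; a minimal sketch (the 3-minute value
is just the low end of the suggestion above):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CheckpointConfig {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    // The interval is expressed in milliseconds; 3 minutes = 180000 ms.
    options.setCheckpointingInterval(180_000L);
    Pipeline p = Pipeline.create(options);
    // ... build the pipeline ...
    p.run();
  }
}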
On Fri, Jun 12, 2020 at 12:52 AM TAREK ALSALEH
wrote:
> Hi,
>
> I am using the Python SDK with Dataflow as my runner. I am looking at
> implementing a streaming pipeline that will continuously monitor a GCS
> bucket for incoming files and, depending on the regex match for a file,
> launch a set of
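The question above is about the Python SDK (which has a comparable
MatchContinuously transform); to keep this digest's examples in one
language, here is the same idea sketched with the Java SDK's FileIO
(bucket, pattern, and poll interval are illustrative):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Watch;
import org.joda.time.Duration;

public class WatchGcsBucket {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("MatchNewFiles",
            FileIO.match()
                .filepattern("gs://my-bucket/incoming/*") // illustrative
                // Poll every 30 seconds, forever; newly matched files
                // stream into the rest of the pipeline.
                .continuously(Duration.standardSeconds(30), Watch.Growth.never()))
     .apply("ReadMatches", FileIO.readMatches());
    // Downstream steps can branch on the file name, e.g. with a regex.
    p.run();
  }
}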
> On 23 Jun 2020, at 07:49, Eleanore Jin wrote:
>
> the KafkaIO EOS sink you referred to is KafkaExactlyOnceSink? If so, the
> way I understand it, the usage is
> KafkaIO.write().withEOS(numPartitionsForSinkTopic, "mySlinkGroupId");
> reading from your response, do I need to additionally
> configure
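For context, a hedged sketch of how withEOS is wired into a KafkaIO write
in the Java SDK (broker, topic, and shard count are illustrative; the sink
group id should stay stable across pipeline restarts, since the
exactly-once sink uses it to coordinate writers):

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosSinkExample {
  static void writeExactlyOnce(PCollection<KV<String, String>> records) {
    records.apply(
        KafkaIO.<String, String>write()
            .withBootstrapServers("broker:9092")   // illustrative
            .withTopic("output-topic")             // illustrative
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class)
            // numShards is typically the sink topic's partition count.
            .withEOS(2, "mySinkGroupId"));
  }
}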
Hi Max,
The error message shows up in the Flink web UI, so I think it must be
reaching the cluster.
For the portable runner the error is listed in the Docker container output as
well, but I assume that is just relaying the error message from the Flink
cluster.
Thanks,
Jesse
On 6/23/20,
Hey Jesse,
Could you share the context of the error? Where does it occur? In the
client code or on the cluster?
Cheers,
Max
On 22.06.20 18:01, Jesse Lord wrote:
> I am trying to run the wordcount quickstart example on a Flink cluster
> on AWS EMR. Beam version 2.22, Flink 1.10.
>
>
>
> I
> Yes, I agree that serializing coders into the checkpoint creates problems.
> I'm wondering whether it is possible to serialize the coder URN + args
> instead.
I wish that were possible, but Beam does not have a dedicated interface to
snapshot serializers. It is not possible to restore
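To make the idea in this exchange concrete, a purely hypothetical sketch
of what such a snapshot could hold; as noted above, no interface like this
exists in Beam today, and every name here is invented:

import java.io.Serializable;
import java.util.Arrays;

// Hypothetical: persist only the coder's portable identity in the
// checkpoint, instead of Java-serializing the Coder object itself,
// and re-resolve the Coder from a registry on restore.
final class CoderSnapshot implements Serializable {
  final String urn;      // e.g. "beam:coder:kv:v1" (illustrative)
  final byte[] payload;  // serialized component info / args

  CoderSnapshot(String urn, byte[] payload) {
    this.urn = urn;
    this.payload = Arrays.copyOf(payload, payload.length);
  }
}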
I don't really know; my knowledge of the Beam source code is not that
deep, but I managed to modify AvroCoder to store a string containing the
class name (and the class parameters in case it was a parameterized
class) instead of references to Class. But, as I said,
AvroCoder is using Avro's
Yes, I agree that serializing coders into the checkpoint creates problems.
I'm wondering whether it is possible to serialize the coder URN + args
instead.
On Mon, Jun 22, 2020 at 11:00 PM Ivan San Jose
wrote:
> Hi again, just replying here in case this could be useful for someone
> as using
Hi again, just replying here in case this could be useful for someone,
as using Flink checkpoints on Beam is not reliable at all right
now...
Even though I removed class references from the serialized object in
AvroCoder, in the end I couldn't make AvroCoder work, as it infers the
schema using ReflectData
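One hedged workaround sketch: hand AvroCoder an explicit Schema and work
with GenericRecord, so nothing depends on ReflectData or on Class
references (the schema below is illustrative):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.AvroCoder;

public class ExplicitSchemaCoder {
  public static void main(String[] args) {
    // With an explicit Schema the coder encodes GenericRecords and never
    // needs to reflect over a user class.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
    AvroCoder<GenericRecord> coder = AvroCoder.of(schema);
    System.out.println(coder);
  }
}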