Well, I did not do a proper perf test. What I am saying is that my
observations are:
* A native Flink job does copy its inputs, but in stack-trace perf
snapshots the CPU is mostly busy inflating the bytes read from Kafka
* When running the Beam pipeline on Flink, the Coder copy trace pops up near the top
Misread your post. You're saying that Kryo is more efficient than a
roundtrip obj->bytes->obj_copy. Still, most types use Flink's
serializers which also do the above roundtrip. So I'm not sure this
performance advantage holds true for other Flink jobs.
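The round trip mentioned above can be illustrated with a small, self-contained sketch. Plain Java serialization stands in here for Beam coders and Flink's serializers; the class and method names are invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class RoundTripCopy {
    // Defensive copy via obj -> bytes -> obj, the pattern both Beam coders
    // and Flink's serializers (without object reuse) apply between operators.
    static <T extends Serializable> T copy(T element) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(element);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T result = (T) in.readObject();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>();
        original.add("kafka-record");
        ArrayList<String> copied = copy(original);
        // Equal content, but a distinct object: the copy is safe to mutate.
        System.out.println(copied.equals(original) && copied != original); // true
    }
}
```

The encode/decode pair is what shows up in the profiling snapshots discussed in this thread.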
On 02.05.19 20:01, Maximilian Michels wrote:
On Fri, May 3, 2019 at 9:29 AM Viliam Durina wrote:
> you MUST NOT mutate your inputs
I think it's enough not to mutate the inputs after you emit them. From this
it follows that when you receive an input, the upstream vertex will not try to
mutate it in parallel. This is what Hazelcast Jet expects. We have no
option to automatically clone objects
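The contract described above can be shown in a small, self-contained sketch (names invented for illustration): when a runner skips the defensive copy, the downstream operator holds a reference to the very same object, so mutating an element after emitting it is silently visible downstream.

```java
import java.util.ArrayList;
import java.util.List;

public class MutateAfterEmit {
    public static void main(String[] args) {
        // Stand-in for a downstream operator's buffer.
        List<List<String>> downstream = new ArrayList<>();

        List<String> element = new ArrayList<>();
        element.add("a");

        // "Emit" without a defensive copy: downstream sees the same reference.
        downstream.add(element);

        // Mutating after emit violates the contract ...
        element.add("b");

        // ... and downstream silently observes the change.
        System.out.println(downstream.get(0)); // prints [a, b], not [a]
    }
}
```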
> I am not sure what you are referring to here. What do you mean by Kryo
> being simply slower ... Beam Kryo or Flink Kryo?
Flink uses Kryo as a fallback serializer when its own type serialization
system can't analyze the type. I'm just guessing here that this could be
slower.
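For context, Flink's `ExecutionConfig` can surface this fallback: `disableGenericTypes()` makes a job fail fast wherever Flink would otherwise fall back to the generic Kryo serializer. A sketch against the Flink DataStream API (not taken from this thread):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Fail at job submission wherever Flink's type analysis would fall back
// to the (potentially slower) generic Kryo serializer.
env.getConfig().disableGenericTypes();
```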
On Thu, May 2, 2019 at 3:41 PM Maximilian Michels wrote:
Thanks for the JIRA issues Jozef!
> So the feature in Flink is operator chaining, and Flink by default
> initiates a copy of input elements. In the case of Beam coders, the copy
> seems to be more noticeable than in native Flink.
Copying between chained operators can be turned off in the
FlinkPipelineOptions.
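For reference, that option looks roughly like this in the Beam Java SDK (a sketch; turning on object reuse is only safe when no transform mutates its inputs or outputs, per the contract discussed in this thread):

```java
FlinkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
// Skip the defensive copy between chained operators. Only safe if no
// DoFn mutates elements after emitting them.
options.setObjectReuse(true);
Pipeline pipeline = Pipeline.create(options);
```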
Thanks for filing those.
As for how not doing a copy is "safe," it's not really. Beam simply
asserts that you MUST NOT mutate your inputs (and direct runners,
which are used during testing, do perform extra copies and checks to
catch violations of this requirement).
I have created
https://issues.apache.org/jira/browse/BEAM-7204
https://issues.apache.org/jira/browse/BEAM-7206
to track these topics further.
On Wed, May 1, 2019 at 1:24 PM Jozef Vilcek wrote:
On Tue, Apr 30, 2019 at 5:42 PM Kenneth Knowles wrote:
All right, I will test it out if I can. How do I deploy a pipeline on the
Flink portable runner? Should I follow this to be able to do it?
https://beam.apache.org/documentation/runners/flink/
On Tue, Apr 30, 2019 at 4:05 PM Reuven Lax wrote:
In that case, Robert's point is quite valid. The old Flink runner I believe
had no knowledge of fusion, which was known to make it extremely slow. A
lot of work went into making the portable runner fusion aware, so we don't
need to round trip through coders on every ParDo.
Reuven
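The benefit of fusion can be sketched in plain, self-contained Java (illustrative only; names invented): without fusion, each ParDo boundary costs an encode/decode round trip through coders, while a fused stage is essentially function composition on the in-memory object.

```java
import java.util.function.Function;

public class FusionSketch {
    public static void main(String[] args) {
        Function<Integer, Integer> parDoA = x -> x + 1;
        Function<Integer, Integer> parDoB = x -> x * 2;

        // Unfused execution would encode the element to bytes after parDoA
        // and decode it again before parDoB. A fusion-aware runner instead
        // runs the composed function directly.
        Function<Integer, Integer> fusedStage = parDoA.andThen(parDoB);

        System.out.println(fusedStage.apply(20)); // (20 + 1) * 2 = 42
    }
}
```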
It was not a portable Flink runner.
Thanks all for the thoughts. I will create JIRAs, as suggested, with my
findings and send them out.
On Tue, Apr 30, 2019 at 11:34 AM Reuven Lax wrote:
Jozef, did you use the portable Flink runner or the old one?
Reuven
On Tue, Apr 30, 2019 at 1:03 AM Robert Bradshaw wrote:
Thanks for starting this investigation. As mentioned, most of the work
to date has been on feature parity, not performance parity, but we're
at the point that the latter should be tackled as well. Even if there
is a slight overhead (and there's talk about integrating more deeply
with the Flume DAG
Specifically, a lot of shared code assumes that repeatedly setting a timer
is nearly free / the same cost as determining whether or not to set the
timer. ReduceFnRunner has been refactored in a way so it would be very easy
to set the GC timer once per window that occurs in a bundle, but there's
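The optimization hinted at here, setting the GC timer once per window per bundle instead of once per element, can be sketched in self-contained form (class and field names are invented):

```java
import java.util.HashSet;
import java.util.Set;

public class GcTimerPerBundle {
    private final Set<Long> windowsWithTimer = new HashSet<>();
    int registrations = 0; // stand-in for the costly timer-service call

    // Called once per element; registers the window's GC timer only the
    // first time that window is seen within the bundle.
    void onElement(long windowMaxTimestamp) {
        if (windowsWithTimer.add(windowMaxTimestamp)) {
            registrations++;
        }
    }

    void finishBundle() {
        windowsWithTimer.clear(); // the next bundle re-checks cheaply
    }

    public static void main(String[] args) {
        GcTimerPerBundle stage = new GcTimerPerBundle();
        for (int i = 0; i < 1000; i++) {
            stage.onElement(60_000L); // 1000 elements in the same window
        }
        System.out.println(stage.registrations); // 1, not 1000
    }
}
```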
I think the short answer is that folks working on the Beam Flink runner have
mostly been focused on getting everything working, and so have not dug into
this performance too deeply. I suspect that there is low-hanging fruit to
optimize as a result.
You're right that ReduceFnRunner schedules a
Hi Jozef,
Yes, there is potential for overhead when running Beam pipelines on
different Runners. The Beam model has an execution framework which each
Runner utilizes in a slightly different way.
Timers in Flink, for example, are uniquely identified by a namespace and
a timestamp. In Beam,