Hi Eleanore,
that's a good question. :)
There has been discussion about this in the PR of the mentioned Jira
[1]. Generally, objectReuse enables Flink to hypothetically reuse
instances of deserialized objects instead of creating new ones. But due
to how Beam defines Coders it creates new instances nevertheless. See
the discussion for details.
Jan
[1] https://github.com/apache/beam/pull/13240#issuecomment-721635620
On 4/12/21 7:29 PM, Eleanore Jin wrote:
Hi Jan,
Thanks a lot for the reply! This helps, I wonder if you have any idea
whats the difference between fasterCopy vs objectReuse option?
Eleanore
On Fri, Apr 9, 2021 at 11:53 AM Jan Lukavský <[email protected]
<mailto:[email protected]>> wrote:
Hi Eleanore,
the --fasterCopy option disables clone between operators (see
[1]). It
should be safe to use it, unless your pipeline outputs an object and
later modifies the same instance. This is generally not supported
by the
Beam model and is considered to be an user error. FlinkRunner
historically chose a way of "better-safe-than-sorry" approach and
explicitly cloned every received object between (non-shuffle)
operators.
Enabling this option should increase performance, you can verify your
Pipeline is not doing any disallowed mutations using DirectRunner,
which
checks this by default (without --enforceImmutability=false).
Jan
[1] https://issues.apache.org/jira/browse/BEAM-11146
<https://issues.apache.org/jira/browse/BEAM-11146>
On 4/9/21 7:57 AM, Eleanore Jin wrote:
> Hi community,
>
> I am upgrading from Beam 2.23.0 -> 2.28.0, and a new
> FlinkPipelineOption is introduced: fasterCopy.
>
> Can you please help me understand what is the difference between
the
> option objectReuse vs fasterCopy?
>
> Thanks a lot!
> Eleanore