Hi Eleanore,

that's a good question. :)

There has been discussion about this in the PR of the mentioned Jira [1]. Generally, objectReuse enables Flink to hypothetically reuse instances of deserialized objects instead of creating new ones. But due to how Beam defines Coders it creates new instances nevertheless. See the discussion for details.

 Jan

[1] https://github.com/apache/beam/pull/13240#issuecomment-721635620

On 4/12/21 7:29 PM, Eleanore Jin wrote:
Hi Jan,

Thanks a lot for the reply! This helps, I wonder if you have any idea whats the difference between fasterCopy vs objectReuse option?

Eleanore

On Fri, Apr 9, 2021 at 11:53 AM Jan Lukavský <[email protected] <mailto:[email protected]>> wrote:

    Hi Eleanore,

    the --fasterCopy option disables clone between operators (see
    [1]). It
    should be safe to use it, unless your pipeline outputs an object and
    later modifies the same instance. This is generally not supported
    by the
    Beam model and is considered to be an user error. FlinkRunner
    historically chose a way of "better-safe-than-sorry" approach and
    explicitly cloned every received object between (non-shuffle)
    operators.
    Enabling this option should increase performance, you can verify your
    Pipeline is not doing any disallowed mutations using DirectRunner,
    which
    checks this by default (without --enforceImmutability=false).

      Jan

    [1] https://issues.apache.org/jira/browse/BEAM-11146
    <https://issues.apache.org/jira/browse/BEAM-11146>

    On 4/9/21 7:57 AM, Eleanore Jin wrote:
    > Hi community,
    >
    > I am upgrading from Beam 2.23.0 -> 2.28.0, and a new
    > FlinkPipelineOption is introduced: fasterCopy.
    >
    > Can you please help me understand what is the difference between
    the
    > option objectReuse vs fasterCopy?
    >
    > Thanks a lot!
    > Eleanore

Reply via email to