The design doc is here: s.apache.org/a-new-dofn

Basically, it was changed to allow more flexibility. With a method defined in
a type, all of the accessors had to live in the ProcessContext interface. For
instance, accessing the window meant calling a window() method that returned
a BoundedWindow. But then your DoFn also needed to implement
RequiresWindowAccess to indicate that it needed the window. So there was a
lot of potential for user error: accessing the window without declaring it
crashed at runtime, down-casting the window to a specific type could fail,
etc. It also didn't work well with anonymous classes and lambdas, since it
wasn't possible to have an anonymous DoFn+RequiresWindowAccess.
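
To make those failure modes concrete, here is a rough, self-contained sketch
of the old style. Every name in it (OldFn, OldContext, OldRequiresWindowAccess,
the window classes) is a hypothetical stand-in rather than the real SDK API,
and the exception type thrown for undeclared access is an assumption.

```java
// Hypothetical stand-ins for the old-style API; these are NOT the real SDK classes.
final class OldStyleSketch {
  public static void main(String[] args) {
    // An anonymous class can extend OldFn but cannot also implement the
    // marker interface, so calling window() fails only when the code runs.
    OldFn anonymous = new OldFn() {
      @Override
      void processElement(OldContext c) {
        c.window();
      }
    };
    try {
      anonymous.processElement(new OldContext(anonymous, new OldIntervalWindow()));
    } catch (UnsupportedOperationException e) {
      System.out.println("Undeclared window access failed at runtime: " + e.getMessage());
    }

    // Declaring access is not enough: the down-cast is unchecked until runtime.
    OldFn downcasting = new DowncastingFn();
    try {
      downcasting.processElement(new OldContext(downcasting, new OldGlobalWindow()));
    } catch (ClassCastException e) {
      System.out.println("Down-cast failed at runtime");
    }
  }
}

interface OldWindow {}

final class OldIntervalWindow implements OldWindow {}

final class OldGlobalWindow implements OldWindow {}

/** Marker interface a function must implement before it may call window(). */
interface OldRequiresWindowAccess {}

abstract class OldFn {
  abstract void processElement(OldContext c);
}

final class OldContext {
  private final OldFn fn;
  private final OldWindow window;

  OldContext(OldFn fn, OldWindow window) {
    this.fn = fn;
    this.window = window;
  }

  /** Every accessor lives on the context and can only hand back the base type. */
  OldWindow window() {
    if (!(fn instanceof OldRequiresWindowAccess)) {
      // Assumed failure mode: the mistake is only noticed when the element is processed.
      throw new UnsupportedOperationException("window() used without RequiresWindowAccess");
    }
    return window;
  }
}

final class DowncastingFn extends OldFn implements OldRequiresWindowAccess {
  @Override
  void processElement(OldContext c) {
    // Throws ClassCastException if the actual window is not an OldIntervalWindow.
    OldIntervalWindow w = (OldIntervalWindow) c.window();
  }
}
```

Neither mistake is visible to the compiler or at construction time in this
sketch; both only show up once an element is actually processed.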

With the new mechanism, all of these issues are fixed. If you don't access
the window you write:
  @ProcessElement void processElement(ProcessContext context) { ... }
If you do access the window you write:
  @ProcessElement void processElement(ProcessContext context, IntervalWindow window) { ... }

During pipeline construction we can detect that the window parameter is used
(getting rid of RequiresWindowAccess). We let you request the window type you
want, and can validate it against the window types we know are present,
preventing runtime errors. It works in an anonymous class since it doesn't
require implementing multiple interfaces.
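
As a rough illustration of that construction-time check, the sketch below
uses plain Java reflection to find the annotated method, see whether it asks
for a window parameter, and compare the requested type with the window type
known to be present. It is not Beam's implementation: the annotation, the
window classes, and SignatureInspector are all hypothetical names made up for
this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins; none of these are Beam classes.
final class WindowParamSketch {
  public static void main(String[] args) {
    // No window parameter requested: compatible with any window type.
    System.out.println(SignatureInspector.isCompatible(NoWindowFn.class, GlobalWindowSketch.class));
    // Requested window type matches the type known to be present.
    System.out.println(SignatureInspector.isCompatible(WindowedFn.class, IntervalWindowSketch.class));
    // Mismatch is reported before any element is processed.
    System.out.println(SignatureInspector.isCompatible(WindowedFn.class, GlobalWindowSketch.class));
  }
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProcessElementSketch {}

interface WindowSketch {}

final class IntervalWindowSketch implements WindowSketch {}

final class GlobalWindowSketch implements WindowSketch {}

final class ContextSketch {}

final class NoWindowFn {
  @ProcessElementSketch
  void processElement(ContextSketch context) {}
}

final class WindowedFn {
  @ProcessElementSketch
  void processElement(ContextSketch context, IntervalWindowSketch window) {}
}

final class SignatureInspector {
  /** Finds the single annotated method on the class, or throws if there is not exactly one. */
  static Method findProcessMethod(Class<?> fnClass) {
    Method found = null;
    for (Method m : fnClass.getDeclaredMethods()) {
      if (m.isAnnotationPresent(ProcessElementSketch.class)) {
        if (found != null) {
          throw new IllegalArgumentException("more than one annotated method");
        }
        found = m;
      }
    }
    if (found == null) {
      throw new IllegalArgumentException("no annotated method");
    }
    return found;
  }

  /** Returns the window type the method asks for, or null if it asks for none. */
  static Class<?> requestedWindowType(Method method) {
    for (Class<?> parameterType : method.getParameterTypes()) {
      if (WindowSketch.class.isAssignableFrom(parameterType)) {
        return parameterType;
      }
    }
    return null;
  }

  /** Checks the requested window type against the window type known to be present. */
  static boolean isCompatible(Class<?> fnClass, Class<? extends WindowSketch> actualWindowType) {
    Class<?> requested = requestedWindowType(findProcessMethod(fnClass));
    return requested == null || requested.isAssignableFrom(actualWindowType);
  }
}
```

Because the request lives in the method signature, no marker interface is
needed, and the same inspection works on an anonymous class.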

This becomes even more important as we introduce stateful and splittable
DoFns. Both of these rely on annotations and the ability to add parameters
to the method.

On Wed, Dec 20, 2017 at 5:36 AM Joshua Fox <[email protected]> wrote:

> Can someone refer me to discussion of this design question?
>
>
> Why was processElement turned from an abstract method in Dataflow to an
> annotation in  Beam?
>
>
> <https://beam.apache.org/documentation/sdks/javadoc/0.5.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html>
> The JavaDocs for ProcessElement say:
> <https://beam.apache.org/documentation/sdks/javadoc/2.2.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html>
>
> A subclass of *DoFn* must have a method with this annotation. The
> signature of this method must satisfy the following constraints:
>
>
>
> - Its first argument must be a * DoFn.ProcessContext*.
>
> - If one of its arguments is a subtype of *RestrictionTracker*, then it
> is a splittable  ...
>
> - It must return  void
>
>
>
> Such a specification is exactly what  a type  declaration is meant to do
> --   it seems that a method, rather than an annotation, is just right for
> this purpose.
>
>
>
> --
>
>
> *JOSHUA FOX*
> Principal Software Architect | Freightos
