Hi Ross,

Thanks for the feedback!

For some "context," the change you mention was:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-478+-+Strongly+typed+Processor+API

In addition to the spec on that page, there are links to the
discussion and voting mailing list threads.

The primary motivation was to go from a prior state in which
there was absolutely no safety or checking possible
regarding which types a parent should pass to a child or a
child should accept from the parent. This was a practical
choice: there have been several subtle bugs in Streams DSL
features in the last several releases, in which some edge
case causes the feature's processors to forward a wrongly-
typed record downstream.

I was trying pretty hard actually _not_ to design a whole
new "V2" of the PAPI, since I really wanted to fix a design
flaw that had cost us hundreds of hours of effort, both as
maintainers and as users, and I wanted to fix it as fast as
possible. Sadly, even though I was trying to be as
incremental as possible, it still took a very long time to
get from proposal to implementation, and unfortunately, the
process of migrating the DSL's internal code is still not
complete. This seems to suggest that it was actually a good
choice not to bring in even more of a grand vision here, as
there would have been even more delays.

Ok, that's enough self-justification ;)


Your appraisal is correct. The new PAPI typing does not
handle the case of child nodes of different types, and if
that's your situation, you will indeed have to set the
outbound types to a supertype of all children's inbound
types (such as Object).

I like the spirit of your proposal, and I agree it would be
good to move in a direction of having good type safety for
your use case. If we can continue to evolve the API via
additions and deprecations, it will be less disruptive for
the overall community of users, but indeed for some kinds of
changes, a clean break is better.

It looks like your specific proposal could be achieved in
place, but it also seems like you have a pretty clear idea
of where you'd like to see the API go in the future.
Actually we have the 3.0 release coming up, so it would be a
favorable time for you to promote your vision.

To help everyone discuss both large and small proposals
unambiguously, we have a formal process for design reviews,
called KIPs:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

If you do want to move forward with a proposal, please feel
free to start drafting a KIP. If you need any help
navigating that process, please reach out to us on the dev
mailing list, or of course, we can continue to use this
thread.

I really appreciate your willingness to share your feedback
and ideas.

Thank you,
John



On Mon, 2021-03-22 at 14:46 +1100, Ross Black wrote:
> Hi,
> 
> I wanted to provide some feedback about the new API introduced in Kafka
> streaming 2.7 for adding a Processor to the Topology (using PAPI).
> 
> Prior to 2.7, the Processor was typed only by the input key values <K,V>,
> and the ProcessorContext was untyped.
> 
> In 2.7 ProcessorSupplier & Processor now have the additional params <...,
> KForward, VForward> and the ProcessorContext also has <KForward,
> VForward>.  The *only* use of this appears to be to provide some typing for
> the "forward" functions.
> One of my common use cases is where I have multiple child nodes each of
> which has different output <K,V>.  To migrate to the new 2.7 API, seems to
> require adding <.., Object, Object> parameters to every type which looks
> noisy in the code and does nothing for improving type safety of the API.
> 
> I would like to ask why the new type parameters were added when they do not
> actually achieve typing for the multiple child use-case?
> 
> Personally, I would prefer new methods on the ProcessorContext that allow
> me to create a "typed" forwarder per child node.
> 
> e.g.
> 
> public interface ProcessorContext {
> 
>     /** Default forwarder. */
>     <K, V> Forwarder<K, V> forwarder();
> 
>     /** Forwarder for a specific child. */
>     <K, V> Forwarder<K, V> forwarder(final String childName);
> 
> }
> 
> 
> 
> public interface Forwarder<K, V> {
>     void forward(Record<K, V> record);
> }
> 
> 
> With my suggested approach you at least get some typing per-topic (child
> node).  To me this also feels like it better represents the domain model of
> single stream input with multiple outputs.  Forwarder also feels like a
> real entity in the model (similar to Producer), rather than just being a
> method hacked into the context.
> 
> 
> To me, the new 2.7 API feels like a step backwards.  I also get the
> impression that the streaming API continues to break each release without a
> particular strong model / design behind it (particularly the sometimes
> jarring disconnect between PAPI and streaming API).
> Having learned from initial implementations, is it time to take a step back
> to design a "V2" streaming API which blends the existing PAPI + streaming
> API?
> I am curious what others think?
> 
> Thanks,
> Ross


Reply via email to