Hi Stephen,

I found a very nice article [1] which might help with the issues you raised. Its elegant solution to this problem can be summarized as "do not implement equals() and hashCode() for POJO types; use Object's default implementation". I'm not 100% sure this will have no negative impact on other Flink components, but I _suppose_ it should not (someone please correct me if I'm wrong).
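To illustrate the article's idea, here is a minimal Java sketch assuming a hypothetical SensorReading POJO that keeps Object's identity-based equals() and hashCode(). Mutating the object after it has been added to a HashSet does not lose it, but two instances with identical field values are no longer considered equal. Note that identity hash codes differ between instances (and between JVMs), which matters if such a type is ever used as a key.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable POJO that deliberately does NOT override equals()/hashCode(),
// so it inherits Object's identity-based implementations.
class SensorReading {
    private String sensorId;
    private double value;

    public SensorReading() {}  // Flink's POJO rules require a public no-arg constructor

    public String getSensorId() { return sensorId; }
    public void setSensorId(String sensorId) { this.sensorId = sensorId; }
    public double getValue() { return value; }
    public void setValue(double value) { this.value = value; }
}

class IdentityEqualityDemo {
    public static void main(String[] args) {
        SensorReading r = new SensorReading();
        r.setSensorId("s-1");
        r.setValue(1.0);

        Set<SensorReading> seen = new HashSet<>();
        seen.add(r);

        // Mutation does not change the identity hash code, so the object is still found.
        r.setValue(42.0);
        System.out.println(seen.contains(r));    // true

        // A different instance with the same field values is NOT equal under identity semantics.
        SensorReading copy = new SensorReading();
        copy.setSensorId("s-1");
        copy.setValue(42.0);
        System.out.println(seen.contains(copy)); // false
    }
}
```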

Jan

[1] http://web.mit.edu/6.031/www/sp17/classes/15-equality/

On 10/7/19 1:37 PM, Chesnay Schepler wrote:

This question should only be relevant for cases where POJOs are used as keys, in which case their hashCode() /must not/ return either a class-constant or an effectively random value, as this would break the hash partitioning.
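A minimal sketch of what a key-safe POJO might look like, assuming a hypothetical DeviceKey type: equals() and hashCode() depend only on field values, so two distinct but equal instances always produce the same hash. The partitionFor() helper is a hypothetical, simplified modulo partitioner used only to show the effect; it is not Flink's actual key-group assignment.

```java
import java.util.Objects;

// Hypothetical key type: equality and hash code depend only on field values,
// so equal keys hash identically in every JVM.
class DeviceKey {
    private String region;
    private int deviceId;

    public DeviceKey() {}

    public DeviceKey(String region, int deviceId) {
        this.region = region;
        this.deviceId = deviceId;
    }

    public String getRegion() { return region; }
    public void setRegion(String region) { this.region = region; }
    public int getDeviceId() { return deviceId; }
    public void setDeviceId(int deviceId) { this.deviceId = deviceId; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DeviceKey)) return false;
        DeviceKey other = (DeviceKey) o;
        return deviceId == other.deviceId && Objects.equals(region, other.region);
    }

    @Override
    public int hashCode() {
        // String.hashCode() and Integer.hashCode() are specified by the JDK,
        // so this value is the same for equal keys in every process.
        return Objects.hash(region, deviceId);
    }
}

class KeyPartitioningDemo {
    // Hypothetical, simplified stand-in for hash partitioning; NOT Flink's
    // actual key-group assignment, which additionally mixes the hash code.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        DeviceKey a = new DeviceKey("eu", 7);
        DeviceKey b = new DeviceKey("eu", 7);  // distinct instance, same content
        System.out.println(a.equals(b));                              // true
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4)); // true
    }
}
```

With Object's identity hashCode() instead, a and b would usually land in different partitions, which is the breakage described above.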

This is somewhat alluded to in the keyBy() documentation <https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations>, but could be clarified.

It is in any case heavily discouraged to modify objects after they have been emitted from a function; the mutability of POJOs is hence usually not a problem.

On 02/10/2019 14:17, Stephen Connolly wrote:
I notice https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#rules-for-pojo-types says that all non-transient fields need a setter.

That means that the fields cannot be final.

That means that the hashCode() should probably just return a constant value (otherwise an object could be mutated and then lost from a hash-based collection).
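The concern can be reproduced with a small Java sketch, assuming a hypothetical Account POJO whose hashCode() depends on a mutable field: after the field changes, HashSet.contains() no longer finds the object, even though it is still stored in the set.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable POJO whose hashCode() depends on a mutable field.
class Account {
    private String owner;

    public Account() {}

    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Account && Objects.equals(owner, ((Account) o).owner);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(owner);
    }
}

class LostInHashSetDemo {
    public static void main(String[] args) {
        Account acc = new Account();
        acc.setOwner("alice");

        Set<Account> accounts = new HashSet<>();
        accounts.add(acc);

        // Changing a field that feeds hashCode() changes the object's hash,
        // but the set still has it filed under the old hash.
        acc.setOwner("bob");
        System.out.println(accounts.contains(acc)); // false
        System.out.println(accounts.size());        // 1: the entry is still there, just unreachable
    }
}
```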

Is it really the case that we have to either register a serializer or abandon immutability and consequently force hashCode to be a constant value?

What are the recommended implementation patterns for the POJOs used in a topology?

Thanks

-Stephen

