[jira] [Updated] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2024-03-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-12317:

Fix Version/s: 3.7.0

> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Florin Akermann
> Priority: Major
> Labels: kip
> Fix For: 3.7.0
>
>
> Currently, for a stream-stream and stream-table/globalTable join 
> KafkaStreams drops all stream records with a `null`-key (`null`-join-key 
> for stream-globalTable), because for a `null`-(join)key the join is 
> undefined: i.e., we don't have an attribute to do the table lookup (we 
> consider the stream record as malformed). Note that we define the semantics 
> of _left/outer_ join as: keep the stream record if no matching join record 
> was found.
> We could relax the definition of _left_ stream-table/globalTable and 
> _left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
> records, and call the ValueJoiner with a `null` "other-side" value instead: 
> if the stream record key (or join-key) is `null`, we could treat it as a 
> "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users who want to keep the current behavior can add 
> a `filter()` before the join to drop `null`-(join)key records from the stream 
> explicitly.
> Note that this change also requires changing the behavior if we insert a 
> repartition topic before the join: currently, we drop `null`-key records 
> before writing into the repartition topic (as we know they would be dropped 
> later anyway). We need to relax this behavior for a left stream-table and 
> left/outer stream-stream join. Users need to be aware (i.e., we might need to 
> put this into the docs and JavaDocs) that records with a `null`-key would be 
> partitioned randomly.
> KIP-962: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-962%3A+Relax+non-null+key+requirement+in+Kafka+Streams]
>  
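
For illustration only, the proposed semantics of a left stream-table join can be sketched in plain Java with no Kafka Streams dependency. Everything named here (`NullKeyLeftJoinSketch`, `StreamRecord`, `leftJoin`, `filterNullKeys`, the `dropNullKeys` flag) is hypothetical and made up for this sketch; none of it is Kafka Streams API, and the flag merely stands in for the current versus the relaxed behavior described in the ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Hypothetical, Kafka-free model of a left stream-table join (illustration only).
class NullKeyLeftJoinSketch {

    // Minimal stand-in for a stream record; the key may be null.
    static final class StreamRecord {
        final String key;
        final String value;

        StreamRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // dropNullKeys = true models the current behavior (null-key records are dropped);
    // dropNullKeys = false models the relaxed behavior (null key == failed lookup).
    static List<String> leftJoin(List<StreamRecord> stream,
                                 Map<String, String> table,
                                 BiFunction<String, String, String> joiner,
                                 boolean dropNullKeys) {
        List<String> out = new ArrayList<>();
        for (StreamRecord r : stream) {
            if (r.key == null) {
                if (!dropNullKeys) {
                    // Relaxed: call the joiner with a null "other-side" value.
                    out.add(joiner.apply(r.value, null));
                }
                continue;
            }
            // Left join: table.get returns null when no matching record exists.
            out.add(joiner.apply(r.value, table.get(r.key)));
        }
        return out;
    }

    // Models an explicit filter step placed before the join.
    static List<StreamRecord> filterNullKeys(List<StreamRecord> stream) {
        return stream.stream().filter(r -> r.key != null).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<StreamRecord> stream = Arrays.asList(
                new StreamRecord("a", "1"),
                new StreamRecord(null, "2"),
                new StreamRecord("z", "3"));
        Map<String, String> table = new HashMap<>();
        table.put("a", "A");
        BiFunction<String, String, String> joiner = (left, right) -> left + "+" + right;

        System.out.println("current:  " + leftJoin(stream, table, joiner, true));
        System.out.println("relaxed:  " + leftJoin(stream, table, joiner, false));
        System.out.println("filtered: " + leftJoin(filterNullKeys(stream), table, joiner, false));
    }
}
```

In this sketch, filtering on `key != null` before the relaxed join yields the same output as the current behavior, which mirrors the `filter()` workaround mentioned in the description.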



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2023-08-09 Thread Matthias J. Sax (Jira)



Matthias J. Sax updated KAFKA-12317:

Description: 
Currently, for a stream-stream and stream-table/globalTable join KafkaStreams 
drops all stream records with a `null`-key (`null`-join-key for 
stream-globalTable), because for a `null`-(join)key the join is undefined: 
i.e., we don't have an attribute to do the table lookup (we consider the 
stream record as malformed). Note that we define the semantics of _left/outer_ 
join as: keep the stream record if no matching join record was found.

We could relax the definition of _left_ stream-table/globalTable and 
_left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
records, and call the ValueJoiner with a `null` "other-side" value instead: if 
the stream record key (or join-key) is `null`, we could treat it as a "failed 
lookup" instead of treating the stream record as corrupted.

If we make this change, users who want to keep the current behavior can add a 
`filter()` before the join to drop `null`-(join)key records from the stream 
explicitly.

Note that this change also requires changing the behavior if we insert a 
repartition topic before the join: currently, we drop `null`-key records before 
writing into the repartition topic (as we know they would be dropped later 
anyway). We need to relax this behavior for a left stream-table and left/outer 
stream-stream join. Users need to be aware (i.e., we might need to put this into 
the docs and JavaDocs) that records with a `null`-key would be partitioned 
randomly.

KIP-962: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-962%3A+Relax+non-null+key+requirement+in+Kafka+Streams]
 

  was:
Currently, for a stream-stream and stream-table/globalTable join KafkaStreams 
drops all stream records with a `null`-key (`null`-join-key for 
stream-globalTable), because for a `null`-(join)key the join is undefined: 
i.e., we don't have an attribute to do the table lookup (we consider the 
stream record as malformed). Note that we define the semantics of _left/outer_ 
join as: keep the stream record if no matching join record was found.

We could relax the definition of _left_ stream-table/globalTable and 
_left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
records, and call the ValueJoiner with a `null` "other-side" value instead: if 
the stream record key (or join-key) is `null`, we could treat it as a "failed 
lookup" instead of treating the stream record as corrupted.

If we make this change, users who want to keep the current behavior can add a 
`filter()` before the join to drop `null`-(join)key records from the stream 
explicitly.

Note that this change also requires changing the behavior if we insert a 
repartition topic before the join: currently, we drop `null`-key records before 
writing into the repartition topic (as we know they would be dropped later 
anyway). We need to relax this behavior for a left stream-table and left/outer 
stream-stream join. Users need to be aware (i.e., we might need to put this into 
the docs and JavaDocs) that records with a `null`-key would be partitioned 
randomly.


> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Florin Akermann
> Priority: Major
> Labels: kip
>

[jira] [Updated] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2023-08-09 Thread Matthias J. Sax (Jira)



Matthias J. Sax updated KAFKA-12317:

Labels: kip  (was: )

> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Florin Akermann
> Priority: Major
> Labels: kip
>





[jira] [Updated] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2021-06-02 Thread Matthias J. Sax (Jira)



Matthias J. Sax updated KAFKA-12317:

Description: 
Currently, for a stream-stream and stream-table/globalTable join KafkaStreams 
drops all stream records with a `null`-key (`null`-join-key for 
stream-globalTable), because for a `null`-(join)key the join is undefined: 
i.e., we don't have an attribute to do the table lookup (we consider the 
stream record as malformed). Note that we define the semantics of _left/outer_ 
join as: keep the stream record if no matching join record was found.

We could relax the definition of _left_ stream-table/globalTable and 
_left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
records, and call the ValueJoiner with a `null` "other-side" value instead: if 
the stream record key (or join-key) is `null`, we could treat it as a "failed 
lookup" instead of treating the stream record as corrupted.

If we make this change, users who want to keep the current behavior can add a 
`filter()` before the join to drop `null`-(join)key records from the stream 
explicitly.

Note that this change also requires changing the behavior if we insert a 
repartition topic before the join: currently, we drop `null`-key records before 
writing into the repartition topic (as we know they would be dropped later 
anyway). We need to relax this behavior for a left stream-table and left/outer 
stream-stream join. Users need to be aware (i.e., we might need to put this into 
the docs and JavaDocs) that records with a `null`-key would be partitioned 
randomly.

  was:
Currently, for a stream-stream and stream-table/globalTable join KafkaStreams 
drops all stream records with a null-key, because for a null-key the join is 
undefined: i.e., we don't have an attribute to do the table lookup (we consider 
the stream record as malformed). Note that we define the semantics of _left_ 
join as: keep the stream record if no KTable record was found.

We could relax the definition of _left_ join though, and not drop null-key 
stream records, and call the ValueJoiner with a `null` table record instead: if 
the stream record key is `null`, we could treat it as a "failed table lookup" 
instead of treating the stream record as corrupted.

If we make this change, users who want to keep the current behavior can add a 
`filter()` before the join to drop `null`-key records from the stream 
explicitly.

Note that this change also requires changing the behavior if we insert a 
repartition topic before the join: currently, we drop `null`-key records before 
writing into the repartition topic (as we know they would be dropped later 
anyway). We need to relax this behavior for a left/outer stream-table (and 
maybe left/outer 


> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Major
>





[jira] [Updated] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2021-06-02 Thread Matthias J. Sax (Jira)



Matthias J. Sax updated KAFKA-12317:

Summary: Relax non-null key requirement for left/outer KStream joins  (was: 
Relax non-null key requirement for left KStream joins)

> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Major
>


