Hi Matthias,
Sorry for the long delay. Thanks for the comment. Good to know that the
specified issue is not worth breaking backwards compatibility.
I fixed the KIP.
Cheers,
Jeyhun
On Fri, Jun 30, 2017 at 1:38 AM Matthias J. Sax
wrote:
Hi Jeyhun,
thanks for starting the VOTE thread. I did make one more pass over the
KIP before casting my vote, and I saw that the KIP still contains a
backward-incompatible change introducing `ValueTransformerCommon`.
I think that, for this case, it is not worth breaking compatibility. We
should
I have no strong opinion, but it seems that at least InitializerWithKey
will be helpful if you want to have different start values for different
keys (even if I cannot come up with a use-case example of why one would
want to do this...). Otherwise, there is just the "completeness" argument,
which is not
I'm not insisting that we should NEVER break compatibility; in fact, if
we think it is worthwhile (i.e. small impact with large benefit), I think
we can break compatibility as long as we have not removed the compatibility
annotations from Streams. All I was saying is that if we
Hi,
I introduced ValueTransformerCommon just to combine the common methods of
ValueTransformer and ValueTransformerWithKey and avoid copy-paste.
I am aware of this issue, and I agree that it requires users to recompile
their code and therefore is not backwards compatible. When I saw this
issue, I thought
I agree with Guozhang's second point. This change does not seem backward
compatible.
As we don't have to support lambdas, it might be the easiest thing to
just extend the current interface:
> public interface ValueTransformerWithKey<K, V, VR> extends
> ValueTransformer<V, VR>
When plugging the
Jeyhun, Matthias:
Thanks for the explanation, I overlooked the repartition argument
previously.
1) Based on that argument I'm convinced of having ValueMapperWithKey /
ValueJoinerWithKey / ValueTransformerWithKey; though I'm still not
convinced by ReducerWithKey and InitializerWithKey, since for
I guess I misunderstood you. You are right that overloading the method
would also work. As both interfaces will be independent of each other,
there should be no problem.
The initial proposal was to use
> interface ValueMapperWithKey<K, V, VR> extends ValueMapper<V, VR>
and this would break Lambda. We
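To make the lambda argument concrete, here is a minimal sketch with simplified, hypothetical interfaces (not the actual Kafka Streams signatures): keeping the with-key interface independent preserves the single-abstract-method property that lambdas require, whereas extending the key-less interface would add a second abstract method and break it.

```java
// Simplified, hypothetical interfaces to illustrate the lambda argument
// (not the actual Kafka Streams signatures).

@FunctionalInterface
interface ValueMapper<V, VR> {
    VR apply(V value);
}

// Kept independent of ValueMapper: a single abstract method, so it stays
// lambda-friendly.
@FunctionalInterface
interface ValueMapperWithKey<K, V, VR> {
    VR apply(K key, V value);
}

// If it extended ValueMapper instead, it would inherit a second abstract
// method and could no longer be implemented as a lambda:
// interface Broken<K, V, VR> extends ValueMapper<V, VR> {
//     VR apply(K key, V value); // two abstract methods -> no lambda target
// }

public class LambdaSketch {
    public static void main(String[] args) {
        ValueMapper<String, Integer> length = v -> v.length();
        ValueMapperWithKey<String, String, String> tagged = (k, v) -> k + ":" + v;
        System.out.println(length.apply("abc"));        // 3
        System.out.println(tagged.apply("user", "x"));  // user:x
    }
}
```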
On Sun, Jun 4, 2017 at 8:41 PM, Matthias J. Sax
wrote:
We started with this proposal but it does not allow for Lambdas (in case
you want key access). Do you think preserving Lambdas is not important
enough for this case -- I agree that there is some price to be paid to
keep Lambdas.
About API consistency: I can't remember a concrete user request to
With KIP-159, the added "valueMapperWithKey" and "valueTransformerWithKey"
alone seem to be just adding a few syntax sugars? E.g. we can simply
implement:
mapValues(ValueMapperWithKey mapperWithKey)
as
map((k, v) -> new KeyValue<>(k, mapperWithKey.apply(k, v)))
--
I'm not sure how many users
Thanks for clarification Matthias, now everything is clear.
On Sun, May 28, 2017 at 6:21 PM Matthias J. Sax
wrote:
I don't think we can drop ValueTransformerSupplier. If you don't have a
supplier, you only get a single instance of your function. But for a
stateful transformation, we need multiple instances (one for each task)
of ValueTransformer.
We don't need suppliers for functions like "ValueMapper" etc
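The one-instance-per-task point can be sketched with simplified, hypothetical interfaces (not the actual Kafka Streams API): the supplier lets the runtime create a fresh, stateful transformer per task, so per-instance state such as a counter is not shared across tasks.

```java
// Simplified, hypothetical interfaces illustrating why stateful
// transformations need a supplier (not the actual Kafka Streams API).
interface ValueTransformer<V, VR> {
    VR transform(V value);
}

interface ValueTransformerSupplier<V, VR> {
    ValueTransformer<V, VR> get();
}

public class SupplierSketch {
    // A stateful transformer: each instance keeps its own counter.
    static ValueTransformerSupplier<String, String> countingSupplier() {
        return () -> new ValueTransformer<String, String>() {
            private long count = 0;
            @Override
            public String transform(String value) {
                return (++count) + ":" + value;
            }
        };
    }

    public static void main(String[] args) {
        ValueTransformerSupplier<String, String> supplier = countingSupplier();
        // One instance per "task": counters advance independently.
        ValueTransformer<String, String> task0 = supplier.get();
        ValueTransformer<String, String> task1 = supplier.get();
        System.out.println(task0.transform("a")); // 1:a
        System.out.println(task0.transform("b")); // 2:b
        System.out.println(task1.transform("c")); // 1:c  (fresh state)
    }
}
```

With a bare ValueTransformer instead of a supplier, all tasks would share the single counter, which is exactly the problem Matthias describes.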
Hi,
I updated the KIP.
Just to avoid misunderstanding: I meant deprecating ValueTransformerSupplier,
and I am ok with ValueTransformer.
So instead of using ValueTransformerSupplier, can't we directly use
ValueTransformer or ValueTransformerWithKey?
Btw, in the current design all features of
Thanks Jeyhun.
About ValueTransformer: I don't think we can remove it. Note that
ValueTransformer allows attaching state and also allows registering
punctuations. Neither of those features will be available via the
withKey() interfaces.
-Matthias
On 5/27/17 1:25 PM, Jeyhun Karimov wrote:
Hi Matthias,
Thanks for your comments.
I tested the deep-copy approach. It has significant overhead. Especially
for "light" and stateless operators, it slows down the topology
significantly (> 20%). I think "warning" users about not changing the key
is better than warning them about possible
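For context, one common way to implement the deep-copy guard under discussion is a serialization round trip per record; the helper below is an illustrative sketch (hypothetical name, not from the KIP or PR), and the per-record allocation plus (de)serialization work it does is exactly where the measured overhead comes from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch of guarding the key via a deep copy: each operator gets
// a serialization round-trip copy, so mutating it cannot corrupt the original.
public class DeepCopySketch {
    @SuppressWarnings("unchecked")
    static <K extends Serializable> K deepCopy(K key)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (K) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String key = new String("user-42");
        String copy = deepCopy(key);
        System.out.println(copy.equals(key)); // true: same content
        System.out.println(copy == key);      // false: distinct object
    }
}
```

Doing this for every record on a hot path explains why stateless operators, whose actual work is tiny, suffer the largest relative slowdown.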
One more question:
Should we add any of
- InitializerWithKey
- ReducerWithKey
- ValueTransformerWithKey
To get a consistent/complete API, it might be a good idea. Any thoughts?
-Matthias
On 5/24/17 3:47 PM, Matthias J. Sax wrote:
Jeyhun,
I was just wondering if you did look into the key-deep-copy idea we
discussed. I am curious to see what the impact might be.
-Matthias
On 5/20/17 2:03 AM, Jeyhun Karimov wrote:
Hi,
Thanks for your comments. I rethought including rich functions in
this KIP.
I think once we include rich functions in this KIP and then fix
ProcessorContext in another KIP and incorporate it with the existing rich
functions, the code will not be backwards compatible.
I see Damian's and your
With the current KIP, using the ValueMapper and ValueMapperWithKey
interfaces, RichFunction seems to be an independent add-on. To fix the
original issue of allowing key access, RichFunctions are not required IMHO.
I initially put the RichFunction idea on the table because I was hoping
to get a uniform
Hi Damian,
Thanks for your comments. I think providing users an *interface* rather
than an *abstract class* should be preferred (Matthias also raised this
issue); anyway, I changed the corresponding parts of the KIP.
Regarding passing additional contextual information, I think it is a
tradeoff,
1)
Hi,
I see you've removed the `ProcessorContext` from the RichFunction which is
good, but why is it a `RichFunction`? I'd have expected it to pass some
additional contextual information, i.e., the `RecordContext` that contains
just the topic, partition, timestamp, offset. I'm ok with it not
Hi,
Thanks. I initiated the PR as well, to get a better overview.
The only reason I used an abstract class instead of an interface for rich
functions is that in the future, if we have some AbstractRichFunction
abstract classes,
we can easily extend:
public abstract class RichValueMapper
Thanks for the update and explanations. The KIP is quite improved now --
great job!
One more question: Why are RichValueMapper etc abstract classes and not
interfaces?
Overall, I like the KIP a lot!
-Matthias
On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
Hi,
Thanks for your comments.
> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
> don't go together, as Lambdas are only supported for interfaces AFAIK.
Maybe I misunderstood your comment.
*withKey* and *withOnlyValue* are interfaces, so they don't have a direct
relation
Jeyhun,
thanks for the update.
I think supporting Lambdas for `withKey` and `AbstractRichFunction`
don't go together, as Lambdas are only supported for interfaces AFAIK.
Thus, if we want to support Lambdas for `withKey`, we need to have an
interface approach like this
- RichFunction -> only
Hi,
Sorry for the super late response. Thanks for your comments.
> I am not an expert on Lambdas. Can you elaborate a little bit? I cannot
> follow the explanation in the KIP to see what the problem is.
From [1]: "A functional interface is an interface that has just one
abstract method, and
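The single-abstract-method rule is the crux of the whole lambda discussion, and one detail matters for the rich-function design: default methods do not count toward it, so an interface can carry lifecycle helpers and still accept a lambda. A minimal sketch (hypothetical interface name, not from the KIP):

```java
// Illustrative: the "single abstract method" rule. Default methods don't
// count, so an interface may carry helpers and still be a lambda target.
@FunctionalInterface
interface RichLikeMapper<V, VR> {
    VR apply(V value);          // the one abstract method
    default void init() { }     // defaults don't break lambda support
    default void close() { }
}

public class FunctionalInterfaceSketch {
    public static void main(String[] args) {
        RichLikeMapper<String, Integer> m = v -> v.length(); // still a lambda
        m.init();
        System.out.println(m.apply("kafka")); // 5
        m.close();
    }
}
```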
I'd rather not see the use of `ProcessorContext` spread any further than
it currently is. So maybe we need another KIP that is done before this one?
Otherwise I think the scope of this KIP is becoming too large.
On Mon, 15 May 2017 at 18:06 Matthias J. Sax wrote:
> I agree
I agree that the `ProcessorContext` interface is too broad in general --
this is even true for transform/process, and it's also reflected in
the API improvement list we want to do.
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
So I am wondering, if you question the
Thanks for the KIP.
I'm not convinced by the `RichFunction` approach. Do we really want to
give every DSL method access to the `ProcessorContext`? It has a bunch of
methods on it that seem inappropriate for some of the DSL methods, i.e.,
`register`, `getStateStore`, `forward`, `schedule`, etc. It
Jeyhun,
I am not an expert on Lambdas. Can you elaborate a little bit? I cannot
follow the explanation in the KIP to see what the problem is.
For updating the KIP title I don't know -- I guess it's up to you. Maybe a
committer can comment on this?
Further comments:
- The KIP gets a little hard
Hi,
Thanks for your comments. I think we cannot extend the two interfaces if we
want to keep lambdas. I updated the KIP [1]. Maybe I should change the
title, because now we are not limiting the KIP to only ValueMapper,
ValueTransformer and ValueJoiner.
Please feel free to comment.
[1]
If `ValueMapperWithKey` extends `ValueMapper`, we don't need the new
overload.
And yes, we need to do one check -- but this happens when building the
topology. At runtime (I mean after KafkaStreams#start()) we don't need any
check, as we will always use `ValueMapperWithKey`.
-Matthias
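The build-time wrapping Matthias describes can be sketched with simplified, hypothetical interfaces and a hypothetical `withKey` helper (not the actual Kafka Streams internals): the DSL adapts a key-less mapper into the with-key shape once, when the topology is built, so the processor only ever sees one type and no per-record instanceof check is needed.

```java
// Simplified, hypothetical interfaces (not the actual Kafka Streams API).
interface ValueMapper<V, VR> {
    VR apply(V value);
}

interface ValueMapperWithKey<K, V, VR> {
    VR apply(K key, V value);
}

public class WrapSketch {
    // Adapt once at topology-build time; the runtime path then uses only
    // ValueMapperWithKey, with no per-record type checks.
    static <K, V, VR> ValueMapperWithKey<K, V, VR> withKey(ValueMapper<V, VR> mapper) {
        return (key, value) -> mapper.apply(value);
    }

    public static void main(String[] args) {
        ValueMapperWithKey<String, String, Integer> wrapped = withKey(v -> v.length());
        System.out.println(wrapped.apply("ignored-key", "value")); // 5
    }
}
```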
On 5/9/17
One correction:
> and at runtime (inside the processor) we still have to check whether it is
> ValueMapper or ValueMapperWithKey before wrapping it into the rich function.
This will happen at compile time, at the API level.
Cheers,
Jeyhun
On Tue, May 9, 2017 at 11:55 AM Jeyhun Karimov
Hi,
Thanks for feedback.
Then we need to overload the method
KStream<K, VR> mapValues(ValueMapper<V, VR> mapper);
with
KStream<K, VR> mapValues(ValueMapperWithKey<K, V, VR> mapper);
and at runtime (inside the processor) we still have to check whether it is
a ValueMapper or a ValueMapperWithKey before wrapping it into the rich
+1 :)
On 08/05/17 23:52, Matthias J. Sax wrote:
Hi,
I was reading the updated KIP, and I am wondering if we should do the
design a little differently.
Instead of distinguishing between a RichFunction and a non-RichFunction at
the runtime level, we would use RichFunctions all the time. Thus, at the
DSL entry level, if a user provides a
Michal,
thanks a lot for this comment. I did not consider lambdas when proposing
RichFunctions. I think it is very important to preserve the ability to use
lambdas!
@Jeyhun: I did not put any thought into this, but can we have a design
that allows for both? Also, with regard to lambdas, it might
Hi Jeyhun,
Thanks for your quick reply.
Indeed, I understand the existing ValueMapper/Joiner etc. have to remain
as-is for backwards compatibility.
I was just expressing my surprise that their proposed richer equivalents
weren't functional interfaces too.
Thanks,
Michał
On 07/05/17
Hi Michal,
Thanks for your comments. We proposed a solution similar to yours in the
KIP (please look at rejected alternatives). However, after the discussion
on the mailing list, I extended it to rich functions. Maybe we should keep
them both: simple and rich versions.
Cheers,
Jeyhun
On Sun, May 7,
Do I understand correctly that with the proposed pattern one could
not pass a lambda expression and access the context from within it?
TBH, I was hoping that for simple things this would be possible:
myStream.map( (key, value, ctx) -> new KeyValue<>(ctx.partition(), value) )
or (more to
Hi,
Thanks for the comments. I extended the PR and KIP to include rich
functions. I will still have to evaluate the cost of deep copying keys.
Cheers,
Jeyhun
On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak
wrote:
Hey Matthias,
My opinion would be that documenting the immutability of the key is the
best approach available. Better than requiring the key to be serializable
(as with Jeyhun's last pass at the PR), no performance risk.
It'd be different if Java had immutable type constraints of some kind. :-)
Agreed about RichFunction. If we follow this path, it should cover
all(?) DSL interfaces.
About guarding the key -- I am still not sure what to do about it...
Maybe it might be enough to document it (and name the key parameter
something like `readOnlyKey` to make it very clear). Ultimately, I would
prefer
Hi Matthias,
I think extending the KIP to include RichFunctions totally makes sense.
So, we don't want to guard the keys because it is costly.
If we introduce RichFunctions, I think they should not be limited to only
the 3 interfaces proposed in the KIP but extended to a wide range of
interfaces.
Please correct me if I
One follow up. There was this email on the user list:
http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+key+
It might make sense to include the Initializer, Adder, and Subtractor
interfaces, too.
And we should double check if there
Thanks for updating the KIP.
Deep copying the key will work for sure, but I am actually a little bit
worried about the performance impact... We might want to do some tests to
quantify this impact.
Btw: this reminds me of the idea of a `RichFunction` interface that
would allow users to access record
Hi Mathieu,
Thanks for the feedback. I followed a similar approach and updated the PR
and KIP accordingly. I tried to guard the key in the Processors by sending
a copy of the actual key.
Because I am doing a deep copy of an object, I think memory can be a
bottleneck in some use cases.
Cheers,
Jeyhun
On Tue, May 2,
Hi Jeyhun,
This approach would change ValueMapper (...etc) to be classes, rather than
interfaces, which is also a backwards incompatible change. An alternative
approach that would be backwards compatible would be to define new
interfaces, and provide overrides where those interfaces are used.
Thanks for comments.
The concerns make sense. Although we can guard for immutable keys in the
current implementation (with a few changes), I didn't consider backward
compatibility.
In this case, 2 solutions come to my mind. In both cases, the user
accesses the key as an Object, as passing an extra type
Jeyhun,
thanks a lot for the KIP!
I think there are two issues we need to address:
(1) The KIP does not consider backward compatibility. Users did complain
about this in past releases already, and as the user base grows, we
should not break backward compatibility in future releases anymore.
Hi Jeyhun,
I just want to add my voice that I, too, have wished for access to the
record key during a mapValues or similar operation.
On the other hand, the number of compile failures that would need to be
fixed from this change is unfortunate. :-) But at least it would all be a
pretty clear