Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-25 Thread Damian Guy
On Thu, 24 Aug 2017 at 18:31 Xavier Léauté  wrote:

> A few comments on the KIP:
> - I'm a bit confused about the BytesStoreSupplier interface. Nothing in its
> definition is really specific to Bytes, and
> when I see return types like BytesStoreSupplier byte[]>>, it seems redundant to have "Bytes" in the supplier name.
> Why can't we reuse the existing StateStoreSupplier interface and move the
> extra logConfig and loggingEnabled methods elsewhere?
>

We can't re-use StateStoreSupplier as it would break compatibility. So we
needed another name.


> - I don't really see any mention of the motivation behind the Materialized
> interface and what the implications are for the user, i.e. what does it
> mean for a store to be materialized.
>

It means that there will be a store either created for you with the store
name provided or with the BytesStoreSupplier. It provides a convenient way
for users to enable/disable caching and logging on a per store basis. And
helps to reduce the current overloads spread throughout the code.


> - Until now, serialization implementation details were decoupled from the
> state store interfaces. With this KIP we are now bubbling up the
> assumptions that state store going to be using Bytes or byte[] into the
> API.

I'm not a fan of this, because it precludes us from providing more
> efficient implementations, e.g. using ByteBuffers, that can avoid costly
> byte array copying and extraneous byte array allocations during serde
> operations.
> A better approach might be to provide a first class ByteStore interface
> that could help abstract the different types of buffers we might want to
> use, or alternatively use a buffer agnostic type in the state store
> definition (similar to what LMDB
> <
> https://github.com/lmdbjava/lmdbjava/blob/master/src/main/java/org/lmdbjava/BufferProxy.java
> >
>  does)
>
>
We decided to do it this way as we want to provide developers with the
ability to use our wrapper stores, i.e, ChangeLogging, Caching, Metered.
Presently they can't do this without jumping through various hoops.
To provide this ability we presently need to use  as the
CachingStores are . They need to remain that way for the
time being as it is how we can put some limits on memory usage.


> On Thu, Aug 24, 2017 at 1:53 AM Damian Guy  wrote:
>
> > I've updated the kip to reflect Bill's comment and also to make
> > StreamBuilder methods have topic as the first param, i.e.,
> > StreamBuilder#stream no longer accepts varargs.
> >
> > On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
> >
> > > On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:
> > >
> > >> I have a couple of comments but otherwise it LGTM:
> > >>
> > >> 1. For these two functions in StreamsBuilder, the topic String is set
> as
> > >> the second parameter in between of two options. Would that be better
> to
> > be
> > >> set as the first or the last one instead?
> > >>
> > >> It would be better as the first, but then it is different to the
> > > #streams() methods due to varargs.
> > >
> > >
> > >> public synchronized  KTable table(final Consumed
> > >> consumed, final String topic, final Materialized materialized)
> > >>
> > >> public synchronized  GlobalKTable globalTable(final
> > >> Consumed > >> V> consumed, final String topic, final Materialized
> materialized)
> > >>
> > >> I understand that we cannot do it for the first parameter because of
> the
> > >> vararg type. So I'd suggest either
> > >>
> > >> a) set it as the last parameter, but then it is inconsistent with
> other
> > >> functions like these:
> > >>
> > >> void to(final String topic, final Produced options);
> > >>
> > >> KTable through(final String topic, final Materialized
> > >> options);
> > >>
> > >> b) only allow one single topic name parameter in
> StreamsBuilder.stream()
> > >> since in practice we do not see too many usages of multiple topics,
> plus
> > >> it
> > >> can be semi-supported with "merge" as we move it from StreamsBuilder
> to
> > >> KStream (KAFKA-5765),
> > >>
> > >> Perhaps this is the better approach
> > >
> > >
> > >> 2. KGroupedStream's function:
> > >>
> > >>  KTable aggregate(final Initializer initializer,
> > >>  final Aggregator VR>
> > >> aggregator,
> > >>  final Serde aggValueSerde,
> > >>  final Materialized KeyValueStore > >> VR>> materialized);
> > >>
> > >> The "aggValueSerde" seems not needed?
> > >>
> > >> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think
> KGroupedStream
> > >> was
> > >> a bad name as a hind-sight. I personally feel we should just correct
> it
> > >> with a new class and deprecate / remove the old one before 1.0.0, but
> > that
> > >> could be in its own KIP.
> > >>
> > >>
> > > The 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-25 Thread Damian Guy
Matthias, i agree so i've added those two overloads.
Thanks,
Damian

On Thu, 24 Aug 2017 at 21:54 Matthias J. Sax  wrote:

> Thanks for clarification. I see your point. Java varargs are problematic
> in general IMHO as they force you to put them as last argument making
> parameter ordering unnatural for some cases (as we have it currently in
> the API).
>
> Nevertheless, I think that reading a single topic is the most common
> case and thus I would love to see the overloads as mentioned in my last
> email in addition to the overloads taking a Collection of topics. Maybe
> it's just personal taste -- I agree that the overhead of specifying a
> singleton on not severe, but to me it still feels like a "step backward"
> as reading a single topic should be the pattern for like 90% or more of
> the cases.
>
>
> -Matthias
>
>
> On 8/24/17 12:03 PM, Guozhang Wang wrote:
> > Matthias,
> >
> > I think it's my bad that I did not post another comment on the mailing
> list
> > while syncing with Damian. Here it is:
> >
> > Regarding 1) above, a second thought on varargs: though I have not heard
> > from anyone using multiple topics, it is also true that people will just
> > keep silent until their APIs gets removed. So instead of keeping a single
> > topic name in the constructor, it'd better to still allow users to pass
> > multiple topics, as a Collection topic.
> >
> > It does mean that users who would only want a single topic would feel
> > inconvenient with "Collections.singleton(topic)", but I felt it is not
> too
> > big of an issue. On the other hand KafkaConsumer also only allow
> > `subscribe(Collection topics)` so I'd suggest in this KIP we do
> not
> > have two overloads of "stream(topic)" and "stream(topics)" and consider
> > adding that as a syntax-sugar if it does become a big complaint.
> >
> >
> > Guozhang
> >
> >
> >
> > On Thu, Aug 24, 2017 at 11:32 AM, Matthias J. Sax  >
> > wrote:
> >
> >> We now have
> >>
> >>> public synchronized  KStream stream(final
> Collection
> >> topic, final Consumed options)
> >>
> >> This would prevent so write code like
> >>
> >> builder.stream("topic", Consumers.with(...));
> >>
> >> I think, we need methods
> >>
> >> StreamsBuilder#stream(String topic);
> >> StreamsBuilder#stream(String topic, Consumed options);
> >>
> >> Or do I miss anything?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 8/24/17 1:53 AM, Damian Guy wrote:
> >>> I've updated the kip to reflect Bill's comment and also to make
> >>> StreamBuilder methods have topic as the first param, i.e.,
> >>> StreamBuilder#stream no longer accepts varargs.
> >>>
> >>> On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
> >>>
>  On Thu, 24 Aug 2017 at 02:49 Guozhang Wang 
> wrote:
> 
> > I have a couple of comments but otherwise it LGTM:
> >
> > 1. For these two functions in StreamsBuilder, the topic String is set
> >> as
> > the second parameter in between of two options. Would that be better
> >> to be
> > set as the first or the last one instead?
> >
> > It would be better as the first, but then it is different to the
>  #streams() methods due to varargs.
> 
> 
> > public synchronized  KTable table(final Consumed
> > consumed, final String topic, final Materialized materialized)
> >
> > public synchronized  GlobalKTable globalTable(final
> > Consumed > V> consumed, final String topic, final Materialized
> materialized)
> >
> > I understand that we cannot do it for the first parameter because of
> >> the
> > vararg type. So I'd suggest either
> >
> > a) set it as the last parameter, but then it is inconsistent with
> other
> > functions like these:
> >
> > void to(final String topic, final Produced options);
> >
> > KTable through(final String topic, final Materialized
> > options);
> >
> > b) only allow one single topic name parameter in
> >> StreamsBuilder.stream()
> > since in practice we do not see too many usages of multiple topics,
> >> plus
> > it
> > can be semi-supported with "merge" as we move it from StreamsBuilder
> to
> > KStream (KAFKA-5765),
> >
> > Perhaps this is the better approach
> 
> 
> > 2. KGroupedStream's function:
> >
> >  KTable aggregate(final Initializer initializer,
> >  final Aggregator VR>
> > aggregator,
> >  final Serde aggValueSerde,
> >  final Materialized KeyValueStore > VR>> materialized);
> >
> > The "aggValueSerde" seems not needed?
> >
> > 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think
> KGroupedStream
> > was
> > a bad name as a hind-sight. I personally feel we should just correct
> it
> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Matthias J. Sax
Thanks for clarification. I see your point. Java varargs are problematic
in general IMHO as they force you to put them as last argument making
parameter ordering unnatural for some cases (as we have it currently in
the API).

Nevertheless, I think that reading a single topic is the most common
case and thus I would love to see the overloads as mentioned in my last
email in addition to the overloads taking a Collection of topics. Maybe
it's just personal taste -- I agree that the overhead of specifying a
singleton on not severe, but to me it still feels like a "step backward"
as reading a single topic should be the pattern for like 90% or more of
the cases.


-Matthias


On 8/24/17 12:03 PM, Guozhang Wang wrote:
> Matthias,
> 
> I think it's my bad that I did not post another comment on the mailing list
> while syncing with Damian. Here it is:
> 
> Regarding 1) above, a second thought on varargs: though I have not heard
> from anyone using multiple topics, it is also true that people will just
> keep silent until their APIs gets removed. So instead of keeping a single
> topic name in the constructor, it'd better to still allow users to pass
> multiple topics, as a Collection topic.
> 
> It does mean that users who would only want a single topic would feel
> inconvenient with "Collections.singleton(topic)", but I felt it is not too
> big of an issue. On the other hand KafkaConsumer also only allow
> `subscribe(Collection topics)` so I'd suggest in this KIP we do not
> have two overloads of "stream(topic)" and "stream(topics)" and consider
> adding that as a syntax-sugar if it does become a big complaint.
> 
> 
> Guozhang
> 
> 
> 
> On Thu, Aug 24, 2017 at 11:32 AM, Matthias J. Sax 
> wrote:
> 
>> We now have
>>
>>> public synchronized  KStream stream(final Collection
>> topic, final Consumed options)
>>
>> This would prevent so write code like
>>
>> builder.stream("topic", Consumers.with(...));
>>
>> I think, we need methods
>>
>> StreamsBuilder#stream(String topic);
>> StreamsBuilder#stream(String topic, Consumed options);
>>
>> Or do I miss anything?
>>
>>
>> -Matthias
>>
>>
>> On 8/24/17 1:53 AM, Damian Guy wrote:
>>> I've updated the kip to reflect Bill's comment and also to make
>>> StreamBuilder methods have topic as the first param, i.e.,
>>> StreamBuilder#stream no longer accepts varargs.
>>>
>>> On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
>>>
 On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:

> I have a couple of comments but otherwise it LGTM:
>
> 1. For these two functions in StreamsBuilder, the topic String is set
>> as
> the second parameter in between of two options. Would that be better
>> to be
> set as the first or the last one instead?
>
> It would be better as the first, but then it is different to the
 #streams() methods due to varargs.


> public synchronized  KTable table(final Consumed
> consumed, final String topic, final Materialized materialized)
>
> public synchronized  GlobalKTable globalTable(final
> Consumed V> consumed, final String topic, final Materialized materialized)
>
> I understand that we cannot do it for the first parameter because of
>> the
> vararg type. So I'd suggest either
>
> a) set it as the last parameter, but then it is inconsistent with other
> functions like these:
>
> void to(final String topic, final Produced options);
>
> KTable through(final String topic, final Materialized
> options);
>
> b) only allow one single topic name parameter in
>> StreamsBuilder.stream()
> since in practice we do not see too many usages of multiple topics,
>> plus
> it
> can be semi-supported with "merge" as we move it from StreamsBuilder to
> KStream (KAFKA-5765),
>
> Perhaps this is the better approach


> 2. KGroupedStream's function:
>
>  KTable aggregate(final Initializer initializer,
>  final Aggregator
> aggregator,
>  final Serde aggValueSerde,
>  final Materialized VR>> materialized);
>
> The "aggValueSerde" seems not needed?
>
> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream
> was
> a bad name as a hind-sight. I personally feel we should just correct it
> with a new class and deprecate / remove the old one before 1.0.0, but
>> that
> could be in its own KIP.
>
>
 The problem with this is that we'd need to add new `groupBy` and
 `groupByKey` methods that return `GroupedKStream`, we can't change the
 existing ones as that would break compatibility. So what would we name
 these methods?


>
> Guozhang
>
>
>
> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Guozhang Wang
Matthias,

I think it's my bad that I did not post another comment on the mailing list
while syncing with Damian. Here it is:

Regarding 1) above, a second thought on varargs: though I have not heard
from anyone using multiple topics, it is also true that people will just
keep silent until their APIs gets removed. So instead of keeping a single
topic name in the constructor, it'd better to still allow users to pass
multiple topics, as a Collection topic.

It does mean that users who would only want a single topic would feel
inconvenient with "Collections.singleton(topic)", but I felt it is not too
big of an issue. On the other hand KafkaConsumer also only allow
`subscribe(Collection topics)` so I'd suggest in this KIP we do not
have two overloads of "stream(topic)" and "stream(topics)" and consider
adding that as a syntax-sugar if it does become a big complaint.


Guozhang



On Thu, Aug 24, 2017 at 11:32 AM, Matthias J. Sax 
wrote:

> We now have
>
> > public synchronized  KStream stream(final Collection
> topic, final Consumed options)
>
> This would prevent so write code like
>
> builder.stream("topic", Consumers.with(...));
>
> I think, we need methods
>
> StreamsBuilder#stream(String topic);
> StreamsBuilder#stream(String topic, Consumed options);
>
> Or do I miss anything?
>
>
> -Matthias
>
>
> On 8/24/17 1:53 AM, Damian Guy wrote:
> > I've updated the kip to reflect Bill's comment and also to make
> > StreamBuilder methods have topic as the first param, i.e.,
> > StreamBuilder#stream no longer accepts varargs.
> >
> > On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
> >
> >> On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:
> >>
> >>> I have a couple of comments but otherwise it LGTM:
> >>>
> >>> 1. For these two functions in StreamsBuilder, the topic String is set
> as
> >>> the second parameter in between of two options. Would that be better
> to be
> >>> set as the first or the last one instead?
> >>>
> >>> It would be better as the first, but then it is different to the
> >> #streams() methods due to varargs.
> >>
> >>
> >>> public synchronized  KTable table(final Consumed
> >>> consumed, final String topic, final Materialized materialized)
> >>>
> >>> public synchronized  GlobalKTable globalTable(final
> >>> Consumed >>> V> consumed, final String topic, final Materialized materialized)
> >>>
> >>> I understand that we cannot do it for the first parameter because of
> the
> >>> vararg type. So I'd suggest either
> >>>
> >>> a) set it as the last parameter, but then it is inconsistent with other
> >>> functions like these:
> >>>
> >>> void to(final String topic, final Produced options);
> >>>
> >>> KTable through(final String topic, final Materialized
> >>> options);
> >>>
> >>> b) only allow one single topic name parameter in
> StreamsBuilder.stream()
> >>> since in practice we do not see too many usages of multiple topics,
> plus
> >>> it
> >>> can be semi-supported with "merge" as we move it from StreamsBuilder to
> >>> KStream (KAFKA-5765),
> >>>
> >>> Perhaps this is the better approach
> >>
> >>
> >>> 2. KGroupedStream's function:
> >>>
> >>>  KTable aggregate(final Initializer initializer,
> >>>  final Aggregator
> >>> aggregator,
> >>>  final Serde aggValueSerde,
> >>>  final Materialized >>> VR>> materialized);
> >>>
> >>> The "aggValueSerde" seems not needed?
> >>>
> >>> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream
> >>> was
> >>> a bad name as a hind-sight. I personally feel we should just correct it
> >>> with a new class and deprecate / remove the old one before 1.0.0, but
> that
> >>> could be in its own KIP.
> >>>
> >>>
> >> The problem with this is that we'd need to add new `groupBy` and
> >> `groupByKey` methods that return `GroupedKStream`, we can't change the
> >> existing ones as that would break compatibility. So what would we name
> >> these methods?
> >>
> >>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>>
> >>> On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy 
> wrote:
> >>>
>  We already have GlobalKTable and i can't rename KGroupedStream, which
>  really should be GroupedKStream. So I think we should name new things
>  correctly, i.e., WindowedKStream etc and fix the others when we can.
> 
>  On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
>  wrote:
> 
> > About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> > convention consistent? And if we change the naming schema just change
> > all at once? I personally don't care which naming scheme is better,
> >>> but
> > I think consistency is super important!
> >
> > About Bill's comment: I agree, and had a similar thought.
> >
> >
> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Matthias J. Sax
We now have

> public synchronized  KStream stream(final Collection 
> topic, final Consumed options)

This would prevent so write code like

builder.stream("topic", Consumers.with(...));

I think, we need methods

StreamsBuilder#stream(String topic);
StreamsBuilder#stream(String topic, Consumed options);

Or do I miss anything?


-Matthias


On 8/24/17 1:53 AM, Damian Guy wrote:
> I've updated the kip to reflect Bill's comment and also to make
> StreamBuilder methods have topic as the first param, i.e.,
> StreamBuilder#stream no longer accepts varargs.
> 
> On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
> 
>> On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:
>>
>>> I have a couple of comments but otherwise it LGTM:
>>>
>>> 1. For these two functions in StreamsBuilder, the topic String is set as
>>> the second parameter in between of two options. Would that be better to be
>>> set as the first or the last one instead?
>>>
>>> It would be better as the first, but then it is different to the
>> #streams() methods due to varargs.
>>
>>
>>> public synchronized  KTable table(final Consumed
>>> consumed, final String topic, final Materialized materialized)
>>>
>>> public synchronized  GlobalKTable globalTable(final
>>> Consumed>> V> consumed, final String topic, final Materialized materialized)
>>>
>>> I understand that we cannot do it for the first parameter because of the
>>> vararg type. So I'd suggest either
>>>
>>> a) set it as the last parameter, but then it is inconsistent with other
>>> functions like these:
>>>
>>> void to(final String topic, final Produced options);
>>>
>>> KTable through(final String topic, final Materialized
>>> options);
>>>
>>> b) only allow one single topic name parameter in StreamsBuilder.stream()
>>> since in practice we do not see too many usages of multiple topics, plus
>>> it
>>> can be semi-supported with "merge" as we move it from StreamsBuilder to
>>> KStream (KAFKA-5765),
>>>
>>> Perhaps this is the better approach
>>
>>
>>> 2. KGroupedStream's function:
>>>
>>>  KTable aggregate(final Initializer initializer,
>>>  final Aggregator
>>> aggregator,
>>>  final Serde aggValueSerde,
>>>  final Materialized>> VR>> materialized);
>>>
>>> The "aggValueSerde" seems not needed?
>>>
>>> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream
>>> was
>>> a bad name as a hind-sight. I personally feel we should just correct it
>>> with a new class and deprecate / remove the old one before 1.0.0, but that
>>> could be in its own KIP.
>>>
>>>
>> The problem with this is that we'd need to add new `groupBy` and
>> `groupByKey` methods that return `GroupedKStream`, we can't change the
>> existing ones as that would break compatibility. So what would we name
>> these methods?
>>
>>
>>>
>>> Guozhang
>>>
>>>
>>>
>>> On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy  wrote:
>>>
 We already have GlobalKTable and i can't rename KGroupedStream, which
 really should be GroupedKStream. So I think we should name new things
 correctly, i.e., WindowedKStream etc and fix the others when we can.

 On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
 wrote:

> About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> convention consistent? And if we change the naming schema just change
> all at once? I personally don't care which naming scheme is better,
>>> but
> I think consistency is super important!
>
> About Bill's comment: I agree, and had a similar thought.
>
>
> -Matthias
>
> On 8/23/17 12:24 PM, Bill Bejeck wrote:
>> Thanks for all the work on this KIP Damian.
>>
>> Both `Produced` and `Joined` have a `with` method accepting all
> parameters,
>> but `Consumed` doesn't. Should we add one for consistency?
>>
>> Thanks,
>> Bill
>>
>> On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy 
> wrote:
>>
>>> KIP has been updated. thanks
>>>
>>> On Wed, 23 Aug 2017 at 09:10 Damian Guy 
>>> wrote:
>>>
 Hi Matthias,


> KStream:
> leftJoin and outerJoin for KStream/KTable join should not have
> `JoinWindows` parameter
>
> Thanks!


>
> Nit: TopologyBuilder -> Topology
>
> Ack


> Nit: new class Serialized list static method #with twice
>
> Ack


> WindowedKStream -> for consistency we should either have
> GroupedKStream
> or KWindowedStream... (similar argument for
>>> SessionWindowedKStream)
>
> We can't rename KGroupedStream -> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Xavier Léauté
A few comments on the KIP:
- I'm a bit confused about the BytesStoreSupplier interface. Nothing in its
definition is really specific to Bytes, and
when I see return types like BytesStoreSupplier>, it seems redundant to have "Bytes" in the supplier name.
Why can't we reuse the existing StateStoreSupplier interface and move the
extra logConfig and loggingEnabled methods elsewhere?
- I don't really see any mention of the motivation behind the Materialized
interface and what the implications are for the user, i.e. what does it
mean for a store to be materialized.
- Until now, serialization implementation details were decoupled from the
state store interfaces. With this KIP we are now bubbling up the
assumptions that state store going to be using Bytes or byte[] into the
API. I'm not a fan of this, because it precludes us from providing more
efficient implementations, e.g. using ByteBuffers, that can avoid costly
byte array copying and extraneous byte array allocations during serde
operations.
A better approach might be to provide a first class ByteStore interface
that could help abstract the different types of buffers we might want to
use, or alternatively use a buffer agnostic type in the state store
definition (similar to what LMDB

 does)

On Thu, Aug 24, 2017 at 1:53 AM Damian Guy  wrote:

> I've updated the kip to reflect Bill's comment and also to make
> StreamBuilder methods have topic as the first param, i.e.,
> StreamBuilder#stream no longer accepts varargs.
>
> On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:
>
> > On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:
> >
> >> I have a couple of comments but otherwise it LGTM:
> >>
> >> 1. For these two functions in StreamsBuilder, the topic String is set as
> >> the second parameter in between of two options. Would that be better to
> be
> >> set as the first or the last one instead?
> >>
> >> It would be better as the first, but then it is different to the
> > #streams() methods due to varargs.
> >
> >
> >> public synchronized  KTable table(final Consumed
> >> consumed, final String topic, final Materialized materialized)
> >>
> >> public synchronized  GlobalKTable globalTable(final
> >> Consumed >> V> consumed, final String topic, final Materialized materialized)
> >>
> >> I understand that we cannot do it for the first parameter because of the
> >> vararg type. So I'd suggest either
> >>
> >> a) set it as the last parameter, but then it is inconsistent with other
> >> functions like these:
> >>
> >> void to(final String topic, final Produced options);
> >>
> >> KTable through(final String topic, final Materialized
> >> options);
> >>
> >> b) only allow one single topic name parameter in StreamsBuilder.stream()
> >> since in practice we do not see too many usages of multiple topics, plus
> >> it
> >> can be semi-supported with "merge" as we move it from StreamsBuilder to
> >> KStream (KAFKA-5765),
> >>
> >> Perhaps this is the better approach
> >
> >
> >> 2. KGroupedStream's function:
> >>
> >>  KTable aggregate(final Initializer initializer,
> >>  final Aggregator
> >> aggregator,
> >>  final Serde aggValueSerde,
> >>  final Materialized >> VR>> materialized);
> >>
> >> The "aggValueSerde" seems not needed?
> >>
> >> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream
> >> was
> >> a bad name as a hind-sight. I personally feel we should just correct it
> >> with a new class and deprecate / remove the old one before 1.0.0, but
> that
> >> could be in its own KIP.
> >>
> >>
> > The problem with this is that we'd need to add new `groupBy` and
> > `groupByKey` methods that return `GroupedKStream`, we can't change the
> > existing ones as that would break compatibility. So what would we name
> > these methods?
> >
> >
> >>
> >> Guozhang
> >>
> >>
> >>
> >> On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy 
> wrote:
> >>
> >> > We already have GlobalKTable and i can't rename KGroupedStream, which
> >> > really should be GroupedKStream. So I think we should name new things
> >> > correctly, i.e., WindowedKStream etc and fix the others when we can.
> >> >
> >> > On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
> >> > wrote:
> >> >
> >> > > About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> >> > > convention consistent? And if we change the naming schema just
> change
> >> > > all at once? I personally don't care which naming scheme is better,
> >> but
> >> > > I think consistency is super important!
> >> > >
> >> > > About Bill's comment: I agree, and had a similar thought.
> >> > >
> >> > >
> >> > > -Matthias
> >> > >
> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Damian Guy
I've updated the kip to reflect Bill's comment and also to make
StreamBuilder methods have topic as the first param, i.e.,
StreamBuilder#stream no longer accepts varargs.

On Thu, 24 Aug 2017 at 09:12 Damian Guy  wrote:

> On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:
>
>> I have a couple of comments but otherwise it LGTM:
>>
>> 1. For these two functions in StreamsBuilder, the topic String is set as
>> the second parameter in between of two options. Would that be better to be
>> set as the first or the last one instead?
>>
>> It would be better as the first, but then it is different to the
> #streams() methods due to varargs.
>
>
>> public synchronized  KTable table(final Consumed
>> consumed, final String topic, final Materialized materialized)
>>
>> public synchronized  GlobalKTable globalTable(final
>> Consumed> V> consumed, final String topic, final Materialized materialized)
>>
>> I understand that we cannot do it for the first parameter because of the
>> vararg type. So I'd suggest either
>>
>> a) set it as the last parameter, but then it is inconsistent with other
>> functions like these:
>>
>> void to(final String topic, final Produced options);
>>
>> KTable through(final String topic, final Materialized
>> options);
>>
>> b) only allow one single topic name parameter in StreamsBuilder.stream()
>> since in practice we do not see too many usages of multiple topics, plus
>> it
>> can be semi-supported with "merge" as we move it from StreamsBuilder to
>> KStream (KAFKA-5765),
>>
>> Perhaps this is the better approach
>
>
>> 2. KGroupedStream's function:
>>
>>  KTable aggregate(final Initializer initializer,
>>  final Aggregator
>> aggregator,
>>  final Serde aggValueSerde,
>>  final Materialized> VR>> materialized);
>>
>> The "aggValueSerde" seems not needed?
>>
>> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream
>> was
>> a bad name as a hind-sight. I personally feel we should just correct it
>> with a new class and deprecate / remove the old one before 1.0.0, but that
>> could be in its own KIP.
>>
>>
> The problem with this is that we'd need to add new `groupBy` and
> `groupByKey` methods that return `GroupedKStream`, we can't change the
> existing ones as that would break compatibility. So what would we name
> these methods?
>
>
>>
>> Guozhang
>>
>>
>>
>> On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy  wrote:
>>
>> > We already have GlobalKTable and i can't rename KGroupedStream, which
>> > really should be GroupedKStream. So I think we should name new things
>> > correctly, i.e., WindowedKStream etc and fix the others when we can.
>> >
>> > On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
>> > wrote:
>> >
>> > > About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
>> > > convention consistent? And if we change the naming schema just change
>> > > all at once? I personally don't care which naming scheme is better,
>> but
>> > > I think consistency is super important!
>> > >
>> > > About Bill's comment: I agree, and had a similar thought.
>> > >
>> > >
>> > > -Matthias
>> > >
>> > > On 8/23/17 12:24 PM, Bill Bejeck wrote:
>> > > > Thanks for all the work on this KIP Damian.
>> > > >
>> > > > Both `Produced` and `Joined` have a `with` method accepting all
>> > > parameters,
>> > > > but `Consumed` doesn't. Should we add one for consistency?
>> > > >
>> > > > Thanks,
>> > > > Bill
>> > > >
>> > > > On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy 
>> > > wrote:
>> > > >
>> > > >> KIP has been updated. thanks
>> > > >>
>> > > >> On Wed, 23 Aug 2017 at 09:10 Damian Guy 
>> wrote:
>> > > >>
>> > > >>> Hi Matthias,
>> > > >>>
>> > > >>>
>> > >  KStream:
>> > >  leftJoin and outerJoin for KStream/KTable join should not have
>> > >  `JoinWindows` parameter
>> > > 
>> > >  Thanks!
>> > > >>>
>> > > >>>
>> > > 
>> > >  Nit: TopologyBuilder -> Topology
>> > > 
>> > >  Ack
>> > > >>>
>> > > >>>
>> > >  Nit: new class Serialized list static method #with twice
>> > > 
>> > >  Ack
>> > > >>>
>> > > >>>
>> > >  WindowedKStream -> for consistency we should either have
>> > > GroupedKStream
>> > >  or KWindowedStream... (similar argument for
>> SessionWindowedKStream)
>> > > 
>> > >  We can't rename KGroupedStream -> GroupedKStream without breaking
>> > > >>> compatibility. So we are stuck with it for now. Hopefully in the
>> > future
>> > > >> we
>> > > >>> can rename KGroupedStream to GroupedKStream.
>> > > >>>
>> > > >>>
>> > > 
>> > >  KGroupedStream
>> > >  -> why do we use a different name for `sessionWindowedBy()` --
>> seems
>> > > to
>> > >  be cleaner to call both methods `windowedBy()`

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-24 Thread Damian Guy
On Thu, 24 Aug 2017 at 02:49 Guozhang Wang  wrote:

> I have a couple of comments but otherwise it LGTM:
>
> 1. For these two functions in StreamsBuilder, the topic String is set as
> the second parameter in between of two options. Would that be better to be
> set as the first or the last one instead?
>
> It would be better as the first, but then it is different to the
#streams() methods due to varargs.


> public synchronized  KTable table(final Consumed
> consumed, final String topic, final Materialized materialized)
>
> public synchronized  GlobalKTable globalTable(final Consumed V> consumed, final String topic, final Materialized materialized)
>
> I understand that we cannot do it for the first parameter because of the
> vararg type. So I'd suggest either
>
> a) set it as the last parameter, but then it is inconsistent with other
> functions like these:
>
> void to(final String topic, final Produced options);
>
> KTable through(final String topic, final Materialized options);
>
> b) only allow one single topic name parameter in StreamsBuilder.stream()
> since in practice we do not see too many usages of multiple topics, plus it
> can be semi-supported with "merge" as we move it from StreamsBuilder to
> KStream (KAFKA-5765),
>
> Perhaps this is the better approach


> 2. KGroupedStream's function:
>
>  KTable aggregate(final Initializer initializer,
>  final Aggregator
> aggregator,
>  final Serde aggValueSerde,
>  final Materialized VR>> materialized);
>
> The "aggValueSerde" seems not needed?
>
> 3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream was
> a bad name as a hind-sight. I personally feel we should just correct it
> with a new class and deprecate / remove the old one before 1.0.0, but that
> could be in its own KIP.
>
>
The problem with this is that we'd need to add new `groupBy` and
`groupByKey` methods that return `GroupedKStream`, we can't change the
existing ones as that would break compatibility. So what would we name
these methods?


>
> Guozhang
>
>
>
> On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy  wrote:
>
> > We already have GlobalKTable and i can't rename KGroupedStream, which
> > really should be GroupedKStream. So I think we should name new things
> > correctly, i.e., WindowedKStream etc and fix the others when we can.
> >
> > On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
> > wrote:
> >
> > > About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> > > convention consistent? And if we change the naming schema just change
> > > all at once? I personally don't care which naming scheme is better, but
> > > I think consistency is super important!
> > >
> > > About Bill's comment: I agree, and had a similar thought.
> > >
> > >
> > > -Matthias
> > >
> > > On 8/23/17 12:24 PM, Bill Bejeck wrote:
> > > > Thanks for all the work on this KIP Damian.
> > > >
> > > > Both `Produced` and `Joined` have a `with` method accepting all
> > > parameters,
> > > > but `Consumed` doesn't. Should we add one for consistency?
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > > > On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy 
> > > wrote:
> > > >
> > > >> KIP has been updated. thanks
> > > >>
> > > >> On Wed, 23 Aug 2017 at 09:10 Damian Guy 
> wrote:
> > > >>
> > > >>> Hi Matthias,
> > > >>>
> > > >>>
> > >  KStream:
> > >  leftJoin and outerJoin for KStream/KTable join should not have
> > >  `JoinWindows` parameter
> > > 
> > >  Thanks!
> > > >>>
> > > >>>
> > > 
> > >  Nit: TopologyBuilder -> Topology
> > > 
> > >  Ack
> > > >>>
> > > >>>
> > >  Nit: new class Serialized list static method #with twice
> > > 
> > >  Ack
> > > >>>
> > > >>>
> > >  WindowedKStream -> for consistency we should either have
> > > GroupedKStream
> > >  or KWindowedStream... (similar argument for
> SessionWindowedKStream)
> > > 
> > >  We can't rename KGroupedStream -> GroupedKStream without breaking
> > > >>> compatibility. So we are stuck with it for now. Hopefully in the
> > future
> > > >> we
> > > >>> can rename KGroupedStream to GroupedKStream.
> > > >>>
> > > >>>
> > > 
> > >  KGroupedStream
> > >  -> why do we use a different name for `sessionWindowedBy()` --
> seems
> > > to
> > >  be cleaner to call both methods `windowedBy()`
> > > 
> > > 
> > > >>> I beg to differ that it is cleaner either way!
> > > >>>
> > > >>>
> > > 
> > >  StreamsBuilder#stream -> parameter order is confusing... We should
> > > have
> > >  Pattern as second parameter to align both methods.
> > > 
> > >  Ack
> > > >>>
> > > >>>
> > >  StreamsBuilder#table/globalTable -> move parameter `Consumed` as
> > first
> > >  

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Guozhang Wang
I have a couple of comments but otherwise it LGTM:

1. For these two functions in StreamsBuilder, the topic String is set as
the second parameter in between of two options. Would that be better to be
set as the first or the last one instead?

public synchronized  KTable table(final Consumed
consumed, final String topic, final Materialized materialized)

public synchronized  GlobalKTable globalTable(final Consumed consumed, final String topic, final Materialized materialized)

I understand that we cannot do it for the first parameter because of the
vararg type. So I'd suggest either

a) set it as the last parameter, but then it is inconsistent with other
functions like these:

void to(final String topic, final Produced options);

KTable through(final String topic, final Materialized options);

b) only allow one single topic name parameter in StreamsBuilder.stream()
since in practice we do not see too many usages of multiple topics, plus it
can be semi-supported with "merge" as we move it from StreamsBuilder to
KStream (KAFKA-5765),

2. KGroupedStream's function:

 KTable aggregate(final Initializer initializer,
 final Aggregator
aggregator,
 final Serde aggValueSerde,
 final Materialized> materialized);

The "aggValueSerde" seems not needed?

3. +1 on `KGroupedStream` v.s. `GroupedKStream`. I think KGroupedStream was
a bad name as a hind-sight. I personally feel we should just correct it
with a new class and deprecate / remove the old one before 1.0.0, but that
could be in its own KIP.


Guozhang



On Wed, Aug 23, 2017 at 1:01 PM, Damian Guy  wrote:

> We already have GlobalKTable and i can't rename KGroupedStream, which
> really should be GroupedKStream. So I think we should name new things
> correctly, i.e., WindowedKStream etc and fix the others when we can.
>
> On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax 
> wrote:
>
> > About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> > convention consistent? And if we change the naming schema just change
> > all at once? I personally don't care which naming scheme is better, but
> > I think consistency is super important!
> >
> > About Bill's comment: I agree, and had a similar thought.
> >
> >
> > -Matthias
> >
> > On 8/23/17 12:24 PM, Bill Bejeck wrote:
> > > Thanks for all the work on this KIP Damian.
> > >
> > > Both `Produced` and `Joined` have a `with` method accepting all
> > parameters,
> > > but `Consumed` doesn't. Should we add one for consistency?
> > >
> > > Thanks,
> > > Bill
> > >
> > > On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy 
> > wrote:
> > >
> > >> KIP has been updated. thanks
> > >>
> > >> On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:
> > >>
> > >>> Hi Matthias,
> > >>>
> > >>>
> >  KStream:
> >  leftJoin and outerJoin for KStream/KTable join should not have
> >  `JoinWindows` parameter
> > 
> >  Thanks!
> > >>>
> > >>>
> > 
> >  Nit: TopologyBuilder -> Topology
> > 
> >  Ack
> > >>>
> > >>>
> >  Nit: new class Serialized list static method #with twice
> > 
> >  Ack
> > >>>
> > >>>
> >  WindowedKStream -> for consistency we should either have
> > GroupedKStream
> >  or KWindowedStream... (similar argument for SessionWindowedKStream)
> > 
> >  We can't rename KGroupedStream -> GroupedKStream without breaking
> > >>> compatibility. So we are stuck with it for now. Hopefully in the
> future
> > >> we
> > >>> can rename KGroupedStream to GroupedKStream.
> > >>>
> > >>>
> > 
> >  KGroupedStream
> >  -> why do we use a different name for `sessionWindowedBy()` -- seems
> > to
> >  be cleaner to call both methods `windowedBy()`
> > 
> > 
> > >>> I beg to differ that it is cleaner either way!
> > >>>
> > >>>
> > 
> >  StreamsBuilder#stream -> parameter order is confusing... We should
> > have
> >  Pattern as second parameter to align both methods.
> > 
> >  Ack
> > >>>
> > >>>
> >  StreamsBuilder#table/globalTable -> move parameter `Consumed` as
> first
> >  parameter to align with `#stream`
> > 
> > 
> >  Ack
> > >>>
> >  Produced#with(Serde, Serde)
> >  Produced#with(StreamPartitioner, Serde, Serde)
> >  -> should StreamPartitioner be the third argument instead of the
> > first?
> > 
> >  Sure
> > >>>
> > 
> >  Consumed:
> >  Why do we need 3 different names for the 3 static methods? I would
> all
> >  of them just call `with()`. Current names sound clumsy to me. And a
> >  plain `with()` also aligns with the naming of static methods of
> other
> >  classes.
> > 
> > >>>
> > >>> I disagree that the names sound clumsy! But yes they should be
> aligned
> > >>> with the others.
> > >>>
> > 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Damian Guy
We already have GlobalKTable and i can't rename KGroupedStream, which
really should be GroupedKStream. So I think we should name new things
correctly, i.e., WindowedKStream etc and fix the others when we can.

On Wed, 23 Aug 2017 at 20:38 Matthias J. Sax  wrote:

> About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
> convention consistent? And if we change the naming schema just change
> all at once? I personally don't care which naming scheme is better, but
> I think consistency is super important!
>
> About Bill's comment: I agree, and had a similar thought.
>
>
> -Matthias
>
> On 8/23/17 12:24 PM, Bill Bejeck wrote:
> > Thanks for all the work on this KIP Damian.
> >
> > Both `Produced` and `Joined` have a `with` method accepting all
> parameters,
> > but `Consumed` doesn't. Should we add one for consistency?
> >
> > Thanks,
> > Bill
> >
> > On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy 
> wrote:
> >
> >> KIP has been updated. thanks
> >>
> >> On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>>
>  KStream:
>  leftJoin and outerJoin for KStream/KTable join should not have
>  `JoinWindows` parameter
> 
>  Thanks!
> >>>
> >>>
> 
>  Nit: TopologyBuilder -> Topology
> 
>  Ack
> >>>
> >>>
>  Nit: new class Serialized list static method #with twice
> 
>  Ack
> >>>
> >>>
>  WindowedKStream -> for consistency we should either have
> GroupedKStream
>  or KWindowedStream... (similar argument for SessionWindowedKStream)
> 
>  We can't rename KGroupedStream -> GroupedKStream without breaking
> >>> compatibility. So we are stuck with it for now. Hopefully in the future
> >> we
> >>> can rename KGroupedStream to GroupedKStream.
> >>>
> >>>
> 
>  KGroupedStream
>  -> why do we use a different name for `sessionWindowedBy()` -- seems
> to
>  be cleaner to call both methods `windowedBy()`
> 
> 
> >>> I beg to differ that it is cleaner either way!
> >>>
> >>>
> 
>  StreamsBuilder#stream -> parameter order is confusing... We should
> have
>  Pattern as second parameter to align both methods.
> 
>  Ack
> >>>
> >>>
>  StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
>  parameter to align with `#stream`
> 
> 
>  Ack
> >>>
>  Produced#with(Serde, Serde)
>  Produced#with(StreamPartitioner, Serde, Serde)
>  -> should StreamPartitioner be the third argument instead of the
> first?
> 
>  Sure
> >>>
> 
>  Consumed:
>  Why do we need 3 different names for the 3 static methods? I would all
>  of them just call `with()`. Current names sound clumsy to me. And a
>  plain `with()` also aligns with the naming of static methods of other
>  classes.
> 
> >>>
> >>> I disagree that the names sound clumsy! But yes they should be aligned
> >>> with the others.
> >>>
> >>>
> 
> 
>  I guess we are also deprecation a bunch of method for
>  KStream/KTable/KGroupedStream/KGroupedTable and should mention which
>  one? There is just one sentence "Deprecate the existing overloads.",
> but
>  we don't deprecate all existing once. I personally don't care to much
> if
>  we spell deprecated method out explicitly, but right now it's not
>  consistent as we only list methods we add.
> 
> 
> >>>
>  Should we deprecate `StateStoreSupplier`?
> 
> >>> Yep
> >>>
> 
> 
>  -Matthias
> 
> 
> 
>  On 8/22/17 6:55 AM, Damian Guy wrote:
> > I've just updated the KIP with some additional changes targeted at
> > StreamsBuilder
> >
> > Thanks,
> > Damian
> >
> > On Thu, 10 Aug 2017 at 12:59 Damian Guy 
> wrote:
> >
> >>
> >>> Got it, thanks.
> >>>
> >>> Does it still make sense to have one static constructors for each
>  spec,
> >>> with one constructor having only one parameter to make it more
> >> usable,
> >>> i.e.
> >>> as a user I do not need to give all parameters if I only want to
>  override
> >>> one of them? Maybe we can just name the constructors as `with` but
>  I'm not
> >>> sure if Java distinguish:
> >>>
> >>> public static  Produced with(final Serde keySerde)
> >>> public static  Produced with(final Serde valueSerde)
> >>>
> >>> as two function signatures.
> >>>
> >>>
> >> No that won't work. That is why we have all options, i.e., on
> Produce
> >> public static  Produced with(final Serde keySerde,
>  final Serde
> >> valueSerde)
> >> public static  Produced with(final StreamPartitioner >> V>
> >> partitioner, final Serde keySerde, final Serde valueSerde)
> >> public static  Produced keySerde(final Serde
> keySerde)
> >> public static  

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Damian Guy
Thanks Bill

On Wed, 23 Aug 2017 at 20:24 Bill Bejeck  wrote:

> Thanks for all the work on this KIP Damian.
>
> Both `Produced` and `Joined` have a `with` method accepting all parameters,
> but `Consumed` doesn't. Should we add one for consistency?
>
>
Yep, i'll update it


> Thanks,
> Bill
>
> On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy  wrote:
>
> > KIP has been updated. thanks
> >
> > On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:
> >
> > > Hi Matthias,
> > >
> > >
> > >> KStream:
> > >> leftJoin and outerJoin for KStream/KTable join should not have
> > >> `JoinWindows` parameter
> > >>
> > >> Thanks!
> > >
> > >
> > >>
> > >> Nit: TopologyBuilder -> Topology
> > >>
> > >> Ack
> > >
> > >
> > >> Nit: new class Serialized list static method #with twice
> > >>
> > >> Ack
> > >
> > >
> > >> WindowedKStream -> for consistency we should either have
> GroupedKStream
> > >> or KWindowedStream... (similar argument for SessionWindowedKStream)
> > >>
> > >> We can't rename KGroupedStream -> GroupedKStream without breaking
> > > compatibility. So we are stuck with it for now. Hopefully in the future
> > we
> > > can rename KGroupedStream to GroupedKStream.
> > >
> > >
> > >>
> > >> KGroupedStream
> > >> -> why do we use a different name for `sessionWindowedBy()` -- seems
> to
> > >> be cleaner to call both methods `windowedBy()`
> > >>
> > >>
> > > I beg to differ that it is cleaner either way!
> > >
> > >
> > >>
> > >> StreamsBuilder#stream -> parameter order is confusing... We should
> have
> > >> Pattern as second parameter to align both methods.
> > >>
> > >> Ack
> > >
> > >
> > >> StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
> > >> parameter to align with `#stream`
> > >>
> > >>
> > >> Ack
> > >
> > >> Produced#with(Serde, Serde)
> > >> Produced#with(StreamPartitioner, Serde, Serde)
> > >> -> should StreamPartitioner be the third argument instead of the
> first?
> > >>
> > >> Sure
> > >
> > >>
> > >> Consumed:
> > >> Why do we need 3 different names for the 3 static methods? I would all
> > >> of them just call `with()`. Current names sound clumsy to me. And a
> > >> plain `with()` also aligns with the naming of static methods of other
> > >> classes.
> > >>
> > >
> > > I disagree that the names sound clumsy! But yes they should be aligned
> > > with the others.
> > >
> > >
> > >>
> > >>
> > >> I guess we are also deprecation a bunch of method for
> > >> KStream/KTable/KGroupedStream/KGroupedTable and should mention which
> > >> one? There is just one sentence "Deprecate the existing overloads.",
> but
> > >> we don't deprecate all existing once. I personally don't care to much
> if
> > >> we spell deprecated method out explicitly, but right now it's not
> > >> consistent as we only list methods we add.
> > >>
> > >>
> > >
> > >> Should we deprecate `StateStoreSupplier`?
> > >>
> > > Yep
> > >
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >>
> > >> On 8/22/17 6:55 AM, Damian Guy wrote:
> > >> > I've just updated the KIP with some additional changes targeted at
> > >> > StreamsBuilder
> > >> >
> > >> > Thanks,
> > >> > Damian
> > >> >
> > >> > On Thu, 10 Aug 2017 at 12:59 Damian Guy 
> wrote:
> > >> >
> > >> >>
> > >> >>> Got it, thanks.
> > >> >>>
> > >> >>> Does it still make sense to have one static constructors for each
> > >> spec,
> > >> >>> with one constructor having only one parameter to make it more
> > usable,
> > >> >>> i.e.
> > >> >>> as a user I do not need to give all parameters if I only want to
> > >> override
> > >> >>> one of them? Maybe we can just name the constructors as `with` but
> > >> I'm not
> > >> >>> sure if Java distinguish:
> > >> >>>
> > >> >>> public static  Produced with(final Serde keySerde)
> > >> >>> public static  Produced with(final Serde
> valueSerde)
> > >> >>>
> > >> >>> as two function signatures.
> > >> >>>
> > >> >>>
> > >> >> No that won't work. That is why we have all options, i.e., on
> Produce
> > >> >> public static  Produced with(final Serde keySerde,
> > >> final Serde
> > >> >> valueSerde)
> > >> >> public static  Produced with(final StreamPartitioner > V>
> > >> >> partitioner, final Serde keySerde, final Serde valueSerde)
> > >> >> public static  Produced keySerde(final Serde
> keySerde)
> > >> >> public static  Produced valueSerde(final Serde
> > >> valueSerde)
> > >> >> public static  Produced streamPartitioner(final
> > >> StreamPartitioner > >> >> V> partitioner)
> > >> >>
> > >> >> So if you only want to use one you can just use the function that
> > takes
> > >> >> one argument.
> > >> >>
> > >> >>>
> > >> >>> Guozhang
> > >> >>>
> > >> >>>
> > >> >>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy 
> > >> wrote:
> > >> >>>
> > >>  On Tue, 8 Aug 2017 at 20:11 Guozhang Wang 
> > >> wrote:
> > >> 
> > 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Matthias J. Sax
About KGroupedStream vs GroupedKStream: shouldn't we keep the naming
convention consistent? And if we change the naming schema just change
all at once? I personally don't care which naming scheme is better, but
I think consistency is super important!

About Bill's comment: I agree, and had a similar thought.


-Matthias

On 8/23/17 12:24 PM, Bill Bejeck wrote:
> Thanks for all the work on this KIP Damian.
> 
> Both `Produced` and `Joined` have a `with` method accepting all parameters,
> but `Consumed` doesn't. Should we add one for consistency?
> 
> Thanks,
> Bill
> 
> On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy  wrote:
> 
>> KIP has been updated. thanks
>>
>> On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:
>>
>>> Hi Matthias,
>>>
>>>
 KStream:
 leftJoin and outerJoin for KStream/KTable join should not have
 `JoinWindows` parameter

 Thanks!
>>>
>>>

 Nit: TopologyBuilder -> Topology

 Ack
>>>
>>>
 Nit: new class Serialized list static method #with twice

 Ack
>>>
>>>
 WindowedKStream -> for consistency we should either have GroupedKStream
 or KWindowedStream... (similar argument for SessionWindowedKStream)

 We can't rename KGroupedStream -> GroupedKStream without breaking
>>> compatibility. So we are stuck with it for now. Hopefully in the future
>> we
>>> can rename KGroupedStream to GroupedKStream.
>>>
>>>

 KGroupedStream
 -> why do we use a different name for `sessionWindowedBy()` -- seems to
 be cleaner to call both methods `windowedBy()`


>>> I beg to differ that it is cleaner either way!
>>>
>>>

 StreamsBuilder#stream -> parameter order is confusing... We should have
 Pattern as second parameter to align both methods.

 Ack
>>>
>>>
 StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
 parameter to align with `#stream`


 Ack
>>>
 Produced#with(Serde, Serde)
 Produced#with(StreamPartitioner, Serde, Serde)
 -> should StreamPartitioner be the third argument instead of the first?

 Sure
>>>

 Consumed:
 Why do we need 3 different names for the 3 static methods? I would all
 of them just call `with()`. Current names sound clumsy to me. And a
 plain `with()` also aligns with the naming of static methods of other
 classes.

>>>
>>> I disagree that the names sound clumsy! But yes they should be aligned
>>> with the others.
>>>
>>>


 I guess we are also deprecation a bunch of method for
 KStream/KTable/KGroupedStream/KGroupedTable and should mention which
 one? There is just one sentence "Deprecate the existing overloads.", but
 we don't deprecate all existing once. I personally don't care to much if
 we spell deprecated method out explicitly, but right now it's not
 consistent as we only list methods we add.


>>>
 Should we deprecate `StateStoreSupplier`?

>>> Yep
>>>


 -Matthias



 On 8/22/17 6:55 AM, Damian Guy wrote:
> I've just updated the KIP with some additional changes targeted at
> StreamsBuilder
>
> Thanks,
> Damian
>
> On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:
>
>>
>>> Got it, thanks.
>>>
>>> Does it still make sense to have one static constructors for each
 spec,
>>> with one constructor having only one parameter to make it more
>> usable,
>>> i.e.
>>> as a user I do not need to give all parameters if I only want to
 override
>>> one of them? Maybe we can just name the constructors as `with` but
 I'm not
>>> sure if Java distinguish:
>>>
>>> public static  Produced with(final Serde keySerde)
>>> public static  Produced with(final Serde valueSerde)
>>>
>>> as two function signatures.
>>>
>>>
>> No that won't work. That is why we have all options, i.e., on Produce
>> public static  Produced with(final Serde keySerde,
 final Serde
>> valueSerde)
>> public static  Produced with(final StreamPartitioner> V>
>> partitioner, final Serde keySerde, final Serde valueSerde)
>> public static  Produced keySerde(final Serde keySerde)
>> public static  Produced valueSerde(final Serde
 valueSerde)
>> public static  Produced streamPartitioner(final
 StreamPartitioner> V> partitioner)
>>
>> So if you only want to use one you can just use the function that
>> takes
>> one argument.
>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy 
 wrote:
>>>
 On Tue, 8 Aug 2017 at 20:11 Guozhang Wang 
 wrote:

> Damian,
>
> Thanks for the proposal, I had a few comments on the APIs:

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Bill Bejeck
Thanks for all the work on this KIP Damian.

Both `Produced` and `Joined` have a `with` method accepting all parameters,
but `Consumed` doesn't. Should we add one for consistency?

Thanks,
Bill

On Wed, Aug 23, 2017 at 4:15 AM, Damian Guy  wrote:

> KIP has been updated. thanks
>
> On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:
>
> > Hi Matthias,
> >
> >
> >> KStream:
> >> leftJoin and outerJoin for KStream/KTable join should not have
> >> `JoinWindows` parameter
> >>
> >> Thanks!
> >
> >
> >>
> >> Nit: TopologyBuilder -> Topology
> >>
> >> Ack
> >
> >
> >> Nit: new class Serialized list static method #with twice
> >>
> >> Ack
> >
> >
> >> WindowedKStream -> for consistency we should either have GroupedKStream
> >> or KWindowedStream... (similar argument for SessionWindowedKStream)
> >>
> >> We can't rename KGroupedStream -> GroupedKStream without breaking
> > compatibility. So we are stuck with it for now. Hopefully in the future
> we
> > can rename KGroupedStream to GroupedKStream.
> >
> >
> >>
> >> KGroupedStream
> >> -> why do we use a different name for `sessionWindowedBy()` -- seems to
> >> be cleaner to call both methods `windowedBy()`
> >>
> >>
> > I beg to differ that it is cleaner either way!
> >
> >
> >>
> >> StreamsBuilder#stream -> parameter order is confusing... We should have
> >> Pattern as second parameter to align both methods.
> >>
> >> Ack
> >
> >
> >> StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
> >> parameter to align with `#stream`
> >>
> >>
> >> Ack
> >
> >> Produced#with(Serde, Serde)
> >> Produced#with(StreamPartitioner, Serde, Serde)
> >> -> should StreamPartitioner be the third argument instead of the first?
> >>
> >> Sure
> >
> >>
> >> Consumed:
> >> Why do we need 3 different names for the 3 static methods? I would all
> >> of them just call `with()`. Current names sound clumsy to me. And a
> >> plain `with()` also aligns with the naming of static methods of other
> >> classes.
> >>
> >
> > I disagree that the names sound clumsy! But yes they should be aligned
> > with the others.
> >
> >
> >>
> >>
> >> I guess we are also deprecation a bunch of method for
> >> KStream/KTable/KGroupedStream/KGroupedTable and should mention which
> >> one? There is just one sentence "Deprecate the existing overloads.", but
> >> we don't deprecate all existing once. I personally don't care to much if
> >> we spell deprecated method out explicitly, but right now it's not
> >> consistent as we only list methods we add.
> >>
> >>
> >
> >> Should we deprecate `StateStoreSupplier`?
> >>
> > Yep
> >
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 8/22/17 6:55 AM, Damian Guy wrote:
> >> > I've just updated the KIP with some additional changes targeted at
> >> > StreamsBuilder
> >> >
> >> > Thanks,
> >> > Damian
> >> >
> >> > On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:
> >> >
> >> >>
> >> >>> Got it, thanks.
> >> >>>
> >> >>> Does it still make sense to have one static constructors for each
> >> spec,
> >> >>> with one constructor having only one parameter to make it more
> usable,
> >> >>> i.e.
> >> >>> as a user I do not need to give all parameters if I only want to
> >> override
> >> >>> one of them? Maybe we can just name the constructors as `with` but
> >> I'm not
> >> >>> sure if Java distinguish:
> >> >>>
> >> >>> public static  Produced with(final Serde keySerde)
> >> >>> public static  Produced with(final Serde valueSerde)
> >> >>>
> >> >>> as two function signatures.
> >> >>>
> >> >>>
> >> >> No that won't work. That is why we have all options, i.e., on Produce
> >> >> public static  Produced with(final Serde keySerde,
> >> final Serde
> >> >> valueSerde)
> >> >> public static  Produced with(final StreamPartitioner V>
> >> >> partitioner, final Serde keySerde, final Serde valueSerde)
> >> >> public static  Produced keySerde(final Serde keySerde)
> >> >> public static  Produced valueSerde(final Serde
> >> valueSerde)
> >> >> public static  Produced streamPartitioner(final
> >> StreamPartitioner >> >> V> partitioner)
> >> >>
> >> >> So if you only want to use one you can just use the function that
> takes
> >> >> one argument.
> >> >>
> >> >>>
> >> >>> Guozhang
> >> >>>
> >> >>>
> >> >>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy 
> >> wrote:
> >> >>>
> >>  On Tue, 8 Aug 2017 at 20:11 Guozhang Wang 
> >> wrote:
> >> 
> >> > Damian,
> >> >
> >> > Thanks for the proposal, I had a few comments on the APIs:
> >> >
> >> > 1. Printed#withFile seems not needed, as users should always spec
> if
> >> >>> it
> >>  is
> >> > to sysOut or to File at the beginning. In addition as a second
> >> >>> thought, I
> >> > think serdes are not useful for prints anyways since we assume
> >> >>> `toString`
> >> > is provided except for byte 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Damian Guy
KIP has been updated. thanks

On Wed, 23 Aug 2017 at 09:10 Damian Guy  wrote:

> Hi Matthias,
>
>
>> KStream:
>> leftJoin and outerJoin for KStream/KTable join should not have
>> `JoinWindows` parameter
>>
>> Thanks!
>
>
>>
>> Nit: TopologyBuilder -> Topology
>>
>> Ack
>
>
>> Nit: new class Serialized list static method #with twice
>>
>> Ack
>
>
>> WindowedKStream -> for consistency we should either have GroupedKStream
>> or KWindowedStream... (similar argument for SessionWindowedKStream)
>>
>> We can't rename KGroupedStream -> GroupedKStream without breaking
> compatibility. So we are stuck with it for now. Hopefully in the future we
> can rename KGroupedStream to GroupedKStream.
>
>
>>
>> KGroupedStream
>> -> why do we use a different name for `sessionWindowedBy()` -- seems to
>> be cleaner to call both methods `windowedBy()`
>>
>>
> I beg to differ that it is cleaner either way!
>
>
>>
>> StreamsBuilder#stream -> parameter order is confusing... We should have
>> Pattern as second parameter to align both methods.
>>
>> Ack
>
>
>> StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
>> parameter to align with `#stream`
>>
>>
>> Ack
>
>> Produced#with(Serde, Serde)
>> Produced#with(StreamPartitioner, Serde, Serde)
>> -> should StreamPartitioner be the third argument instead of the first?
>>
>> Sure
>
>>
>> Consumed:
>> Why do we need 3 different names for the 3 static methods? I would all
>> of them just call `with()`. Current names sound clumsy to me. And a
>> plain `with()` also aligns with the naming of static methods of other
>> classes.
>>
>
> I disagree that the names sound clumsy! But yes they should be aligned
> with the others.
>
>
>>
>>
>> I guess we are also deprecation a bunch of method for
>> KStream/KTable/KGroupedStream/KGroupedTable and should mention which
>> one? There is just one sentence "Deprecate the existing overloads.", but
>> we don't deprecate all existing once. I personally don't care to much if
>> we spell deprecated method out explicitly, but right now it's not
>> consistent as we only list methods we add.
>>
>>
>
>> Should we deprecate `StateStoreSupplier`?
>>
> Yep
>
>>
>>
>> -Matthias
>>
>>
>>
>> On 8/22/17 6:55 AM, Damian Guy wrote:
>> > I've just updated the KIP with some additional changes targeted at
>> > StreamsBuilder
>> >
>> > Thanks,
>> > Damian
>> >
>> > On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:
>> >
>> >>
>> >>> Got it, thanks.
>> >>>
>> >>> Does it still make sense to have one static constructors for each
>> spec,
>> >>> with one constructor having only one parameter to make it more usable,
>> >>> i.e.
>> >>> as a user I do not need to give all parameters if I only want to
>> override
>> >>> one of them? Maybe we can just name the constructors as `with` but
>> I'm not
>> >>> sure if Java distinguish:
>> >>>
>> >>> public static  Produced with(final Serde keySerde)
>> >>> public static  Produced with(final Serde valueSerde)
>> >>>
>> >>> as two function signatures.
>> >>>
>> >>>
>> >> No that won't work. That is why we have all options, i.e., on Produce
>> >> public static  Produced with(final Serde keySerde,
>> final Serde
>> >> valueSerde)
>> >> public static  Produced with(final StreamPartitioner
>> >> partitioner, final Serde keySerde, final Serde valueSerde)
>> >> public static  Produced keySerde(final Serde keySerde)
>> >> public static  Produced valueSerde(final Serde
>> valueSerde)
>> >> public static  Produced streamPartitioner(final
>> StreamPartitioner> >> V> partitioner)
>> >>
>> >> So if you only want to use one you can just use the function that takes
>> >> one argument.
>> >>
>> >>>
>> >>> Guozhang
>> >>>
>> >>>
>> >>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy 
>> wrote:
>> >>>
>>  On Tue, 8 Aug 2017 at 20:11 Guozhang Wang 
>> wrote:
>> 
>> > Damian,
>> >
>> > Thanks for the proposal, I had a few comments on the APIs:
>> >
>> > 1. Printed#withFile seems not needed, as users should always spec if
>> >>> it
>>  is
>> > to sysOut or to File at the beginning. In addition as a second
>> >>> thought, I
>> > think serdes are not useful for prints anyways since we assume
>> >>> `toString`
>> > is provided except for byte arrays, in which we will special handle
>> >>> it.
>> >
>> >
>>  +1
>> 
>> 
>> > Another comment about Printed in general is it differs with other
>> >>> options
>> > that it is a required option than optional one, since it includes
>>  toSysOut
>> > / toFile specs; what are the pros and cons for including these two
>> in
>> >>> the
>> > option and hence make it a required option than leaving them at the
>> >>> API
>> > layer and make Printed as optional for mapper / label only?
>> >
>> >
>>  It isn't required as we will still have the 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-23 Thread Damian Guy
Hi Matthias,


> KStream:
> leftJoin and outerJoin for KStream/KTable join should not have
> `JoinWindows` parameter
>
> Thanks!


>
> Nit: TopologyBuilder -> Topology
>
> Ack


> Nit: new class Serialized list static method #with twice
>
> Ack


> WindowedKStream -> for consistency we should either have GroupedKStream
> or KWindowedStream... (similar argument for SessionWindowedKStream)
>
> We can't rename KGroupedStream -> GroupedKStream without breaking
compatibility. So we are stuck with it for now. Hopefully in the future we
can rename KGroupedStream to GroupedKStream.


>
> KGroupedStream
> -> why do we use a different name for `sessionWindowedBy()` -- seems to
> be cleaner to call both methods `windowedBy()`
>
>
I beg to differ that it is cleaner either way!


>
> StreamsBuilder#stream -> parameter order is confusing... We should have
> Pattern as second parameter to align both methods.
>
> Ack


> StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
> parameter to align with `#stream`
>
>
> Ack

> Produced#with(Serde, Serde)
> Produced#with(StreamPartitioner, Serde, Serde)
> -> should StreamPartitioner be the third argument instead of the first?
>
> Sure

>
> Consumed:
> Why do we need 3 different names for the 3 static methods? I would all
> of them just call `with()`. Current names sound clumsy to me. And a
> plain `with()` also aligns with the naming of static methods of other
> classes.
>

I disagree that the names sound clumsy! But yes they should be aligned with
the others.


>
>
> I guess we are also deprecation a bunch of method for
> KStream/KTable/KGroupedStream/KGroupedTable and should mention which
> one? There is just one sentence "Deprecate the existing overloads.", but
> we don't deprecate all existing once. I personally don't care to much if
> we spell deprecated method out explicitly, but right now it's not
> consistent as we only list methods we add.
>
>

> Should we deprecate `StateStoreSupplier`?
>
Yep

>
>
> -Matthias
>
>
>
> On 8/22/17 6:55 AM, Damian Guy wrote:
> > I've just updated the KIP with some additional changes targeted at
> > StreamsBuilder
> >
> > Thanks,
> > Damian
> >
> > On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:
> >
> >>
> >>> Got it, thanks.
> >>>
> >>> Does it still make sense to have one static constructors for each spec,
> >>> with one constructor having only one parameter to make it more usable,
> >>> i.e.
> >>> as a user I do not need to give all parameters if I only want to
> override
> >>> one of them? Maybe we can just name the constructors as `with` but I'm
> not
> >>> sure if Java distinguish:
> >>>
> >>> public static  Produced with(final Serde keySerde)
> >>> public static  Produced with(final Serde valueSerde)
> >>>
> >>> as two function signatures.
> >>>
> >>>
> >> No that won't work. That is why we have all options, i.e., on Produce
> >> public static  Produced with(final Serde keySerde, final
> Serde
> >> valueSerde)
> >> public static  Produced with(final StreamPartitioner
> >> partitioner, final Serde keySerde, final Serde valueSerde)
> >> public static  Produced keySerde(final Serde keySerde)
> >> public static  Produced valueSerde(final Serde
> valueSerde)
> >> public static  Produced streamPartitioner(final
> StreamPartitioner >> V> partitioner)
> >>
> >> So if you only want to use one you can just use the function that takes
> >> one argument.
> >>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy 
> wrote:
> >>>
>  On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
> 
> > Damian,
> >
> > Thanks for the proposal, I had a few comments on the APIs:
> >
> > 1. Printed#withFile seems not needed, as users should always spec if
> >>> it
>  is
> > to sysOut or to File at the beginning. In addition as a second
> >>> thought, I
> > think serdes are not useful for prints anyways since we assume
> >>> `toString`
> > is provided except for byte arrays, in which we will special handle
> >>> it.
> >
> >
>  +1
> 
> 
> > Another comment about Printed in general is it differs with other
> >>> options
> > that it is a required option than optional one, since it includes
>  toSysOut
> > / toFile specs; what are the pros and cons for including these two in
> >>> the
> > option and hence make it a required option than leaving them at the
> >>> API
> > layer and make Printed as optional for mapper / label only?
> >
> >
>  It isn't required as we will still have the no-arg print() which will
> >>> just
>  go to sysout as it does now.
> 
> 
> >
> > 2.1 KStream#through / to
> >
> > We should have an overloaded function without Produced?
> >
> 
>  Yes - we already have those so they are not part of the KIP, i.e,
> 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-22 Thread Matthias J. Sax
Thanks to the update Damian!


Couple of comments:

KStream:
leftJoin and outerJoin for KStream/KTable join should not have
`JoinWindows` parameter


Nit: TopologyBuilder -> Topology

Nit: new class Serialized list static method #with twice


WindowedKStream -> for consistency we should either have GroupedKStream
or KWindowedStream... (similar argument for SessionWindowedKStream)


KGroupedStream
-> why do we use a different name for `sessionWindowedBy()` -- seems to
be cleaner to call both methods `windowedBy()`


StreamsBuilder#stream -> parameter order is confusing... We should have
Pattern as second parameter to align both methods.

StreamsBuilder#table/globalTable -> move parameter `Consumed` as first
parameter to align with `#stream`


Produced#with(Serde, Serde)
Produced#with(StreamPartitioner, Serde, Serde)
-> should StreamPartitioner be the third argument instead of the first?


Consumed:
Why do we need 3 different names for the 3 static methods? I would all
of them just call `with()`. Current names sound clumsy to me. And a
plain `with()` also aligns with the naming of static methods of other
classes.


I guess we are also deprecation a bunch of method for
KStream/KTable/KGroupedStream/KGroupedTable and should mention which
one? There is just one sentence "Deprecate the existing overloads.", but
we don't deprecate all existing once. I personally don't care to much if
we spell deprecated method out explicitly, but right now it's not
consistent as we only list methods we add.

Should we deprecate `StateStoreSupplier`?


-Matthias



On 8/22/17 6:55 AM, Damian Guy wrote:
> I've just updated the KIP with some additional changes targeted at
> StreamsBuilder
> 
> Thanks,
> Damian
> 
> On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:
> 
>>
>>> Got it, thanks.
>>>
>>> Does it still make sense to have one static constructors for each spec,
>>> with one constructor having only one parameter to make it more usable,
>>> i.e.
>>> as a user I do not need to give all parameters if I only want to override
>>> one of them? Maybe we can just name the constructors as `with` but I'm not
>>> sure if Java distinguish:
>>>
>>> public static  Produced with(final Serde keySerde)
>>> public static  Produced with(final Serde valueSerde)
>>>
>>> as two function signatures.
>>>
>>>
>> No that won't work. That is why we have all options, i.e., on Produce
>> public static  Produced with(final Serde keySerde, final 
>> Serde
>> valueSerde)
>> public static  Produced with(final StreamPartitioner
>> partitioner, final Serde keySerde, final Serde valueSerde)
>> public static  Produced keySerde(final Serde keySerde)
>> public static  Produced valueSerde(final Serde valueSerde)
>> public static  Produced streamPartitioner(final 
>> StreamPartitioner> V> partitioner)
>>
>> So if you only want to use one you can just use the function that takes
>> one argument.
>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:
>>>
 On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:

> Damian,
>
> Thanks for the proposal, I had a few comments on the APIs:
>
> 1. Printed#withFile seems not needed, as users should always spec if
>>> it
 is
> to sysOut or to File at the beginning. In addition as a second
>>> thought, I
> think serdes are not useful for prints anyways since we assume
>>> `toString`
> is provided except for byte arrays, in which we will special handle
>>> it.
>
>
 +1


> Another comment about Printed in general is it differs with other
>>> options
> that it is a required option than optional one, since it includes
 toSysOut
> / toFile specs; what are the pros and cons for including these two in
>>> the
> option and hence make it a required option than leaving them at the
>>> API
> layer and make Printed as optional for mapper / label only?
>
>
 It isn't required as we will still have the no-arg print() which will
>>> just
 go to sysout as it does now.


>
> 2.1 KStream#through / to
>
> We should have an overloaded function without Produced?
>

 Yes - we already have those so they are not part of the KIP, i.e,
 through(topic)


>
> 2.2 KStream#groupBy / groupByKey
>
> We should have an overloaded function without Serialized?
>

 Yes, as above

>
> 2.3 KGroupedStream#count / reduce / aggregate
>
> We should have an overloaded function without Materialized?
>

 As above

>
> 2.4 KStream#join
>
> We should have an overloaded function without Joined?
>

 as above

>
>
> 2.5 Each of KTable's operators:
>
> We should have an overloaded function without Produced / Serialized /
> Materialized?

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-22 Thread Damian Guy
I've just updated the KIP with some additional changes targeted at
StreamsBuilder

Thanks,
Damian

On Thu, 10 Aug 2017 at 12:59 Damian Guy  wrote:

>
>> Got it, thanks.
>>
>> Does it still make sense to have one static constructors for each spec,
>> with one constructor having only one parameter to make it more usable,
>> i.e.
>> as a user I do not need to give all parameters if I only want to override
>> one of them? Maybe we can just name the constructors as `with` but I'm not
>> sure if Java distinguish:
>>
>> public static  Produced with(final Serde keySerde)
>> public static  Produced with(final Serde valueSerde)
>>
>> as two function signatures.
>>
>>
> No that won't work. That is why we have all options, i.e., on Produce
> public static  Produced with(final Serde keySerde, final 
> Serde
> valueSerde)
> public static  Produced with(final StreamPartitioner
> partitioner, final Serde keySerde, final Serde valueSerde)
> public static  Produced keySerde(final Serde keySerde)
> public static  Produced valueSerde(final Serde valueSerde)
> public static  Produced streamPartitioner(final 
> StreamPartitioner V> partitioner)
>
> So if you only want to use one you can just use the function that takes
> one argument.
>
>>
>> Guozhang
>>
>>
>> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:
>>
>> > On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
>> >
>> > > Damian,
>> > >
>> > > Thanks for the proposal, I had a few comments on the APIs:
>> > >
>> > > 1. Printed#withFile seems not needed, as users should always spec if
>> it
>> > is
>> > > to sysOut or to File at the beginning. In addition as a second
>> thought, I
>> > > think serdes are not useful for prints anyways since we assume
>> `toString`
>> > > is provided except for byte arrays, in which we will special handle
>> it.
>> > >
>> > >
>> > +1
>> >
>> >
>> > > Another comment about Printed in general is it differs with other
>> options
>> > > that it is a required option than optional one, since it includes
>> > toSysOut
>> > > / toFile specs; what are the pros and cons for including these two in
>> the
>> > > option and hence make it a required option than leaving them at the
>> API
>> > > layer and make Printed as optional for mapper / label only?
>> > >
>> > >
>> > It isn't required as we will still have the no-arg print() which will
>> just
>> > go to sysout as it does now.
>> >
>> >
>> > >
>> > > 2.1 KStream#through / to
>> > >
>> > > We should have an overloaded function without Produced?
>> > >
>> >
>> > Yes - we already have those so they are not part of the KIP, i.e,
>> > through(topic)
>> >
>> >
>> > >
>> > > 2.2 KStream#groupBy / groupByKey
>> > >
>> > > We should have an overloaded function without Serialized?
>> > >
>> >
>> > Yes, as above
>> >
>> > >
>> > > 2.3 KGroupedStream#count / reduce / aggregate
>> > >
>> > > We should have an overloaded function without Materialized?
>> > >
>> >
>> > As above
>> >
>> > >
>> > > 2.4 KStream#join
>> > >
>> > > We should have an overloaded function without Joined?
>> > >
>> >
>> > as above
>> >
>> > >
>> > >
>> > > 2.5 Each of KTable's operators:
>> > >
>> > > We should have an overloaded function without Produced / Serialized /
>> > > Materialized?
>> > >
>> > >
>> > as above
>> >
>> >
>> > >
>> > >
>> > > 3.1 Produced: the static functions have overlaps, which seems not
>> > > necessary. I'd suggest jut having the following three static with
>> another
>> > > three similar member functions:
>> > >
>> > > public static  Produced withKeySerde(final Serde
>> keySerde)
>> > >
>> > > public static  Produced withValueSerde(final Serde
>> > > valueSerde)
>> > >
>> > > public static  Produced withStreamPartitioner(final
>> > > StreamPartitioner partitioner)
>> > >
>> > > The key idea is that by using the same function name string for static
>> > > constructor and member functions, users do not need to remember what
>> are
>> > > the differences but can call these functions with any ordering they
>> want,
>> > > and later calls on the same spec will win over early calls.
>> > >
>> > >
>> > That would be great if java supported it, but it doesn't. You can't have
>> > static an member functions with the same signature.
>> >
>> >
>> > >
>> > > 3.2 Serialized: similarly
>> > >
>> > > public static  Serialized withKeySerde(final Serde
>> > keySerde)
>> > >
>> > > public static  Serialized withValueSerde(final Serde
>> > > valueSerde)
>> > >
>> > > public Serialized withKeySerde(final Serde keySerde)
>> > >
>> > > public Serialized withValueSerde(final Serde valueSerde)
>> > >
>> >
>> > as above
>> >
>> >
>> > >
>> > > Also it has a final Serde otherValueSerde in one of its static
>> > > constructor, it that intentional?
>> > >
>> >
>> > Nope: thanks.
>> >
>> > >
>> > > 3.3. 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-10 Thread Damian Guy
> Got it, thanks.
>
> Does it still make sense to have one static constructors for each spec,
> with one constructor having only one parameter to make it more usable, i.e.
> as a user I do not need to give all parameters if I only want to override
> one of them? Maybe we can just name the constructors as `with` but I'm not
> sure if Java distinguish:
>
> public static  Produced with(final Serde keySerde)
> public static  Produced with(final Serde valueSerde)
>
> as two function signatures.
>
>
No that won't work. That is why we have all options, i.e., on Produce
public static  Produced with(final Serde keySerde,
final Serde
valueSerde)
public static  Produced with(final StreamPartitioner
partitioner, final Serde keySerde, final Serde valueSerde)
public static  Produced keySerde(final Serde keySerde)
public static  Produced valueSerde(final Serde valueSerde)
public static  Produced streamPartitioner(final
StreamPartitioner partitioner)

So if you only want to use one you can just use the function that takes one
argument.

>
> Guozhang
>
>
> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:
>
> > On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
> >
> > > Damian,
> > >
> > > Thanks for the proposal, I had a few comments on the APIs:
> > >
> > > 1. Printed#withFile seems not needed, as users should always spec if it
> > is
> > > to sysOut or to File at the beginning. In addition as a second
> thought, I
> > > think serdes are not useful for prints anyways since we assume
> `toString`
> > > is provided except for byte arrays, in which we will special handle it.
> > >
> > >
> > +1
> >
> >
> > > Another comment about Printed in general is it differs with other
> options
> > > that it is a required option than optional one, since it includes
> > toSysOut
> > > / toFile specs; what are the pros and cons for including these two in
> the
> > > option and hence make it a required option than leaving them at the API
> > > layer and make Printed as optional for mapper / label only?
> > >
> > >
> > It isn't required as we will still have the no-arg print() which will
> just
> > go to sysout as it does now.
> >
> >
> > >
> > > 2.1 KStream#through / to
> > >
> > > We should have an overloaded function without Produced?
> > >
> >
> > Yes - we already have those so they are not part of the KIP, i.e,
> > through(topic)
> >
> >
> > >
> > > 2.2 KStream#groupBy / groupByKey
> > >
> > > We should have an overloaded function without Serialized?
> > >
> >
> > Yes, as above
> >
> > >
> > > 2.3 KGroupedStream#count / reduce / aggregate
> > >
> > > We should have an overloaded function without Materialized?
> > >
> >
> > As above
> >
> > >
> > > 2.4 KStream#join
> > >
> > > We should have an overloaded function without Joined?
> > >
> >
> > as above
> >
> > >
> > >
> > > 2.5 Each of KTable's operators:
> > >
> > > We should have an overloaded function without Produced / Serialized /
> > > Materialized?
> > >
> > >
> > as above
> >
> >
> > >
> > >
> > > 3.1 Produced: the static functions have overlaps, which seems not
> > > necessary. I'd suggest jut having the following three static with
> another
> > > three similar member functions:
> > >
> > > public static  Produced withKeySerde(final Serde
> keySerde)
> > >
> > > public static  Produced withValueSerde(final Serde
> > > valueSerde)
> > >
> > > public static  Produced withStreamPartitioner(final
> > > StreamPartitioner partitioner)
> > >
> > > The key idea is that by using the same function name string for static
> > > constructor and member functions, users do not need to remember what
> are
> > > the differences but can call these functions with any ordering they
> want,
> > > and later calls on the same spec will win over early calls.
> > >
> > >
> > That would be great if java supported it, but it doesn't. You can't have
> > static an member functions with the same signature.
> >
> >
> > >
> > > 3.2 Serialized: similarly
> > >
> > > public static  Serialized withKeySerde(final Serde
> > keySerde)
> > >
> > > public static  Serialized withValueSerde(final Serde
> > > valueSerde)
> > >
> > > public Serialized withKeySerde(final Serde keySerde)
> > >
> > > public Serialized withValueSerde(final Serde valueSerde)
> > >
> >
> > as above
> >
> >
> > >
> > > Also it has a final Serde otherValueSerde in one of its static
> > > constructor, it that intentional?
> > >
> >
> > Nope: thanks.
> >
> > >
> > > 3.3. Joined: similarly, keep the static constructor signatures the same
> > as
> > > its corresponding member fields.
> > >
> > >
> > As above
> >
> >
> > > 3.4 Materialized: it is a bit special, and I think we can keep its
> static
> > > constructors with only two `as` as they are today.K
> > >
> > >
> > 4. Is there any modifications on StateStoreSupplier? Is it replaced 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-09 Thread Damian Guy
On Wed, 9 Aug 2017 at 20:00 Guozhang Wang  wrote:

> >> Another comment about Printed in general is it differs with other
> options
> >> that it is a required option than optional one, since it includes
> toSysOut
> >> / toFile specs; what are the pros and cons for including these two in
> the
> >> option and hence make it a required option than leaving them at the API
> >> layer and make Printed as optional for mapper / label only?
> >>
> >>
> >It isn't required as we will still have the no-arg print() which will just
> >go to sysout as it does now.
>
> Got it. So just to clarify are we going to deprecate writeAsText or not?
>
>
Correct.


>
> On Wed, Aug 9, 2017 at 11:38 AM, Guozhang Wang  wrote:
>
> > >> The key idea is that by using the same function name string for static
> > >> constructor and member functions, users do not need to remember what
> > are
> > >> the differences but can call these functions with any ordering they
> > want,
> > >> and later calls on the same spec will win over early calls.
> > >>
> > >>
> > >That would be great if java supported it, but it doesn't. You can't have
> > >static an member functions with the same signature.
> >
> > Got it, thanks.
> >
> > Does it still make sense to have one static constructors for each spec,
> > with one constructor having only one parameter to make it more usable,
> i.e.
> > as a user I do not need to give all parameters if I only want to override
> > one of them? Maybe we can just name the constructors as `with` but I'm
> not
> > sure if Java distinguish:
> >
> > public static  Produced with(final Serde keySerde)
> > public static  Produced with(final Serde valueSerde)
> >
> > as two function signatures.
> >
> >
> > Guozhang
> >
> >
> > On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:
> >
> >> On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
> >>
> >> > Damian,
> >> >
> >> > Thanks for the proposal, I had a few comments on the APIs:
> >> >
> >> > 1. Printed#withFile seems not needed, as users should always spec if
> it
> >> is
> >> > to sysOut or to File at the beginning. In addition as a second
> thought,
> >> I
> >> > think serdes are not useful for prints anyways since we assume
> >> `toString`
> >> > is provided except for byte arrays, in which we will special handle
> it.
> >> >
> >> >
> >> +1
> >>
> >>
> >> > Another comment about Printed in general is it differs with other
> >> options
> >> > that it is a required option than optional one, since it includes
> >> toSysOut
> >> > / toFile specs; what are the pros and cons for including these two in
> >> the
> >> > option and hence make it a required option than leaving them at the
> API
> >> > layer and make Printed as optional for mapper / label only?
> >> >
> >> >
> >> It isn't required as we will still have the no-arg print() which will
> just
> >> go to sysout as it does now.
> >>
> >>
> >> >
> >> > 2.1 KStream#through / to
> >> >
> >> > We should have an overloaded function without Produced?
> >> >
> >>
> >> Yes - we already have those so they are not part of the KIP, i.e,
> >> through(topic)
> >>
> >>
> >> >
> >> > 2.2 KStream#groupBy / groupByKey
> >> >
> >> > We should have an overloaded function without Serialized?
> >> >
> >>
> >> Yes, as above
> >>
> >> >
> >> > 2.3 KGroupedStream#count / reduce / aggregate
> >> >
> >> > We should have an overloaded function without Materialized?
> >> >
> >>
> >> As above
> >>
> >> >
> >> > 2.4 KStream#join
> >> >
> >> > We should have an overloaded function without Joined?
> >> >
> >>
> >> as above
> >>
> >> >
> >> >
> >> > 2.5 Each of KTable's operators:
> >> >
> >> > We should have an overloaded function without Produced / Serialized /
> >> > Materialized?
> >> >
> >> >
> >> as above
> >>
> >>
> >> >
> >> >
> >> > 3.1 Produced: the static functions have overlaps, which seems not
> >> > necessary. I'd suggest jut having the following three static with
> >> another
> >> > three similar member functions:
> >> >
> >> > public static  Produced withKeySerde(final Serde
> >> keySerde)
> >> >
> >> > public static  Produced withValueSerde(final Serde
> >> > valueSerde)
> >> >
> >> > public static  Produced withStreamPartitioner(final
> >> > StreamPartitioner partitioner)
> >> >
> >> > The key idea is that by using the same function name string for static
> >> > constructor and member functions, users do not need to remember what
> are
> >> > the differences but can call these functions with any ordering they
> >> want,
> >> > and later calls on the same spec will win over early calls.
> >> >
> >> >
> >> That would be great if java supported it, but it doesn't. You can't have
> >> static an member functions with the same signature.
> >>
> >>
> >> >
> >> > 3.2 Serialized: similarly
> >> >
> >> > public static  Serialized withKeySerde(final Serde
> >> keySerde)
> >> >
> >> > public static 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-09 Thread Guozhang Wang
>> Another comment about Printed in general is it differs with other options
>> that it is a required option than optional one, since it includes
toSysOut
>> / toFile specs; what are the pros and cons for including these two in the
>> option and hence make it a required option than leaving them at the API
>> layer and make Printed as optional for mapper / label only?
>>
>>
>It isn't required as we will still have the no-arg print() which will just
>go to sysout as it does now.

Got it. So just to clarify are we going to deprecate writeAsText or not?

Guozhang


On Wed, Aug 9, 2017 at 11:38 AM, Guozhang Wang  wrote:

> >> The key idea is that by using the same function name string for static
> >> constructor and member functions, users do not need to remember what
> are
> >> the differences but can call these functions with any ordering they
> want,
> >> and later calls on the same spec will win over early calls.
> >>
> >>
> >That would be great if java supported it, but it doesn't. You can't have
> >static an member functions with the same signature.
>
> Got it, thanks.
>
> Does it still make sense to have one static constructors for each spec,
> with one constructor having only one parameter to make it more usable, i.e.
> as a user I do not need to give all parameters if I only want to override
> one of them? Maybe we can just name the constructors as `with` but I'm not
> sure if Java distinguish:
>
> public static  Produced with(final Serde keySerde)
> public static  Produced with(final Serde valueSerde)
>
> as two function signatures.
>
>
> Guozhang
>
>
> On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:
>
>> On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
>>
>> > Damian,
>> >
>> > Thanks for the proposal, I had a few comments on the APIs:
>> >
>> > 1. Printed#withFile seems not needed, as users should always spec if it
>> is
>> > to sysOut or to File at the beginning. In addition as a second thought,
>> I
>> > think serdes are not useful for prints anyways since we assume
>> `toString`
>> > is provided except for byte arrays, in which we will special handle it.
>> >
>> >
>> +1
>>
>>
>> > Another comment about Printed in general is it differs with other
>> options
>> > that it is a required option than optional one, since it includes
>> toSysOut
>> > / toFile specs; what are the pros and cons for including these two in
>> the
>> > option and hence make it a required option than leaving them at the API
>> > layer and make Printed as optional for mapper / label only?
>> >
>> >
>> It isn't required as we will still have the no-arg print() which will just
>> go to sysout as it does now.
>>
>>
>> >
>> > 2.1 KStream#through / to
>> >
>> > We should have an overloaded function without Produced?
>> >
>>
>> Yes - we already have those so they are not part of the KIP, i.e,
>> through(topic)
>>
>>
>> >
>> > 2.2 KStream#groupBy / groupByKey
>> >
>> > We should have an overloaded function without Serialized?
>> >
>>
>> Yes, as above
>>
>> >
>> > 2.3 KGroupedStream#count / reduce / aggregate
>> >
>> > We should have an overloaded function without Materialized?
>> >
>>
>> As above
>>
>> >
>> > 2.4 KStream#join
>> >
>> > We should have an overloaded function without Joined?
>> >
>>
>> as above
>>
>> >
>> >
>> > 2.5 Each of KTable's operators:
>> >
>> > We should have an overloaded function without Produced / Serialized /
>> > Materialized?
>> >
>> >
>> as above
>>
>>
>> >
>> >
>> > 3.1 Produced: the static functions have overlaps, which seems not
>> > necessary. I'd suggest jut having the following three static with
>> another
>> > three similar member functions:
>> >
>> > public static  Produced withKeySerde(final Serde
>> keySerde)
>> >
>> > public static  Produced withValueSerde(final Serde
>> > valueSerde)
>> >
>> > public static  Produced withStreamPartitioner(final
>> > StreamPartitioner partitioner)
>> >
>> > The key idea is that by using the same function name string for static
>> > constructor and member functions, users do not need to remember what are
>> > the differences but can call these functions with any ordering they
>> want,
>> > and later calls on the same spec will win over early calls.
>> >
>> >
>> That would be great if java supported it, but it doesn't. You can't have
>> static an member functions with the same signature.
>>
>>
>> >
>> > 3.2 Serialized: similarly
>> >
>> > public static  Serialized withKeySerde(final Serde
>> keySerde)
>> >
>> > public static  Serialized withValueSerde(final Serde
>> > valueSerde)
>> >
>> > public Serialized withKeySerde(final Serde keySerde)
>> >
>> > public Serialized withValueSerde(final Serde valueSerde)
>> >
>>
>> as above
>>
>>
>> >
>> > Also it has a final Serde otherValueSerde in one of its static
>> > constructor, it that intentional?
>> >
>>
>> Nope: thanks.
>>
>> >
>> > 3.3. Joined: 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-09 Thread Guozhang Wang
>> The key idea is that by using the same function name string for static
>> constructor and member functions, users do not need to remember what are
>> the differences but can call these functions with any ordering they want,
>> and later calls on the same spec will win over early calls.
>>
>>
>That would be great if java supported it, but it doesn't. You can't have
>static an member functions with the same signature.

Got it, thanks.

Does it still make sense to have one static constructors for each spec,
with one constructor having only one parameter to make it more usable, i.e.
as a user I do not need to give all parameters if I only want to override
one of them? Maybe we can just name the constructors as `with` but I'm not
sure if Java distinguish:

public static  Produced with(final Serde keySerde)
public static  Produced with(final Serde valueSerde)

as two function signatures.


Guozhang


On Wed, Aug 9, 2017 at 6:20 AM, Damian Guy  wrote:

> On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:
>
> > Damian,
> >
> > Thanks for the proposal, I had a few comments on the APIs:
> >
> > 1. Printed#withFile seems not needed, as users should always spec if it
> is
> > to sysOut or to File at the beginning. In addition as a second thought, I
> > think serdes are not useful for prints anyways since we assume `toString`
> > is provided except for byte arrays, in which we will special handle it.
> >
> >
> +1
>
>
> > Another comment about Printed in general is it differs with other options
> > that it is a required option than optional one, since it includes
> toSysOut
> > / toFile specs; what are the pros and cons for including these two in the
> > option and hence make it a required option than leaving them at the API
> > layer and make Printed as optional for mapper / label only?
> >
> >
> It isn't required as we will still have the no-arg print() which will just
> go to sysout as it does now.
>
>
> >
> > 2.1 KStream#through / to
> >
> > We should have an overloaded function without Produced?
> >
>
> Yes - we already have those so they are not part of the KIP, i.e,
> through(topic)
>
>
> >
> > 2.2 KStream#groupBy / groupByKey
> >
> > We should have an overloaded function without Serialized?
> >
>
> Yes, as above
>
> >
> > 2.3 KGroupedStream#count / reduce / aggregate
> >
> > We should have an overloaded function without Materialized?
> >
>
> As above
>
> >
> > 2.4 KStream#join
> >
> > We should have an overloaded function without Joined?
> >
>
> as above
>
> >
> >
> > 2.5 Each of KTable's operators:
> >
> > We should have an overloaded function without Produced / Serialized /
> > Materialized?
> >
> >
> as above
>
>
> >
> >
> > 3.1 Produced: the static functions have overlaps, which seems not
> > necessary. I'd suggest jut having the following three static with another
> > three similar member functions:
> >
> > public static  Produced withKeySerde(final Serde keySerde)
> >
> > public static  Produced withValueSerde(final Serde
> > valueSerde)
> >
> > public static  Produced withStreamPartitioner(final
> > StreamPartitioner partitioner)
> >
> > The key idea is that by using the same function name string for static
> > constructor and member functions, users do not need to remember what are
> > the differences but can call these functions with any ordering they want,
> > and later calls on the same spec will win over early calls.
> >
> >
> That would be great if java supported it, but it doesn't. You can't have
> static an member functions with the same signature.
>
>
> >
> > 3.2 Serialized: similarly
> >
> > public static  Serialized withKeySerde(final Serde
> keySerde)
> >
> > public static  Serialized withValueSerde(final Serde
> > valueSerde)
> >
> > public Serialized withKeySerde(final Serde keySerde)
> >
> > public Serialized withValueSerde(final Serde valueSerde)
> >
>
> as above
>
>
> >
> > Also it has a final Serde otherValueSerde in one of its static
> > constructor, it that intentional?
> >
>
> Nope: thanks.
>
> >
> > 3.3. Joined: similarly, keep the static constructor signatures the same
> as
> > its corresponding member fields.
> >
> >
> As above
>
>
> > 3.4 Materialized: it is a bit special, and I think we can keep its static
> > constructors with only two `as` as they are today.K
> >
> >
> 4. Is there any modifications on StateStoreSupplier? Is it replaced by
> > BytesStoreSupplier? Seems some more descriptions are lacking here. Also
> in
> >
> >
> No modifications to StateStoreSupplier. It is superseceded by
> BytesStoreSupplier.
>
>
>
> > public static  Materialized
> > as(final StateStoreSupplier
> > supplier)
> >
> > Is the parameter in type of BytesStoreSupplier?
> >
>
> Yep - thanks
>
>
> >
> >
> >
> >
> > Guozhang
> >
> >
> > On Thu, Jul 27, 2017 at 5:26 AM, Damian Guy 
> wrote:
> >
> > > 

Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-09 Thread Damian Guy
On Tue, 8 Aug 2017 at 20:11 Guozhang Wang  wrote:

> Damian,
>
> Thanks for the proposal, I had a few comments on the APIs:
>
> 1. Printed#withFile seems not needed, as users should always spec if it is
> to sysOut or to File at the beginning. In addition as a second thought, I
> think serdes are not useful for prints anyways since we assume `toString`
> is provided except for byte arrays, in which we will special handle it.
>
>
+1


> Another comment about Printed in general is it differs with other options
> that it is a required option than optional one, since it includes toSysOut
> / toFile specs; what are the pros and cons for including these two in the
> option and hence make it a required option than leaving them at the API
> layer and make Printed as optional for mapper / label only?
>
>
It isn't required as we will still have the no-arg print() which will just
go to sysout as it does now.


>
> 2.1 KStream#through / to
>
> We should have an overloaded function without Produced?
>

Yes - we already have those so they are not part of the KIP, i.e,
through(topic)


>
> 2.2 KStream#groupBy / groupByKey
>
> We should have an overloaded function without Serialized?
>

Yes, as above

>
> 2.3 KGroupedStream#count / reduce / aggregate
>
> We should have an overloaded function without Materialized?
>

As above

>
> 2.4 KStream#join
>
> We should have an overloaded function without Joined?
>

as above

>
>
> 2.5 Each of KTable's operators:
>
> We should have an overloaded function without Produced / Serialized /
> Materialized?
>
>
as above


>
>
> 3.1 Produced: the static functions have overlaps, which seems not
> necessary. I'd suggest jut having the following three static with another
> three similar member functions:
>
> public static  Produced withKeySerde(final Serde keySerde)
>
> public static  Produced withValueSerde(final Serde
> valueSerde)
>
> public static  Produced withStreamPartitioner(final
> StreamPartitioner partitioner)
>
> The key idea is that by using the same function name string for static
> constructor and member functions, users do not need to remember what are
> the differences but can call these functions with any ordering they want,
> and later calls on the same spec will win over early calls.
>
>
That would be great if java supported it, but it doesn't. You can't have
static an member functions with the same signature.


>
> 3.2 Serialized: similarly
>
> public static  Serialized withKeySerde(final Serde keySerde)
>
> public static  Serialized withValueSerde(final Serde
> valueSerde)
>
> public Serialized withKeySerde(final Serde keySerde)
>
> public Serialized withValueSerde(final Serde valueSerde)
>

as above


>
> Also it has a final Serde otherValueSerde in one of its static
> constructor, it that intentional?
>

Nope: thanks.

>
> 3.3. Joined: similarly, keep the static constructor signatures the same as
> its corresponding member fields.
>
>
As above


> 3.4 Materialized: it is a bit special, and I think we can keep its static
> constructors with only two `as` as they are today.K
>
>
4. Is there any modifications on StateStoreSupplier? Is it replaced by
> BytesStoreSupplier? Seems some more descriptions are lacking here. Also in
>
>
No modifications to StateStoreSupplier. It is superseceded by
BytesStoreSupplier.



> public static  Materialized
> as(final StateStoreSupplier
> supplier)
>
> Is the parameter in type of BytesStoreSupplier?
>

Yep - thanks


>
>
>
>
> Guozhang
>
>
> On Thu, Jul 27, 2017 at 5:26 AM, Damian Guy  wrote:
>
> > Updated link:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+
> > use+of+custom+storage+engines
> >
> > Thanks,
> > Damian
> >
> > On Thu, 27 Jul 2017 at 13:09 Damian Guy  wrote:
> >
> > > Hi,
> > >
> > > I've put together a KIP to make some changes to the KafkaStreams DSL
> that
> > > will hopefully allow us to:
> > > 1) reduce the explosion of overloads
> > > 2) add new features without having to continue adding more overloads
> > > 3) provide simpler ways for people to use custom storage engines and
> wrap
> > > them with logging, caching etc if desired
> > > 4) enable per-operator caching rather than global caching without
> having
> > > to resort to supplying a StateStoreSupplier when you just want to turn
> > > caching off.
> > >
> > > The KIP is here:
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=73631309
> > >
> > > Thanks,
> > > Damian
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-08-08 Thread Guozhang Wang
Damian,

Thanks for the proposal, I had a few comments on the APIs:

1. Printed#withFile seems not needed, as users should always spec if it is
to sysOut or to File at the beginning. In addition as a second thought, I
think serdes are not useful for prints anyways since we assume `toString`
is provided except for byte arrays, in which we will special handle it.

Another comment about Printed in general is it differs with other options
that it is a required option than optional one, since it includes toSysOut
/ toFile specs; what are the pros and cons for including these two in the
option and hence make it a required option than leaving them at the API
layer and make Printed as optional for mapper / label only?


2.1 KStream#through / to

We should have an overloaded function without Produced?

2.2 KStream#groupBy / groupByKey

We should have an overloaded function without Serialized?

2.3 KGroupedStream#count / reduce / aggregate

We should have an overloaded function without Materialized?

2.4 KStream#join

We should have an overloaded function without Joined?


2.5 Each of KTable's operators:

We should have an overloaded function without Produced / Serialized /
Materialized?



3.1 Produced: the static functions have overlaps, which seems not
necessary. I'd suggest jut having the following three static with another
three similar member functions:

public static  Produced withKeySerde(final Serde keySerde)

public static  Produced withValueSerde(final Serde
valueSerde)

public static  Produced withStreamPartitioner(final
StreamPartitioner partitioner)

The key idea is that by using the same function name string for static
constructor and member functions, users do not need to remember what are
the differences but can call these functions with any ordering they want,
and later calls on the same spec will win over early calls.


3.2 Serialized: similarly

public static  Serialized withKeySerde(final Serde keySerde)

public static  Serialized withValueSerde(final Serde
valueSerde)

public Serialized withKeySerde(final Serde keySerde)

public Serialized withValueSerde(final Serde valueSerde)

Also it has a final Serde otherValueSerde in one of its static
constructor, it that intentional?

3.3. Joined: similarly, keep the static constructor signatures the same as
its corresponding member fields.

3.4 Materialized: it is a bit special, and I think we can keep its static
constructors with only two `as` as they are today.K


4. Is there any modifications on StateStoreSupplier? Is it replaced by
BytesStoreSupplier? Seems some more descriptions are lacking here. Also in

public static  Materialized
as(final StateStoreSupplier
supplier)

Is the parameter in type of BytesStoreSupplier?




Guozhang


On Thu, Jul 27, 2017 at 5:26 AM, Damian Guy  wrote:

> Updated link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+
> use+of+custom+storage+engines
>
> Thanks,
> Damian
>
> On Thu, 27 Jul 2017 at 13:09 Damian Guy  wrote:
>
> > Hi,
> >
> > I've put together a KIP to make some changes to the KafkaStreams DSL that
> > will hopefully allow us to:
> > 1) reduce the explosion of overloads
> > 2) add new features without having to continue adding more overloads
> > 3) provide simpler ways for people to use custom storage engines and wrap
> > them with logging, caching etc if desired
> > 4) enable per-operator caching rather than global caching without having
> > to resort to supplying a StateStoreSupplier when you just want to turn
> > caching off.
> >
> > The KIP is here:
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=73631309
> >
> > Thanks,
> > Damian
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-182: Reduce Streams DSL overloads and allow easier use of custom storage engines

2017-07-27 Thread Damian Guy
Updated link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+use+of+custom+storage+engines

Thanks,
Damian

On Thu, 27 Jul 2017 at 13:09 Damian Guy  wrote:

> Hi,
>
> I've put together a KIP to make some changes to the KafkaStreams DSL that
> will hopefully allow us to:
> 1) reduce the explosion of overloads
> 2) add new features without having to continue adding more overloads
> 3) provide simpler ways for people to use custom storage engines and wrap
> them with logging, caching etc if desired
> 4) enable per-operator caching rather than global caching without having
> to resort to supplying a StateStoreSupplier when you just want to turn
> caching off.
>
> The KIP is here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=73631309
>
> Thanks,
> Damian
>