With regard to the JIRA: I guess we do not want to put the end timestamp
into the key. For general usage, windows of different types are written
into different topics.

Thus, Nicolas' use case is quite special, and using a custom Serde is the
better approach to handle it, instead of changing Kafka Streams.

Nicolas, of course you are still welcome to work on
https://issues.apache.org/jira/browse/KAFKA-4468, but the patch should
not change the key format; it should only compute the correct window end
timestamp when a window is deserialized.
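To illustrate the idea (a rough sketch in plain Java, not the actual Kafka
Streams classes, assuming the WindowedSerializer layout of inner key bytes
followed by an 8-byte big-endian window start timestamp): the end never
needs to be stored, since it can be recomputed from the known window size
at deserialization time.

```java
import java.nio.ByteBuffer;

public class WindowedKeyDemo {
    static final int TIMESTAMP_SIZE = 8;

    // Serialize the way WindowedSerializer is assumed to:
    // inner key bytes, then the 8-byte big-endian window start.
    static byte[] serialize(byte[] innerKey, long windowStart) {
        return ByteBuffer.allocate(innerKey.length + TIMESTAMP_SIZE)
                .put(innerKey)
                .putLong(windowStart)
                .array();
    }

    // Read the start from the last 8 bytes and derive the end from
    // the window size, instead of storing the end in the key.
    static long[] deserializeWindowBounds(byte[] bytes, long windowSizeMs) {
        long start = ByteBuffer
                .wrap(bytes, bytes.length - TIMESTAMP_SIZE, TIMESTAMP_SIZE)
                .getLong();
        return new long[] { start, start + windowSizeMs };
    }

    public static void main(String[] args) {
        byte[] key = serialize("user-42".getBytes(), 1_484_000_000_000L);
        long[] bounds = deserializeWindowBounds(key, 86_400_000L); // 1-day window
        System.out.println(bounds[0] + " " + bounds[1]);
        // prints: 1484000000000 1484086400000
    }
}
```

Note this only works if the reader knows the window size, which is exactly
why keys from 1-day and 1-week windows collide in a shared topic.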

@Guozhang: please correct me if I am wrong and we want to follow Eno's
suggestion.


-Matthias

On 1/17/17 1:39 AM, Eno Thereska wrote:
> For changes that may be backwards incompatible or change the APIs we
> usually do a short KIP first (e.g., I just did one yesterday:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-114%3A+KTable+materialization+and+improved+semantics).
> It's not meant to be overly burdensome, and it encourages the community
> to participate in the design. In this case I suspect the KIP can be very
> short, a paragraph or so.
> 
> Thanks
> Eno
> 
>> On 16 Jan 2017, at 22:52, Nicolas Fouché <nfou...@onfocus.io> wrote:
>>
>> In the case of KAFKA-4468, it's more about state stores. But still, keys
>> would not be backward compatible. What is the "official" policy about this
>> kind of change?
>>
>> 2017-01-16 23:47 GMT+01:00 Nicolas Fouché <nfou...@onfocus.io>:
>>
>>> Hi Eno,
>>> I thought it would be impossible to put this in Kafka because of backward
>>> incompatibility with the existing windowed keys, no ?
>>> In my case, I had to create a new output topic, reset the topology, and
>>> reprocess all my data.
>>>
>>> 2017-01-16 23:05 GMT+01:00 Eno Thereska <eno.there...@gmail.com>:
>>>
>>>> Nicolas,
>>>>
>>>> I'm checking with Bill who originally was interested in KAFKA-4468. If he
>>>> isn't actively working on it, why don't you give it a go and create a pull
>>>> request (PR) for it? That way your contribution is properly acknowledged
>>>> etc. We can help you through with that.
>>>>
>>>> Thanks
>>>> Eno
>>>>> On 16 Jan 2017, at 18:46, Nicolas Fouché <nfou...@onfocus.io> wrote:
>>>>>
>>>>> My current implementation:
>>>>> https://gist.github.com/nfo/eaf350afb5667a3516593da4d48e757a . I just
>>>>> appended the window `end` at the end of the byte array.
>>>>> Comments and suggestions are welcome!
>>>>>
>>>>>
>>>>> 2017-01-16 15:48 GMT+01:00 Nicolas Fouché <nfou...@onfocus.io>:
>>>>>
>>>>>> Hi Damian,
>>>>>>
>>>>>> I recall now that I copied the `WindowedSerde` class [1] from the
>>>>>> Confluent examples, which uses the internal `WindowedSerializer`
>>>>>> class.
>>>>>> Better to write my own Serde then. You're right, I should not rely on
>>>>>> internal classes, especially for data written outside Kafka Streams
>>>>>> topologies.
>>>>>>
>>>>>> Thanks for the insights on KAFKA-4468.
>>>>>>
>>>>>> https://github.com/confluentinc/examples/blob/89db45c6890cf757b8e18565bdf7bc23f119a2ff/kafka-streams/src/main/java/io/confluent/examples/streams/utils/WindowedSerde.java
>>>>>>
>>>>>> Nicolas.
>>>>>>
>>>>>> 2017-01-16 12:31 GMT+01:00 Damian Guy <damian....@gmail.com>:
>>>>>>
>>>>>>> Hi Nicolas,
>>>>>>>
>>>>>>> I guess you are using the Processor API for your topology? The
>>>>>>> WindowedSerializer is an internal class that is used as part of the
>>>>>>> DSL. In the DSL a topic will be created for each window operation,
>>>>>>> so we don't need the end time, as it can be calculated from the
>>>>>>> window size.
>>>>>>> However, there is an open jira for this:
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-4468
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Damian
>>>>>>>
>>>>>>> On Mon, 16 Jan 2017 at 11:18 Nicolas Fouché <nfou...@onfocus.io>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In the same topology, I generate aggregates with 1-day windows and
>>>>>>>> 1-week windows and write them to one single topic. On Mondays,
>>>>>>>> these windows have the same start time. The effect: these
>>>>>>>> aggregates override each other.
>>>>>>>>
>>>>>>>> That happens because WindowedSerializer [1] only serializes the
>>>>>>>> window start time. I'm a bit surprised, since a window by
>>>>>>>> definition has a start and an end. I suppose one wanted to save on
>>>>>>>> key sizes? And/or one would consider that topics should not
>>>>>>>> contain aggregates with different granularities?
>>>>>>>>
>>>>>>>> I have two choices then: either create as many output topics as I
>>>>>>>> have granularities, or create my own serializer which also
>>>>>>>> includes the window end time. What would the community recommend?
>>>>>>>>
>>>>>>>> Getting back to the core problem:
>>>>>>>> I could understand that it's not "right" to store different
>>>>>>>> granularities in one topic, and I thought it would save resources
>>>>>>>> (fewer topics to manage by Kafka). But I'm really not sure about
>>>>>>>> this default serializer: it does not serialize all instance
>>>>>>>> variables of the `Window` class, and more generally does not
>>>>>>>> comply with the definition of a window.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/WindowedSerializer.java
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
> 
> 
