Hi,

there was a similar discussion on the list already "Kafka stream join
scenario":

http://search-hadoop.com/m/uyzND1WsAGW1vB5O91&subj=Kafka+stream+join+scenarios

Long story short: there is no explicit support or guarantee. As Jay
mentioned, some alignment is best effort. However, the main issues is
the question, what does it mean to load a KTable *completely*. As a
KTable consumers a changelog stream, there is no defined end as a KTable
is a always/infinitely updating "dynamic" table...

You might be able to build a custom solution for it, though (see the
email thread I linked above). Hope this helps.


-Matthias

On 06/29/2016 04:26 AM, Gwen Shapira wrote:
> Upgrade :)
> 
> On Tue, Jun 28, 2016 at 6:49 PM, Rohit Valsakumar <rvalsaku...@tivo.com> 
> wrote:
>> Hi Jay,
>>
>> Thanks for the reply.
>>
>> Unfortunately in our case due to legacy reasons we are using
>> WallclockTimestampExtractor in the application for all the streams and the
>> existing messages in the stream probably won¹t have timestamps as they are
>> being produced by legacy clients. So the events are being ingested with
>> processing times and it may not be able to synchronize based on the
>> message timestamps. What do you recommend for this scenario?
>>
>> Rohit
>>
>> On 6/28/16, 5:18 PM, "Jay Kreps" <j...@confluent.io> wrote:
>>
>>> I think you may get this for free as Kafka Streams attempts to align
>>> consumption across different topics/partitions by the timestamp in the
>>> messages. So in a case where you are starting a job fresh and it has a
>>> database changelog to consume and a event stream to consume, it will
>>> attempt to keep the Ktable at the "time" the event stream is at. This is
>>> only a heuristic, of course, since messages are necessarily strongly
>>> ordered by time. I think this is likely mostly the same but slightly
>>> better
>>> than the bootstrap usage in Samza but also covers other cases of
>>> alignment.
>>>
>>> If you want more control you can override the timestamp extractor that
>>> associates time and hence priority for the streams:
>>> https://kafka.apache.org/0100/javadoc/org/apache/kafka/streams/processor/T
>>> imestampExtractor.html
>>>
>>> -Jay
>>>
>>> On Tue, Jun 28, 2016 at 2:49 PM, Rohit Valsakumar <rvalsaku...@tivo.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Is there a way to consume all the contents of a kafka topic into a
>>>> KTable
>>>> before doing a left join with another Kstream?
>>>>
>>>> I am looking at something that simulates a bootstrap topic in a Samza
>>>> job.
>>>>
>>>> Thanks,
>>>> Rohit Valsakumar
>>>>
>>>> ________________________________
>>>>
>>>> This email and any attachments may contain confidential and privileged
>>>> material for the sole use of the intended recipient. Any review,
>>>> copying,
>>>> or distribution of this email (or any attachments) by others is
>>>> prohibited.
>>>> If you are not the intended recipient, please contact the sender
>>>> immediately and permanently delete this email and any attachments. No
>>>> employee or agent of TiVo Inc. is authorized to conclude any binding
>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
>>>> Inc. may only be made by a signed written agreement.
>>>>
>>
>>
>> ________________________________
>>
>> This email and any attachments may contain confidential and privileged 
>> material for the sole use of the intended recipient. Any review, copying, or 
>> distribution of this email (or any attachments) by others is prohibited. If 
>> you are not the intended recipient, please contact the sender immediately 
>> and permanently delete this email and any attachments. No employee or agent 
>> of TiVo Inc. is authorized to conclude any binding agreement on behalf of 
>> TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a 
>> signed written agreement.

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to