Re: KStream/KTable prioritization/initial offset

2016-03-25 Thread Guozhang Wang
Hello Greg, You can pass underlying client configs for the consumer and producer into StreamsConfig as well, e.g.: props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); As for your case, since ConsumerConfig.AUTO_OFFSET_RESET_CONFIG takes universal effect on all topics that do not have
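A minimal sketch of what Guozhang describes: underlying consumer settings are placed in the same Properties object as the Streams-specific ones and passed through StreamsConfig. To keep the sketch self-contained it uses the raw string keys instead of the ConsumerConfig/StreamsConfig constants (the literals shown are the values of those constants); the application id and broker address are placeholders.

```java
import java.util.Properties;

public class StreamsOffsetConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Streams-specific settings ("my-app" and the broker list are placeholders)
        props.put("application.id", "my-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Underlying consumer config passed straight through to the embedded
        // consumer; "auto.offset.reset" is the value of
        // ConsumerConfig.AUTO_OFFSET_RESET_CONFIG
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset")); // prints "earliest"
    }
}
```

Note that, as the thread goes on to discuss, auto.offset.reset only applies when a consumer group has no committed offset; it does not force a re-read on every restart.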

Re: KStream/KTable prioritization/initial offset

2016-03-25 Thread Greg Fodor
Hey Jay, thanks -- the timestamp alignment makes sense and clears up why prioritization isn't needed. It sounds like the one thing I still don't understand is how we can tell Kafka Streams to start reading from the latest or earliest offset for a given source. I might be thinking about

Re: KStream/KTable prioritization/initial offset

2016-03-25 Thread Jay Kreps
Hey Greg, I think the use case you have is that you have some derived state and you want to make sure it is fully populated before any of the input streams are populated. There would be two ways to accomplish this: (1) have a changelog that captures the state, (2) reprocess the input to recreate

Re: KStream/KTable prioritization/initial offset

2016-03-25 Thread Greg Fodor
That's great. Is there anything I can do to force that particular topic to start at the earliest message every time the job is started? On Mar 25, 2016 10:20 AM, "Yasuhiro Matsuda" wrote:

Re: KStream/KTable prioritization/initial offset

2016-03-25 Thread Yasuhiro Matsuda
It may not be ideal, but there is a way to prioritize particular topics. It is to set the record timestamps to zero. This can be done by using a custom TimestampExtractor. Kafka Streams tries to synchronize multiple streams using the extracted timestamps. So, records with the timestamp 0 have
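Yasuhiro's trick can be sketched as follows. Kafka Streams picks the record with the smallest extracted timestamp across its input streams, so an extractor that pins one topic's records to timestamp 0 makes that topic drain first. To stay self-contained, the sketch declares a simplified stand-in for the TimestampExtractor interface (the real 0.10-era interface takes a ConsumerRecord); the topic name is a placeholder.

```java
public class ZeroTimestampDemo {
    // Simplified stand-in for Kafka Streams' TimestampExtractor interface;
    // the real one extracts the timestamp from a ConsumerRecord.
    interface TimestampExtractor {
        long extract(String topic, long recordTimestamp);
    }

    // Returns 0 for records from the prioritized topic so the stream-time
    // synchronization always picks them before records from other topics,
    // which keep their real timestamps. "priority-topic" is a placeholder.
    static final TimestampExtractor PRIORITIZING = (topic, ts) ->
            topic.equals("priority-topic") ? 0L : ts;

    public static void main(String[] args) {
        System.out.println(PRIORITIZING.extract("priority-topic", 1_458_900_000_000L)); // prints 0
        System.out.println(PRIORITIZING.extract("other-topic", 1_458_900_000_000L));
    }
}
```

The cost of this trick is that the prioritized topic's real timestamps are discarded, so any timestamp-based logic downstream (e.g. windowing) no longer sees them.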

KStream/KTable prioritization/initial offset

2016-03-24 Thread Greg Fodor
Really digging Kafka Streams so far, nice work all. I'm interested in being able to materialize one or more KTables in full before the rest of the topology begins processing messages. This seems fundamentally useful since it allows you to get your database tables replicated up off the change