Hi Srikanth,

1) There is no support for this at the DSL level, but if you use the Processor API you can do "anything" you like. So yes, a map-like transform() that gets initialized with the "broadcast side" of the join should work.
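To make the transform() idea concrete, here is a minimal plain-Java sketch of the pattern (no Kafka Streams dependency, so the class and method names only mirror the Transformer life-cycle and are hypothetical): load the small dimension table once up front, then enrich each stream record with a local cache lookup instead of a shuffled join.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "broadcast join" pattern: a transform() that is
// initialized with the broadcast side of the join and then does
// map-like enrichment via local lookups. In a real Kafka Streams
// Transformer, init() would be where the dimension data is loaded.
public class BroadcastJoinSketch {

    // Local dimension cache, one copy per processor instance.
    private final Map<String, String> dimensionCache = new HashMap<>();

    // Stand-in for Transformer#init(): populate the cache once.
    public void init(Map<String, String> dimensionTable) {
        dimensionCache.putAll(dimensionTable);
    }

    // Stand-in for Transformer#transform(): enrich the record with a
    // cache lookup -- no repartitioning / internal topic required.
    public String transform(String key, String value) {
        String dim = dimensionCache.getOrDefault(key, "UNKNOWN");
        return value + "," + dim;
    }

    public static void main(String[] args) {
        BroadcastJoinSketch join = new BroadcastJoinSketch();
        join.init(Map.of("u1", "gold", "u2", "silver"));
        System.out.println(join.transform("u1", "click"));  // click,gold
        System.out.println(join.transform("u3", "view"));   // view,UNKNOWN
    }
}
```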
2) Right now, there is no way to stall a stream -- a custom TimestampExtractor will not do the trick either, because Kafka Streams allows for out-of-order/late-arriving data -- thus, it processes what is available without "waiting" for late data... Of course, you could build a custom solution via transform() again.

-Matthias

On 05/21/2016 05:24 AM, Srikanth wrote:
> Hello,
>
> I'm writing a workflow using Kafka Streams where an incoming stream needs
> to be denormalized and joined with a few dimension tables.
> It will be written back to another Kafka topic. Fairly typical, I believe.
>
> 1) Can I do a broadcast join if my dimension table is small enough to be
> held in each processor's local state?
> Not doing this will force the KStream to be shuffled with an internal topic.
> If not, should I write a transformer that reads the table in init() and
> caches it? I could then do a map-like transformation in transform() with a
> cache lookup instead of a join.
>
> 2) When joining a KStream with a KTable for bigger tables, how do I stall
> reading the KStream until the KTable is completely materialized?
> I know "completely read" is a very loose term in stream processing :-) The
> KTable is going to be fairly static and needs the "current content" to be
> read completely before it can be used for joins.
> The only option for synchronizing streams I see is TimestampExtractor. I
> can't figure out how to use that because:
>
> i) The join between KStream and KTable is time-agnostic and doesn't
> really fit into a window operation.
> ii) TimestampExtractor is set as a stream config at the global level.
> Timestamp extraction logic, on the other hand, will be specific to each
> stream. How does one write a generic extractor?
>
> Thanks,
> Srikanth
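One way the custom transform() workaround for point 2 could look, as a self-contained plain-Java sketch (all names are hypothetical; a real implementation would keep the buffer in a state store and decide "table loaded" based on its own criteria): buffer stream records while the table side is still loading, then flush and join once it is considered complete.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of "stalling" a stream via a custom transform(): since Kafka
// Streams will not wait for a KTable to be fully materialized, the
// transform parks incoming records until the table side signals it is
// loaded, then flushes the buffer through the join logic.
public class StallingJoinSketch {

    private final Queue<String[]> buffer = new ArrayDeque<>();
    private boolean tableLoaded = false;

    // Called per stream record; returns enriched records ready to emit.
    public List<String> transform(String key, String value) {
        List<String> out = new ArrayList<>();
        if (!tableLoaded) {
            buffer.add(new String[]{key, value}); // stall: park the record
            return out;                           // emit nothing yet
        }
        out.add(enrich(key, value));
        return out;
    }

    // Called once the table is considered "completely read".
    public List<String> onTableLoaded() {
        tableLoaded = true;
        List<String> out = new ArrayList<>();
        for (String[] kv : buffer) {
            out.add(enrich(kv[0], kv[1]));        // flush buffered records
        }
        buffer.clear();
        return out;
    }

    // Placeholder for the actual KTable lookup.
    private String enrich(String key, String value) {
        return key + ":" + value;
    }

    public static void main(String[] args) {
        StallingJoinSketch s = new StallingJoinSketch();
        System.out.println(s.transform("a", "1")); // [] -- buffered
        System.out.println(s.onTableLoaded());     // [a:1] -- flushed
        System.out.println(s.transform("b", "2")); // [b:2] -- passes through
    }
}
```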