Re: Caching in Kafka Streams to ignore garbage message

2017-04-30 Thread Matthias J. Sax
Ah. Sorry. You are right. Nevertheless, you can set a non-null dummy value, such as an empty byte array (`new byte[0]`), instead of the actual "tuple", so as not to blow up your storage requirements. -Matthias
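A minimal sketch of that suggestion, assuming a hypothetical lookup topic named "tracked-ids" (the topic name, id, and broker address are placeholders, not from the thread): keep the tracked id as the record key and write an empty byte array as the value, so the record is still an upsert (non-null) but costs almost nothing to store.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TrackedIdWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Producer<String, byte[]> producer = new KafkaProducer<>(
                props, new StringSerializer(), new ByteArraySerializer())) {
            // Empty byte array: non-null (so not interpreted as a delete),
            // but near-zero storage per tracked id.
            producer.send(new ProducerRecord<>("tracked-ids", "id-42", new byte[0]));
        }
    }
}
```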

Re: Caching in Kafka Streams to ignore garbage message

2017-04-30 Thread Michal Borowiecki
Apologies, I must not have made myself clear. I meant the values in the records coming from the input topic (which in turn are coming from Kafka Connect in the example at hand), not the records coming out of the join. My intention was to warn against sending null values from Kafka Connect…

Re: Caching in Kafka Streams to ignore garbage message

2017-04-30 Thread Matthias J. Sax
Your observation is correct. If you use an inner KStream-KTable join, the join implements the filter automatically, because non-matching records produce no join result. -Matthias
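A sketch of the join-as-filter behavior described here, written against the current StreamsBuilder DSL rather than the 0.10.x API from the thread's era (topic names and serdes are assumptions):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class JoinAsFilter {
    static KStream<String, String> buildFilter(StreamsBuilder builder) {
        // Keys of this table are the tracked ids; the values are ignored.
        KTable<String, byte[]> trackedIds =
                builder.table("tracked-ids", Consumed.with(Serdes.String(), Serdes.ByteArray()));

        KStream<String, String> events =
                builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));

        // Inner join: if an event's key has no entry in the table, the join
        // emits nothing, so untracked ids are dropped with no explicit filter.
        return events.join(trackedIds, (event, dummy) -> event);
    }
}
```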

Re: Caching in Kafka Streams to ignore garbage message

2017-04-30 Thread Michal Borowiecki
I have something working on the same principle (except not using Connect): I put the ids to filter on into a KTable and then (inner) join a KStream with that KTable. I don't believe the value can be null, though. In a changelog, a null value is interpreted as a delete, so it won't be put into the KTable…
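To illustrate the tombstone semantics mentioned here (continuing the hypothetical producer sketch from earlier in the thread, with the same placeholder topic and id): in a table's input topic a null value deletes the key, so "untracking" an id is just a matter of sending null.

```java
// Upsert: "id-42" becomes tracked (non-null dummy value).
producer.send(new ProducerRecord<>("tracked-ids", "id-42", new byte[0]));

// Tombstone: a null value is interpreted as a delete, so "id-42" is
// removed from the KTable state and will no longer match in the join.
producer.send(new ProducerRecord<>("tracked-ids", "id-42", (byte[]) null));
```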

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Matthias J. Sax
>> I'd like to avoid repeated trips to the db, and caching a large amount of data in memory.

Lookups to the DB would be hard to get done anyway; i.e., it would not perform well, as all your calls would need to be synchronous...

>> Is it possible to send a message w/ the id as the partition key…

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Ali Akhtar
I'd like to avoid repeated trips to the db, and caching a large amount of data in memory. Is it possible to send a message with the id as the partition key to a topic, and then use the same id as the key, so that the node which receives the data for an id is the one which processes it?
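A sketch of what this would look like on the producer side (topic name and payload are placeholders; assume a producer configured like the earlier sketch but with a String value serializer): keying the record by the id makes the default partitioner hash the id, so every record for a given id lands on the same partition and is handled by the same consumer or Streams instance. The same co-partitioning is also a prerequisite for the KStream-KTable join: both topics must be keyed by the id and have the same number of partitions.

```java
String id = "id-42";
String payload = "{\"id\": \"id-42\", \"data\": \"...\"}";

// The id is the record key; the default partitioner hashes the key, so
// all records for this id go to the same partition of "events".
producer.send(new ProducerRecord<>("events", id, payload));
```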

Re: Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Matthias J. Sax
The recommended solution would be to use Kafka Connect to load your DB data into a Kafka topic. With Kafka Streams you read your db-topic as a KTable and do an inner KStream-KTable join to look up the IDs. -Matthias
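Putting the recommendation together as one runnable sketch (all topic names, the application id, and the broker address are assumptions; "db-ids" stands for the topic Kafka Connect would fill from the database table):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

public class IdFilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "id-filter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Changelog view of the DB table, populated by Kafka Connect.
        KTable<String, byte[]> dbIds =
                builder.table("db-ids", Consumed.with(Serdes.String(), Serdes.ByteArray()));

        // Inner join = lookup + filter: only records whose id exists in
        // the DB-backed table reach the output topic.
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .join(dbIds, (value, dbRow) -> value)
               .to("tracked-output");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```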

Caching in Kafka Streams to ignore garbage message

2017-04-27 Thread Ali Akhtar
I have a Kafka topic which will receive a large amount of data. This data has an 'id' field. I need to look up the id in an external db to see if we are tracking that id; if yes, we process that message, and if not, we ignore it. 99% of the data will be for ids which are not being tracked; 1% or so…