Hi, I'm looking for some advice on the "right" way to load historical data into a stream.
The case is as follows. I have a live stream, and sometimes I need to match the current stream data against data stored in a database, let's say Elasticsearch. I generate a side output with the query information and now want to get the matching rows from Elasticsearch. The number of rows can be high, so I want to read them in a paginated way and forward each response downstream as it is received. This also means I have to execute n queries against Elasticsearch, in order, without knowing n up front (the search response tells me whether there is more data).

Options I have considered so far:

1. Async I/O. This works nicely, but if I read the data in a paginated way I have to buffer all the data before I can return the result, and that doesn't scale.

2. Iterate the stream. The requirement is more recursive than iterative, and iterations have some limitations regarding checkpoints.

3. Process function. Not intended for external I/O operations, as they take time to execute.

4. Elasticsearch source together with Kafka. Store the side output in Kafka and create a combined Elasticsearch/Kafka source function. Complicated.

There could be other ways of doing it, and I'm open to good ideas and suggestions on how to handle this challenge.

Thanks in advance

Med venlig hilsen / Best regards
Lasse Nedergaard
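To make the pagination requirement concrete, here is a minimal, Flink-free Java sketch of what I mean by "forward each page downstream as received". All names here (SearchClient, Page, streamPages) are hypothetical stand-ins, not a real Elasticsearch or Flink API; a real client would use something like search_after or the scroll API:

```java
import java.util.List;
import java.util.function.Consumer;

public class PaginatedFetch {

    // A single page of results plus a flag saying whether more pages exist,
    // mirroring how a search response can indicate remaining data.
    static final class Page {
        final List<String> rows;
        final boolean hasMore;
        Page(List<String> rows, boolean hasMore) {
            this.rows = rows;
            this.hasMore = hasMore;
        }
    }

    // Hypothetical stand-in for one paginated query against the store.
    interface SearchClient {
        Page fetch(String query, int pageNo);
    }

    // Issue the queries sequentially, in order, and hand each page to the
    // downstream consumer immediately instead of buffering the full result.
    // The number of queries n is unknown up front; we stop when hasMore is false.
    static int streamPages(SearchClient client, String query,
                           Consumer<List<String>> downstream) {
        int pageNo = 0;
        Page page;
        do {
            page = client.fetch(query, pageNo++);
            downstream.accept(page.rows); // emit right away, no full buffering
        } while (page.hasMore);
        return pageNo; // number of queries actually executed
    }
}
```

The problem is exactly this loop: with Async I/O the result future can only complete once, so the loop has to buffer; what I'm after is a way to run it inside Flink so each `downstream.accept` becomes an emitted record.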