Hard to achieve.

I guess a naive approach would be to use a `flatTransform()` to implement a filter that drops all records that are not in the desired time range.
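A minimal sketch of that dropping logic in plain Java (the cutoff values and class name are hypothetical; inside an actual `flatTransform()` you would read the record timestamp from the `ProcessorContext` and return an empty collection to drop out-of-range records):

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeFilter {
    // Inclusive lower bound, exclusive upper bound, both epoch millis.
    private final long startMs;
    private final long endMs;

    public TimeRangeFilter(long startMs, long endMs) {
        this.startMs = startMs;
        this.endMs = endMs;
    }

    // In a Transformer, this check would typically use context.timestamp();
    // returning an empty list from transform() drops the record.
    public boolean inRange(long recordTimestampMs) {
        return recordTimestampMs >= startMs && recordTimestampMs < endMs;
    }

    // Keeps only the timestamps that fall inside [startMs, endMs).
    public List<Long> filter(List<Long> recordTimestamps) {
        List<Long> kept = new ArrayList<>();
        for (long ts : recordTimestamps) {
            if (inRange(ts)) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

Note this only suppresses output for out-of-range records; the consumer still reads and advances past them, which is why it avoids the poll-timeout problems described below.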

pause() and resume() are not available in Kafka Streams, but only on the KafkaConsumer (the Spring docs you cite are also about the consumer, not Kafka Streams).


-Matthias

On 11/24/21 11:05 AM, Miguel González wrote:
Hello

For my use case I need to work with a chunk of records, let's say per
month... We have over two years of data, and we are testing whether we can
deploy it to production, but we need to test in small batches.

I have built a Kafka Streams app that processes two input topics and outputs
to one topic.

I would like to process the first two months of data. Is that possible?

    - I have tried blocking the consumer thread using .map on the two
    KStreams I have, comparing the timestamp on each message against a
    timestamp I get from another system that tells me up to what time I
    should process. I also increased MAX_POLL_INTERVAL_MS_CONFIG, but I have
    noticed that the messages that are in range do not get processed and sent
    to the output topic.
    - I have also seen that a Spring Cloud library apparently offers a
    pause-resume feature:
    
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.5/reference/html/spring-cloud-stream-binder-kafka.html#_binding_visualization_and_control_in_kafka_streams_binder
    - I have also seen that implementing a transformer or processor could
    work, but in that case the state store would possibly have to hold years
    of data. That is something I would like to avoid.


Any help is appreciated.

regards
- Miguel
