Pavel Kuznetsov created KAFKA-10327:
---------------------------------------

             Summary: Make flush after some count of putted records in SinkTask
                 Key: KAFKA-10327
                 URL: https://issues.apache.org/jira/browse/KAFKA-10327
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
    Affects Versions: 2.5.0
            Reporter: Pavel Kuznetsov


In the current version of Kafka Connect, all records accumulated via the 
SinkTask.put method are flushed to the target system on a time-based schedule: 
data is flushed and offsets are committed every offset.flush.interval.ms 
(default 60000 ms).

However, you cannot control the number of messages received from Kafka between 
two flushes. This can cause out-of-memory errors, because the in-memory buffer 
may grow very large during that interval.

I suggest adding out-of-the-box support for count-based flushing to Kafka 
Connect. It would require a new configuration parameter (offset.flush.count, for 
example). The number of records sent to SinkTask.put should be counted, and when 
this count exceeds the value of offset.flush.count, SinkTask.flush is called and 
offsets are committed.
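
The proposed behaviour could be sketched roughly as follows. This is only an 
illustration in plain Java, not Kafka Connect code: the class CountBasedBuffer 
and its methods are hypothetical, flushCount stands in for the suggested 
offset.flush.count setting, and flushAction stands in for the call to 
SinkTask.flush plus the offset commit. The sketch flushes as soon as the count 
reaches the threshold, so the buffer never holds more than flushCount records; 
a strict "greater than" comparison would work the same way with a limit that 
is one record higher.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers records and flushes once a count threshold is reached. */
final class CountBasedBuffer<R> {
    private final int flushCount;                 // stands in for the proposed offset.flush.count
    private final Consumer<List<R>> flushAction;  // stands in for SinkTask.flush + offset commit
    private final List<R> buffer = new ArrayList<>();
    private int flushes = 0;

    CountBasedBuffer(int flushCount, Consumer<List<R>> flushAction) {
        if (flushCount <= 0) {
            throw new IllegalArgumentException("flushCount must be positive");
        }
        this.flushCount = flushCount;
        this.flushAction = flushAction;
    }

    /** Mirrors the role of SinkTask.put: accept a batch and flush whenever the threshold is hit. */
    void put(Collection<R> records) {
        for (R record : records) {
            buffer.add(record);
            if (buffer.size() >= flushCount) {
                flush();
            }
        }
    }

    /** Hands a copy of the buffered records to the flush action, then clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushAction.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushes++;
    }

    int buffered() { return buffer.size(); }

    int flushes() { return flushes; }
}
```

In a real implementation the existing time-based flush would presumably remain 
in place, with the count-based trigger acting as an additional upper bound on 
the buffer size.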



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
