Jason Gustafson created KAFKA-4137:
--------------------------------------

             Summary: Refactor multi-threaded consumer for safer network layer 
access
                 Key: KAFKA-4137
                 URL: https://issues.apache.org/jira/browse/KAFKA-4137
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
            Reporter: Jason Gustafson


In KIP-62, we added a background thread to send heartbeats while the user is 
processing fetched data from a call to poll(). In the implementation, we 
elected to share the instance of {{NetworkClient}} between the foreground 
thread and this background thread. After working with the system test failure 
in KAFKA-3807, we've realized that this probably wasn't a good decision. It is 
very tricky to get the synchronization correct with respect to response 
callbacks and reasoning about the multi-threaded behavior is very difficult. 
For example, a common pattern is to send a request and then call 
{{NetworkClient.poll()}} to await its return. With another thread also 
potentially calling poll(), the response can actually return before the sending 
thread itself invokes poll(). This can cause unnecessary (and potentially 
unbounded) blocking, and avoiding it is quite complex. 

A different approach we've discussed would be to use two instances of 
NetworkClient, one dedicated to fetching, and one dedicated to coordinator 
communication. The fetching NetworkClient can continue to work exclusively in 
the foreground thread and we can confine the coordinator NetworkClient to the 
background thread. This provides much better isolation and avoids all of the 
race conditions with calling poll() from two threads. The main complication is 
in how to expose blocking APIs to interact with the background thread. For 
example, in the current consumer API, rebalance are completed in the foreground 
thread, so we would need to coordinate with the background thread to preserve 
this (e.g. by using a Future abstraction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to