Hi,

I'm running 0.8.0 HEAD from 3/4/2013.
As I think through building a robust SimpleConsumer, I ran some failure tests today and want to make sure I understand what is going on. FYI, I know that I should be doing a metadata lookup to find the leader, but I wanted to see what happens if things are going well and then the leader changes between requests, or if I've cached the leader and try to connect without paying for a leader lookup.

First test: connect to a broker that has a 'copy' (replica) of the topic/partition but is not the leader. I get error 5, which maps to ErrorMapping.LeaderNotAvailableCode. Why didn't I get ErrorMapping.NotLeaderForPartitionCode, or something else telling me I'm not talking to the leader? 'Not available' implies something is wrong with replication. Connecting to the leader broker, everything works fine.

Second test: connect to a broker that is neither the leader nor a replica, and I get error 3, unknown topic or partition. Makes sense.

Third test: connect to the leader and, while reading data, shut down the leader broker via the command line. I get some IOExceptions, then Connection Refused on the reconnect. (Note that Connection Refused is the exception raised; the IOException was written to the logs but not raised to my code.)

I'm not sure of the best way to code recovery from this without assuming the worst every time. Could Kafka give some notice that the connection to the leader was closed due to a shutdown, rather than me just getting Connection Refused errors, so I can respond differently? Something like 'Broker has closed connection due to shutdown', so I know to sleep for a second before going through the leader lookup logic again? Or, ideally, Kafka would know it was a clean shutdown and automatically transition to the new leader. Knowing it was a clean shutdown would also let me treat it as a normal occurrence rather than as an exception raised when something goes wrong.

Thanks,
Chris
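For anyone reading along, the "sleep, then redo the leader lookup" recovery described in the third test can be sketched in plain Java. This is a sketch under stated assumptions, not Kafka's API: LeaderLookup, Fetcher, classify and fetchWithRecovery are hypothetical names standing in for the metadata lookup and the SimpleConsumer fetch. The error-code constants are redeclared to mirror the 0.8 ErrorMapping values (3 and 5 as observed in the tests above, 6 for NotLeaderForPartitionCode) so the sketch compiles without the Kafka jar. A ConnectException is handled the same way whether the broker shut down cleanly or crashed, since nothing in this path tells the two apart.

```java
import java.net.ConnectException;
import java.util.Arrays;
import java.util.Iterator;

public class LeaderRecovery {
    // Redeclared to mirror kafka.common.ErrorMapping (0.8) so this compiles without the Kafka jar.
    static final short NO_ERROR = 0;
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;
    static final short LEADER_NOT_AVAILABLE = 5;
    static final short NOT_LEADER_FOR_PARTITION = 6;

    enum Action { PROCESS, REFRESH_LEADER, GIVE_UP }

    /** Map a per-partition fetch error code to a recovery action. */
    static Action classify(short errorCode) {
        switch (errorCode) {
            case NO_ERROR:
                return Action.PROCESS;
            case LEADER_NOT_AVAILABLE:        // seen when talking to a non-leader replica
            case NOT_LEADER_FOR_PARTITION:
            case UNKNOWN_TOPIC_OR_PARTITION:  // seen when the broker holds no replica
                return Action.REFRESH_LEADER;
            default:
                return Action.GIVE_UP;
        }
    }

    /** Hypothetical hook standing in for a topic-metadata leader lookup. */
    interface LeaderLookup { String findLeader(); }

    /** Hypothetical hook standing in for a SimpleConsumer fetch; returns the partition's error code. */
    interface Fetcher { short fetch(String leader) throws ConnectException; }

    /** Fetch from the current leader; on a refused connection or a leader-related
     *  error code, back off and redo the leader lookup, up to maxAttempts fetches. */
    static String fetchWithRecovery(LeaderLookup lookup, Fetcher fetcher,
                                    int maxAttempts, long backoffMs) {
        String leader = lookup.findLeader();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                short code = fetcher.fetch(leader);
                Action action = classify(code);
                if (action == Action.PROCESS) {
                    return leader;
                }
                if (action == Action.GIVE_UP) {
                    throw new IllegalStateException("unrecoverable error code " + code);
                }
            } catch (ConnectException e) {
                // Broker unreachable (e.g. shut down); a clean shutdown and a crash look the same here.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);  // give the cluster time to move leadership
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
                leader = lookup.findLeader();
            }
        }
        throw new IllegalStateException("no usable leader after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated cluster: broker1 was the leader but has been shut down; broker2 took over.
        Iterator<String> leaders = Arrays.asList("broker1", "broker2").iterator();
        String used = fetchWithRecovery(
                leaders::next,
                leader -> {
                    if (leader.equals("broker1")) {
                        throw new ConnectException("Connection refused");
                    }
                    return NO_ERROR;
                },
                3, 1000L);
        System.out.println("Fetched from " + used);
    }
}
```

Running main prints `Fetched from broker2` after a single one-second backoff. Treating error 3 as "refresh the leader" is a judgment call: a stale cached leader can produce it, but so can a topic that really doesn't exist, which is why the retries are capped by maxAttempts.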