https://bugzilla.wikimedia.org/show_bug.cgi?id=71056
--- Comment #1 from Andrew Otto <[email protected]> ---
Magnus and I worked to figure out what was going on. We upgraded librdkafka to 0.8.4 on analytics1003 (and also attempted to use broker offset storage). By including only webrequest_upload as input to kafkatee, I was able to reproduce this problem in a simplified setting.

Evidence points to analytics1021 as the cause of this problem (again). These bugs are likely related:

https://github.com/edenhill/librdkafka/issues/147
https://issues.apache.org/jira/browse/KAFKA-1367

I restarted analytics1021 and issued a preferred-replica-election. This did not solve the broker/zookeeper metadata mismatch (described in those bugs), but it did solve the problem of kafkatee not consuming from all partitions.

I'm not entirely sure how to move on from here. I'm going to re-add kafkatee consuming from all topics and see how things go over the weekend. I wonder if this has something to do with a metadata refresh bug in librdkafka/kafka plus the weird analytics1021 kafka<->zookeeper timeout bug[1] that we have been struggling with.

If these issues persist, I think we should consider dropping analytics1021 from our Kafka cluster. It's hard to say whether we have problems because of this machine, because of the rack/network it is in, or because of a fluke.

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=69667

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
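
For anyone who wants to check the broker/zookeeper metadata mismatch mentioned in the comment above, here is a minimal Python sketch of the comparison step only. It assumes the leader and ISR for each partition have already been fetched from both sources into plain dicts. The function name `find_metadata_mismatches`, the dict layout, the topic name and the broker ids are all hypothetical; none of them come from our cluster or from any Kafka tooling.

```python
def _normalize(state):
    """Return a copy with the ISR sorted, so replica order does not count as a difference."""
    if state is None:
        return None
    return {"leader": state["leader"], "isr": sorted(state["isr"])}


def find_metadata_mismatches(broker_view, zookeeper_view):
    """Compare two {(topic, partition): {"leader": id, "isr": [ids]}} mappings.

    Returns {(topic, partition): (broker_state, zookeeper_state)} for every
    partition where the two views disagree. A partition missing from one
    view is reported with None on that side.
    """
    mismatches = {}
    for key in sorted(set(broker_view) | set(zookeeper_view)):
        broker_state = _normalize(broker_view.get(key))
        zk_state = _normalize(zookeeper_view.get(key))
        if broker_state != zk_state:
            mismatches[key] = (broker_state, zk_state)
    return mismatches


# Hypothetical sample data: partition 0 agrees (ISR order aside),
# partition 1 has a different ISR in each view.
broker_view = {
    ("example_topic", 0): {"leader": 1, "isr": [1, 2]},
    ("example_topic", 1): {"leader": 2, "isr": [2]},
}
zookeeper_view = {
    ("example_topic", 0): {"leader": 1, "isr": [2, 1]},
    ("example_topic", 1): {"leader": 2, "isr": [2, 3]},
}

if __name__ == "__main__":
    for (topic, partition), (b, z) in find_metadata_mismatches(
        broker_view, zookeeper_view
    ).items():
        print(f"{topic}[{partition}]: broker={b} zookeeper={z}")
```

How the two views get populated is deliberately left out, since that depends on the client libraries and tools available on the host.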
