I have been investigating an issue at one of our customers (our product embeds Zookeeper in it), and here's a summary of what I've been able to discern:

* One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C client) experiences a hardware/firmware failure of its RAID partition (possibly affecting the network card as well). From /var/log/syslog:

-------------------------------
Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas 0000:03:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update. Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq 73 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq 74 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq 75 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq 76 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq 77 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq 78 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq 79 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq 80 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq 81 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: eth0: using MSIX Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP): eth0: link is not ready Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Feb 24 01:53:57 controller3 kernel: [1398907.311239] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241 offset 63.134559 sec Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = controller/disk-sda/disk_octets; value time = 1330048506; last cache update = 1330048506; Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1. Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = controller/disk-sda/disk_ops; value time = 1330048506; last cache update = 1330048506; Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
...
Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1. Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers present Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq 73 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq 74 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq 75 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq 76 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq 77 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq 78 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889960] bnx2 0000:01:00.0: irq 79 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889968] bnx2 0000:01:00.0: irq 80 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889975] bnx2 0000:01:00.0: irq 81 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.970592] bnx2 0000:01:00.0: eth0: using MSIX Feb 24 01:55:56 controller3 kernel: [1398962.971959] ADDRCONF(NETDEV_UP): eth0: link is not ready Feb 24 01:55:59 controller3 kernel: [1398966.028629] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Feb 24 01:55:59 controller3 kernel: [1398966.030132] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Feb 24 01:56:05 controller3 ntpdate[10968]: step time server 72.3.128.241 offset 0.002667 sec
-------------------------------

* This leads existing ZK client sessions to expire, and those clients attempt to make new sessions. * One client (on a different node) establishes a new session to this server at 01:55:56 (coinciding exactly with the second set of IRQ messages shown above):

-------------------------------
2012-02-24 01:55:56,964:17928(0x7fbf0b077700):ZOO_INFO@check_events@1632: session establishment complete on server [10.14.237.70:2888], sessionId=0xd235ad12815a0000, negotiated timeout=6000
-------------------------------

* This session is able to successfully create ephemeral znodes, but watches never fire for the session. For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).

Unfortunately I cannot reproduce this, and we don't have good logs. Because of this, I cannot definitively place the blame on Zookeeper -- there are layers of our product above ZK that could theoretically be the problem. But given the evidence I've looked at, ZK seems to be the likely culprit.

So my question is: is anyone aware of a failure scenario that would cause watches to not be fired, but allow other operations from the same session to be successful? And are there any workarounds? I don't expect much help on this from the community, given the lack of logs and evidence, but I just wanted to throw it out there and see if anyone had any ideas off the top of their head. Thanks,

Jeremy

Reply via email to