Public bug reported:

[Description]
On a 3-node RabbitMQ cluster, after a failure on one of the nodes (kernel
crash), the following sequence of events was recorded:

=== This initial crash report

$ ack-grep -i mnesia_locker | wc -l
16

60-=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
61-  crasher:
--
66:    {mnesia_locker,'rabbit@10-10-34-2',granted}}
67-    in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
68-    ancestors: [worker_pool_sup,rabbit_sup,<0.110.0>]
69-    messages: []
70-    links: [<0.114.0>]
71-    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
72-                 {{xtype_to_module,topic},rabbit_exchange_type_topic},
73-                 {random_seed,{5643,25632,27953}},
74-                 {{xtype_to_module,fanout},rabbit_exchange_type_fanout},
75-                 {worker_pool_worker,true},
76-                 {fhc_age_tree,{0,nil}},
--
88:     Reason:     {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}}
89-     Offender:   [{pid,<0.142.0>},
90-                  {name,25},
91-                  {mfargs,{worker_pool_worker,start_link,[25]}},
92-                  {restart_type,transient},
93-                  {shutdown,4294967295},
94-                  {child_type,worker}]

sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1.log
5166:Mnesia('rabbit@10-10-34-1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-2'}

sosreport-svz-op-fdc-os-3.00087142-20150909140820/var/log/rabbitmq/rab...@10-10-34-3.log.1
545:Mnesia('rabbit@10-10-34-3'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-1'}

After this I started seeing these traces:

sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log:572:=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log:573:  crasher:
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-574-    initial call: gen:init_it/6
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-575-    pid: <0.2346.189>
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-576-    registered_name: []
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-577-    exception exit: {{unexpected_cast,{next_job_from,<0.4960.181>}},
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-578-                     {gen_server2,call,
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-579-                      [<0.22014.255>,
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-580-                       {submit,#Fun<rabbit_misc.6.25154013>,<0.2862.189>},
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-581-                       infinity]}}
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-582-    in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-583-    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.110.0>]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-584-    messages: []
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-585-    links: [<0.259.0>]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-586-    dictionary: [{guid,{{3735536962,2437967587,3023977752,60868675},0}}]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rab...@10-10-34-1-sasl.log-587-    trap_exit: true

Related Bugs:
 - https://bugs.launchpad.net/mos/+bug/1401948
 - https://github.com/erlang/otp/compare/maint...dgud:dgud%3Bmnesia%3Bsticky-race%3BOTP-11375

Possible related upstream commit:
 - http://hg.rabbitmq.com/rabbitmq-server/rev/5a63c9e273cc

This seems to be related to [1] and [2]; both fixes have been available
since rabbitmq-server 3.4.0.

[1] https://github.com/rabbitmq/rabbitmq-server/commit/1a32616b744f4dab09cba4e7a7e747c2b6550361#diff-3b9dc5e3c18be9549b0ab00763e4e123

[2] https://github.com/rabbitmq/rabbitmq-server/commit/238243b10ba6f666fcd8e84289525961fe6e68b9#diff-3b9dc5e3c18be9549b0ab00763e4e123

** Affects: rabbitmq-server (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: sts

** Tags added: sts

** Summary changed:

- Race condition in mnesia_locker after nodedown
+ Race condition in mnesia_locker after node down

https://bugs.launchpad.net/bugs/1496409

Title:
  Race condition in mnesia_locker after node down