Re: [Linux-HA] MySQL unknown exec error
On Thu, Nov 04, 2010 at 02:54:59PM -0300, mike wrote:
> On 10-11-04 12:38 PM, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Thu, Nov 04, 2010 at 11:06:48AM -0300, mike wrote:
>>> Looking for a more experienced person who can explain this issue we had last night. Our backups kicked in during the night at 1AM. At 1:01AM, our mysql cluster had issues. Specifically, I can see in crm_mon that the cluster has it as failed due to an unknown exec error.
>>>
>>> Looking at the performance of the node, I can see that wait on I/O went through the roof at 1AM when the TSM backups kicked in. I can see that this caused heartbeat issues because mysql was late checking its instances - it generally takes a few seconds but in this case it took 3 minutes.
>>>
>>> Of course this is all due to the extremely high wait on I/O but I am curious - why didn't the cluster fail over? Why put MySQL in an unmanaged state and simply say there was an unknown exec error?
>>
>> Can't say without looking at the logs and the PE files. One possible explanation is that a resource was for whatever reason not allowed to run on the other node: a failure in the past which didn't expire or a negative location constraint. Or the fail count reached the migration threshold (if defined).
>>
>> Thanks,
>>
>> Dejan
>>
>>> Thanks for any comments
>
> Thanks for the reply Dejan. I have the failcount threshold set to 3 on both nodes and if I understand it correctly, after a 3rd failure it should fail over to the backup node. Correct?

Yes.

> What do you mean by a negative location constraint?

A location constraint with a negative score. For instance, such a constraint is inserted by the crm resource move command.

Thanks,

Dejan

> Mike

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
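Editorial illustration (not part of the original thread): the reply above names two things that can keep a resource off the other node, a negative location score and a fail count that has reached the migration threshold. The Python sketch below is a deliberately simplified toy model of that reasoning, not the cluster's actual placement algorithm. The function name `allowed_nodes`, the node names, and the rule "a node with a negative score cannot run the resource" are assumptions made for illustration; the real decision depends on the logs and PE files mentioned in the thread.

```python
import math

NEG_INF = -math.inf  # stands in for a -INFINITY score in this toy model


def allowed_nodes(nodes, location_scores, fail_counts, migration_threshold=None):
    """Return the nodes where the resource may still run (simplified model).

    location_scores: hypothetical per-node location scores (default 0).
    fail_counts: hypothetical per-node fail counts (default 0).
    migration_threshold: None means no threshold is defined.
    """
    allowed = []
    for node in nodes:
        score = location_scores.get(node, 0)
        # Assumption: once the fail count reaches the threshold,
        # the resource is banned from that node.
        if migration_threshold is not None and fail_counts.get(node, 0) >= migration_threshold:
            score = NEG_INF
        # Assumption: a node with a negative score cannot run the resource.
        if score >= 0:
            allowed.append(node)
    return allowed


nodes = ["node1", "node2"]

# Three failures on node1 with a threshold of 3: only node2 remains.
print(allowed_nodes(nodes, {}, {"node1": 3}, 3))  # ['node2']

# Same failures, plus a negative location constraint on node2: nowhere to go.
print(allowed_nodes(nodes, {"node2": NEG_INF}, {"node1": 3}, 3))  # []
```

In this model the second case is the one where no failover happens: the threshold has been reached on one node while a leftover negative constraint excludes the other.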
[Linux-HA] MySQL unknown exec error
Looking for a more experienced person who can explain this issue we had last night. Our backups kicked in during the night at 1AM. At 1:01AM, our mysql cluster had issues. Specifically, I can see in crm_mon that the cluster has it as failed due to an unknown exec error.

Looking at the performance of the node, I can see that wait on I/O went through the roof at 1AM when the TSM backups kicked in. I can see that this caused heartbeat issues because mysql was late checking its instances - it generally takes a few seconds but in this case it took 3 minutes.

Of course this is all due to the extremely high wait on I/O but I am curious - why didn't the cluster fail over? Why put MySQL in an unmanaged state and simply say there was an unknown exec error?

Thanks for any comments
Re: [Linux-HA] MySQL unknown exec error
Hi,

On Thu, Nov 04, 2010 at 11:06:48AM -0300, mike wrote:
> Looking for a more experienced person who can explain this issue we had last night. Our backups kicked in during the night at 1AM. At 1:01AM, our mysql cluster had issues. Specifically, I can see in crm_mon that the cluster has it as failed due to an unknown exec error.
>
> Looking at the performance of the node, I can see that wait on I/O went through the roof at 1AM when the TSM backups kicked in. I can see that this caused heartbeat issues because mysql was late checking its instances - it generally takes a few seconds but in this case it took 3 minutes.
>
> Of course this is all due to the extremely high wait on I/O but I am curious - why didn't the cluster fail over? Why put MySQL in an unmanaged state and simply say there was an unknown exec error?

Can't say without looking at the logs and the PE files. One possible explanation is that a resource was for whatever reason not allowed to run on the other node: a failure in the past which didn't expire or a negative location constraint. Or the fail count reached the migration threshold (if defined).

Thanks,

Dejan

> Thanks for any comments
Re: [Linux-HA] MySQL unknown exec error
On 10-11-04 12:38 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Nov 04, 2010 at 11:06:48AM -0300, mike wrote:
>> Looking for a more experienced person who can explain this issue we had last night. Our backups kicked in during the night at 1AM. At 1:01AM, our mysql cluster had issues. Specifically, I can see in crm_mon that the cluster has it as failed due to an unknown exec error.
>>
>> Looking at the performance of the node, I can see that wait on I/O went through the roof at 1AM when the TSM backups kicked in. I can see that this caused heartbeat issues because mysql was late checking its instances - it generally takes a few seconds but in this case it took 3 minutes.
>>
>> Of course this is all due to the extremely high wait on I/O but I am curious - why didn't the cluster fail over? Why put MySQL in an unmanaged state and simply say there was an unknown exec error?
>
> Can't say without looking at the logs and the PE files. One possible explanation is that a resource was for whatever reason not allowed to run on the other node: a failure in the past which didn't expire or a negative location constraint. Or the fail count reached the migration threshold (if defined).
>
> Thanks,
>
> Dejan
>
>> Thanks for any comments

Thanks for the reply Dejan. I have the failcount threshold set to 3 on both nodes and if I understand it correctly, after a 3rd failure it should fail over to the backup node. Correct?

What do you mean by a negative location constraint?

Mike