Hi, On Tue, Feb 09, 2010 at 11:29:10AM +0100, Oscar Remírez de Ganuza Satrústegui wrote: > Hello everybody, > > We have a pacemaker cluster, active-pasive, with a mysql service as > a cluster resource. We are using pacemaker 1.0.6 + corosync 1.1.2 on > RedHat 5.3. > Everything is working fine, except when we reset (power down and > start) a node (through stonith) and want to run the mysql service > again on the node. > > Let me explain the problem: > Pacemaker manage the mysql service through the /etc/init.d service > (lsb:mysql), and usually it can start and stop it ok. But when the > active node is "stonithed", there is left on disk the pid file > (node1.pid), as the service was not stopped properly. > So here comes the problem: when pacemaker tries to start the service > again, it start it twice.
A resource agent must handle that properly. > From /var/log/messages: > > * The order to start the service is done: > Feb 9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op: > Performing key=30:119:0:6ce08921-3e51-462c-949a-aa540c206e3c > op=mysql-unav-service_start_0 ) > Feb 9 10:45:06 herculespre lrmd: [2176]: info: > rsc:mysql-unav-service:13: start > Feb 9 10:45:06 herculespre lrmd: [7602]: WARN: For LSB init script, > no additional parameters are needed. > Feb 9 10:45:06 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) Starting MySQL > eb 9 10:45:06 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) [ OK ] > Feb 9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_start_0 (call=13, rc=0, > cib-update=19, confirmed=true) ok > > * But then a monitor operation is performed at the same time and it > detects that it is not running: The resource agent didn't really start mysql. Please try with the OCF mysql resource agent. Thanks, Dejan > Feb 9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op: > Performing key=31:119:0:6ce08921-3e51-462c-949a-aa540c206e3c > op=mysql-unav-service_monitor_20000 ) > Feb 9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_monitor_20000 (call=14, rc=7, > cib-update=20, confirmed=false) not running > Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch: > Update relayed from jasonpre > Feb 9 10:45:06 herculespre attrd: [2179]: info: > attrd_local_callback: Expanded fail-count-mysql-unav-service=value++ > to 1 > Feb 9 10:45:06 herculespre attrd: [2179]: info: > attrd_trigger_update: Sending flush op to all hosts for: > fail-count-mysql-unav-service (1) > Feb 9 10:45:06 herculespre attrd: [2179]: info: > attrd_perform_update: Sent update 26: > fail-count-mysql-unav-service=1 > Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch: > Update relayed from jasonpre > Feb 9 10:45:06 herculespre attrd: [2179]: info: > attrd_trigger_update: Sending flush op to all hosts for: > last-failure-mysql-unav-service (1265708729) > Feb 9 10:45:06 herculespre attrd: [2179]: info: > attrd_perform_update: Sent update 29: > last-failure-mysql-unav-service=1265708729 > > * So it is decided to stop the resource and start it again!!!: > Feb 9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op: > Performing key=3:122:0:6ce08921-3e51-462c-949a-aa540c206e3c > op=mysql-unav-service_stop_0 ) > Feb 9 10:45:07 herculespre lrmd: [2176]: info: > rsc:mysql-unav-service:15: stop > Feb 9 10:45:07 herculespre lrmd: [7743]: WARN: For LSB init script, > no additional parameters are needed. > Feb 9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_monitor_20000 (call=14, status=1, > cib-update=0, confirmed=true) Cancelled > Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:stop:stdout) MySQL manager or server PID file > could not be found![FAILED] > > * But the stop operation failed.. > Feb 9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_stop_0 (call=15, rc=0, > cib-update=21, confirmed=true) ok > Feb 9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op: > Performing key=30:122:0:6ce08921-3e51-462c-949a-aa540c206e3c > op=mysql-unav-service_start_0 ) > Feb 9 10:45:07 herculespre lrmd: [2176]: info: > rsc:mysql-unav-service:16: start > Feb 9 10:45:07 herculespre lrmd: [7762]: WARN: For LSB init script, > no additional parameters are needed. > Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) Starting MySQL > Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) . > Feb 9 10:45:08 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) [ OK ] > Feb 9 10:45:08 herculespre lrmd: [2176]: info: RA output: > (mysql-unav-service:start:stdout) > Feb 9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_start_0 (call=16, rc=0, > cib-update=22, confirmed=true) ok > Feb 9 10:45:08 herculespre crmd: [2181]: info: do_lrm_rsc_op: > Performing key=2:122:0:6ce08921-3e51-462c-949a-aa540c206e3c > op=mysql-unav-service_monitor_20000 ) > Feb 9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: > LRM operation mysql-unav-service_monitor_20000 (call=17, rc=0, > cib-update=23, confirmed=false) ok > * And the start operation is performed ok > > So we have now twice the same service running, and logically this is > not working properly, as the two services try to access the same > files and system ports. > > From /var/log/mysql > 100209 10:45:07 mysqld_safe Starting mysqld daemon with databases > from /usr/local/etc2/mysql-unav/var > 100209 10:45:07 mysqld_safe Starting mysqld daemon with databases > from /usr/local/etc2/mysql-unav/var > InnoDB: Unable to lock ./ibdata1, error: 11 > InnoDB: Check that you do not already have another mysqld process > InnoDB: using the same InnoDB data or log files. > 100209 10:45:07 InnoDB: Retrying to lock the first data file > ... > > If we start the service manually (/etc/init.d/mysql start) whith the > pid file existing, there is no problem. > So, what it is doing pacemaker wrong? > > I reckon that maybe we have some timeouts not properly configured. > Right now we have: > Operations timeout: default-action-timeout="120s" > Mysql Resource Monitor: op monitor interval="20s" timeout="20s" > > Is there someone with a similar setup (with mysql or with another > service) and experiencing any problems? > > Can you tell me some safe values for the timeouts? > > Thank you very much for your help! > > Regards, > > PS: I did not want to spam the list with more logs or configuration > files, but if they are needed i can attach them, of course. > > --- > Oscar Remírez de Ganuza > Servicios Informáticos > Universidad de Navarra > Ed. de Derecho, Campus Universitario > 31080 Pamplona (Navarra), Spain > tfno: +34 948 425600 Ext. 3130 > http://www.unav.es/SI > > _______________________________________________ > Pacemaker mailing list > [email protected] > http://oss.clusterlabs.org/mailman/listinfo/pacemaker _______________________________________________ Pacemaker mailing list [email protected] http://oss.clusterlabs.org/mailman/listinfo/pacemaker
