Hi
> > dlm_tool dump ?
>
It gives an empty return for this command, since we do not have the DLM
resource started yet. I then tried the dlm_controld.pcmk command, and this
is the result:

apolo:~ # dlm_controld.pcmk -D
cluster-dlm[4616]: main: dlm_controld master started
1467918891 dlm_controld master started
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_clm' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_evt' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_ckpt' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_amf_v2' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_msg' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_lck' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_tmr' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'pacemaker' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found '0' for option: ver
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_classic: Creating connection to our Corosync plugin
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_classic: AIS connection established
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_ais_nodeid: Server details: id=16845322 uname=apolo cname=pcmk
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
cluster-dlm[4616]: 2016/07/07_16:14:51 info: crm_new_peer: Node apolo now has id: 16845322
cluster-dlm[4616]: 2016/07/07_16:14:51 info: crm_new_peer: Node 16845322 is now known as apolo
1467918891 Is dlm missing from kernel? No misc devices found.
1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1467918891 No /sys/kernel/config, is configfs loaded?
1467918891 shutdown
1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
cluster-dlm[4616]: 2016/07/07_16:14:51 notice: terminate_ais_connection: Disconnecting from AIS

I even tried to remove these constraints from the cluster configuration, to
see if the DLM resource would come up, but with no good result (a possible
alternative, handing the DRBD devices back to Pacemaker, is sketched at the
end of this message):

colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start

Thank you.
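From the "Is dlm missing from kernel?" and "is configfs loaded?" lines, it
looks like neither the dlm nor the configfs kernel module is loaded on this
host, so dlm_controld.pcmk has nothing to talk to. A minimal sketch of what
I will try before running it again, assuming both modules ship with this
kernel:

apolo:~ # modprobe configfs
apolo:~ # mount -t configfs none /sys/kernel/config    # skip if already mounted
apolo:~ # modprobe dlm
apolo:~ # ls /sys/kernel/config/dlm                    # should exist once the dlm module is loaded

If /sys/kernel/config/dlm appears, the opendir failures above should go away
when dlm_controld.pcmk is restarted.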
Carlos

> 2016-07-07 18:57 GMT+02:00 Carlos Xavier <[email protected]>:
> >
> > Thank you for the fast reply.
> >
> >>
> >> have you configured the stonith and drbd stonith handler?
> >>
> >
> > Yes, they were configured.
> > The cluster was running fine for more than 4 years, until we lost one
> > host to a power supply failure.
> > Now I need to access the files on the host that is working.
> >
> >> 2016-07-07 16:43 GMT+02:00 Carlos Xavier <[email protected]>:
> >> > Hi.
> >> > We had a Pacemaker cluster running an OCFS2 filesystem over a DRBD
> >> > device, and we completely lost one of the hosts.
> >> > Now I need some help to recover the data on the remaining machine.
> >> > I was able to load the DRBD module by hand and bring up the devices
> >> > using the drbdadm command line:
> >> >
> >> > apolo:~ # modprobe drbd
> >> > apolo:~ # cat /proc/drbd
> >> > version: 8.3.9 (api:88/proto:86-95)
> >> > srcversion: A67EB2D25C5AFBFF3D8B788
> >> >
> >> > apolo:~ # drbd-overview
> >> >   0:backup
> >> >   1:export
> >> > apolo:~ # drbdadm attach backup
> >> > apolo:~ # drbdadm attach export
> >> > apolo:~ # drbd-overview
> >> >   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-----
> >> >   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-----
> >> > apolo:~ # drbdadm primary backup
> >> > apolo:~ # drbdadm primary export
> >> > apolo:~ # drbd-overview
> >> >   0:backup  StandAlone Primary/Unknown UpToDate/DUnknown r-----
> >> >   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-----
> >> >
> >> > We have these resources and constraints configured:
> >> >
> >> > primitive resDLM ocf:pacemaker:controld \
> >> >         op monitor interval="120s"
> >> > primitive resDRBD_0 ocf:linbit:drbd \
> >> >         params drbd_resource="backup" \
> >> >         operations $id="resDRBD_0-operations" \
> >> >         op start interval="0" timeout="240" \
> >> >         op stop interval="0" timeout="100" \
> >> >         op monitor interval="20" role="Master" timeout="20" \
> >> >         op monitor interval="30" role="Slave" timeout="20"
> >> > primitive resDRBD_1 ocf:linbit:drbd \
> >> >         params drbd_resource="export" \
> >> >         operations $id="resDRBD_1-operations" \
> >> >         op start interval="0" timeout="240" \
> >> >         op stop interval="0" timeout="100" \
> >> >         op monitor interval="20" role="Master" timeout="20" \
> >> >         op monitor interval="30" role="Slave" timeout="20"
> >> > primitive resFS_BACKUP ocf:heartbeat:Filesystem \
> >> >         params device="/dev/drbd/by-res/backup" directory="/backup" fstype="ocfs2" options="rw,noatime" \
> >> >         op monitor interval="120s"
> >> > primitive resFS_EXPORT ocf:heartbeat:Filesystem \
> >> >         params device="/dev/drbd/by-res/export" directory="/export" fstype="ocfs2" options="rw,noatime" \
> >> >         op monitor interval="120s"
> >> > primitive resO2CB ocf:ocfs2:o2cb \
> >> >         op monitor interval="120s"
> >> > group DRBD_01 resDRBD_0 resDRBD_1
> >> > ms msDRBD_01 DRBD_01 \
> >> >         meta resource-stickines="100" notify="true" master-max="2" interleave="true" target-role="Started"
> >> > clone cloneDLM resDLM \
> >> >         meta globally-unique="false" interleave="true" target-role="Started"
> >> > clone cloneFS_BACKUP resFS_BACKUP \
> >> >         meta interleave="true" ordered="true" target-role="Started"
> >> > clone cloneFS_EXPORT resFS_EXPORT \
> >> >         meta interleave="true" ordered="true" target-role="Started"
> >> > clone cloneO2CB resO2CB \
> >> >         meta globally-unique="false" interleave="true" target-role="Started"
> >> > colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
> >> > colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
> >> > colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
> >> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
> >> > order ordDLMO2CB 0: cloneDLM cloneO2CB
> >> > order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
> >> > order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
> >> > order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
> >> >
> >> > As the DRBD devices were brought up by hand, Pacemaker doesn't
> >> > recognize they are up, so it doesn't start the DLM resource, and
> >> > all resources that depend on it stay stopped.
> >> > Is there any way I can circumvent this issue?
> >> > Is it possible to bring the OCFS2 resources up in standalone mode?
> >> > Please, any help will be very welcome.
> >> >
> >> > Best regards,
> >> > Carlos.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
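On the question of circumventing Pacemaker not recognizing the hand-started
devices: as long as the devices are made Primary outside of Pacemaker,
Pacemaker sees msDRBD_01 as stopped, and the inf colocation keeps cloneDLM
stopped because there is no Master to colocate with. A minimal sketch of
handing the devices back so the ordered stack can start on its own, assuming
the surviving node's cluster stack is otherwise able to start resources (not
verified against this exact setup):

apolo:~ # drbdadm secondary backup && drbdadm down backup
apolo:~ # drbdadm secondary export && drbdadm down export
apolo:~ # crm resource cleanup msDRBD_01   # clear stale probe results so Pacemaker re-detects the resource
apolo:~ # crm_mon -1                       # msDRBD_01 should promote; cloneDLM, cloneO2CB and the filesystems follow

The resource names are taken from the configuration quoted above; whether
DLM actually starts may still depend on the dead peer having been fenced, so
this is a sketch, not a verified recovery procedure.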
