Hi, any updates on this?
Thank you

On Wed, 4 Sep 2019, 10:46 Marco Marino <marino....@gmail.com> wrote:
> First of all, thank you for your support.
> Andrey: sure, I can reach the machines through IPMI.
> Here is a short "log":
>
> # From ld1, trying to contact ld1
> [root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXXX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC8           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> ...
>
> # From ld1, trying to contact ld2
> [root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXXXX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC7           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> ...
>
> # From ld2, trying to contact ld1
> [root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC8           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  | 7.1 | Logical FRU @00h
> ...
>
> # From ld2, trying to contact ld2
> [root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC7           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  | 7.1 | Logical FRU @00h
> ...
>
> Jan: the cluster actually uses /etc/hosts to resolve names:
> 172.16.77.10 ld1.mydomain.it ld1
> 172.16.77.11 ld2.mydomain.it ld2
>
> Furthermore, I'm using IP addresses for the IPMI interfaces in the configuration:
> [root@ld1 ~]# pcs stonith show fence-node1
>  Resource: fence-node1 (class=stonith type=fence_ipmilan)
>   Attributes: ipaddr=192.168.254.250 lanplus=1 login=root passwd=XXXXX pcmk_host_check=static-list pcmk_host_list=ld1.mydomain.it
>   Operations: monitor interval=60s (fence-node1-monitor-interval-60s)
>
> Any idea?
> How can I reset the state of the cluster without downtime? Is "pcs resource cleanup" enough?
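[Not part of the original messages: the `ns` ("no reading") status on the SEL and System Board rows is common on Dell iDRACs and is usually harmless. If you want to scan `sdr elist` output for any sensor not reporting `ok`, a small filter like the following works; this is a sketch fed with the sample output above, not a command from the thread. In practice you would pipe the real `ipmitool ... sdr elist all` into the same awk filter.]

```shell
#!/bin/sh
# List sensors whose status column (field 3) is anything other than "ok".
# Input here is a here-doc copy of the sample output from this thread.
awk -F'|' '{
    gsub(/[[:space:]]/, "", $3)          # strip padding from the status field
    if ($3 != "ok") {
        gsub(/[[:space:]]+$/, "", $1)    # trim the sensor name
        print $1
    }
}' <<'EOF'
SEL              | 72h | ns  | 7.1 | No Reading
Intrusion        | 73h | ok  | 7.1 |
iDRAC8           | 00h | ok  | 7.1 | Dynamic MC @ 20h
System Board     | 00h | ns  | 7.1 | Logical FRU @00h
EOF
# prints:
# SEL
# System Board
```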
> Thank you,
> Marco
>
> On Wed, 4 Sep 2019 at 10:29, Jan Pokorný <jpoko...@redhat.com> wrote:
>
>> On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
>> > On 03.09.2019 11:09, Marco Marino wrote:
>> >> Hi, I have a problem with fencing on a two-node cluster. It seems
>> >> that the cluster randomly fails to complete the monitor operation
>> >> for the fence devices. In the log I see:
>> >> crmd[8206]: error: Result of monitor operation for fence-node2 on
>> >> ld2.mydomain.it: Timed Out
>> >
>> > Can you actually access the IP addresses of your IPMI ports?
>>
>> [
>> Tangentially, an interesting aspect beyond that, applicable to any
>> non-IP cross-host referential need and which I haven't seen mentioned
>> anywhere so far, is the risk of DNS resolution (where /etc/hosts
>> comes up short) running into trouble: stale records, a blocked port,
>> DNS server overload [DNSSEC, etc.], parallel IPv4/IPv6 records that
>> the software cannot handle gracefully, and so on. In any case, a
>> single DNS server would clearly be an undesired SPOF, and it would be
>> unfortunate to be unable to fence a node because of that.
>>
>> I think the most robust approach is to use IP addresses whenever
>> possible, and unambiguous records in /etc/hosts when practical.
>> ]
>>
>> >> Attached are:
>> >> - /var/log/messages for node1 (only the important part)
>> >> - /var/log/messages for node2 (only the important part) <-- Problem
>> >>   starts here
>> >> - pcs status
>> >> - pcs stonith show (for both fence devices)
>> >>
>> >> I think it could be a timeout problem, so how can I see the timeout
>> >> value for the monitor operation on stonith devices?
>> >> Please, can someone help me with this problem?
>> >> Furthermore, how can I fix the state of the fence devices without
>> >> downtime?
>>
>> --
>> Jan (Poki)
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
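[For reference, the timeout and cleanup questions above can be handled roughly as follows. This is a hedged sketch assuming pcs 0.9-era syntax (CentOS/RHEL 7); verify the exact options against `man pcs` for your version, and note that `fence-node1` is the device name from this thread.]

```shell
# Show the fence device's full definition, including any explicit
# monitor timeout; if none is set, Pacemaker falls back to the
# cluster-wide operation defaults.
pcs stonith show fence-node1

# Raise the monitor timeout on the fence device; this is a CIB change
# only and does not interrupt running resources.
pcs stonith update fence-node1 op monitor interval=60s timeout=60s

# Clear the failed-action history for the device so it is re-probed;
# safe to run on a live cluster.
pcs resource cleanup fence-node1
```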