Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error
Hi Bob,

> -----Original Message-----
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Bob Peterson
> Sent: 2019年10月21日 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] gfs2: fsid=:work.3: fatal: filesystem consistency error
>
> ----- Original Message -----
> > Hello List,
> >
> > I got a gfs2 file system consistency error from one user, who is using
> > kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> > The error message is as below:
> > 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2: fsid=:work.3: fatal: filesystem consistency error
> > 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]   inode = 280 342097926
> > 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]   function = gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> > 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2: fsid=:work.3: about to withdraw this file system
> >
> > I looked at the super.c file; the related code is:
> > 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> > 1452 {
> > 1453         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> > 1454         struct gfs2_rgrpd *rgd;
> > 1455         struct gfs2_holder gh;
> > 1456         int error;
> > 1457
> > 1458         if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> > 1459                 gfs2_consist_inode(ip);   <<== here
> > 1460                 return -EIO;
> > 1461         }
> >
> > It looks like upstream has fixed this bug? Who can help point out
> > which patches need to be back-ported?
> >
> > Thanks
> > Gang
>
> Hi,
>
> Yes, we have made lots of patches since the 4.12 kernel, some of which may
> be relevant. However, that error often indicates file system corruption.
> (It means the block count for a dinode became corrupt.)
>
> I've been working on a set of problems caused whenever gfs2 replays one of
> its journals during recovery, with a wide variety of symptoms, including
> that one. So it might be one of those.
> Some of my resulting patches are already pushed to upstream, but I'm not
> yet at the point where I can push them all.
>
> I recommend doing a fsck.gfs2 on the volume to ensure consistency.

The customer has repaired it using fsck.gfs2; however, every time the
application workload starts (concurrent writing), the filesystem becomes
inaccessible, which also causes a stop operation failure of the app
resource and consequently a fence. Do you have any suggestions in this
case? It looks like there is a serious bug triggered by concurrent writing
under some stress.

Thanks
Gang

> Regards,
>
> Bob Peterson

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Antw: Re: Antw: Coming in Pacemaker 2.0.3: crm_mon output changes
On Wed, 2019-10-16 at 08:08 +0200, Ulrich Windl wrote:
> > > Why not replace "--web-cgi" with "--output-format=cgi"?
> >
> > CGI output is identical to HTML, with just a header change, so it was
> > logically more of an option to the HTML implementation rather than a
> > separate one.
> >
> > With the new approach, each format type may define any additional
> > options to modify its behavior, and all tools automatically inherit
> > those options. These will be grouped together in the help/man page.
> > For example, the HTML option help is:
> >
> > Output Options (html):
> >   --output-cgi                  Add text needed to use output in a CGI
> >                                 program
> >   --output-meta-refresh=SECONDS How often to refresh
>
> God bless the long options, but considering that the only thing that is
> refreshed in crm_mon's output is... well, the output... why not just have
> --refresh or --refresh-interval?

One of the goals is to have options that are consistent across all tools.
We came up with the "--output-" prefix to make it easy to avoid conflicts
with existing/future tool options. However, I think you're right that it's
confusing.

I'm thinking that instead, we can reserve each of the format types as an
option prefix. For example, for html it would become:

Output Options (html):
  --html-cgi            Add text needed to use output in a CGI program
  --html-stylesheet=URI Link to an external CSS stylesheet
  --html-title=TITLE    Page title

which I think is a little shorter and more intuitive. There are a few
existing --xml-* options we'd have to work around, but I don't think
that's a problem. Does that make more sense?

BTW, we decided to get rid of --output-meta-refresh altogether and just
continue using the existing --interval option for that purpose.

> Also it wouldn't be too hard (if there's any demand) to allow suffixes
> like 's' for seconds and 'm' for minutes; most likely more do not make
> sense for a refresh interval.
Actually it already does; it's just not in the help description. We'll
update the help.

> > > > When called with --as-xml, crm_mon's XML output will be identical
> > > > to previous versions. When called with the new --output-as=xml
> > > > option, it will be slightly different: the outermost element will
> > > > be a <pacemaker-result> element, which will be consistent across
> > > > all tools. The old XML
> > >
> > > Why not a simple "status" element? "pacemaker-result" doesn't really
> > > add anything useful.
> >
> > We wanted the design to allow for future flexibility in how users ask
> > pacemaker to do something. The XML output would be the same whether
> > the request came from a command-line tool, GUI, C API client
> > application, REST API client, or any other future interface. The idea
> > is that a <pacemaker-result> might be a response to a corresponding
> > request element.
>
> But most likely any response will be a kind of result, so why have
> "result" explicitly? Also, as it's all about pacemaker, why have
> "pacemaker" in it? (Remember how easy it was to get rid of "heartbeat"?
> ;-))
> So my argument for "status" simply is that the data describes the
> status.

The idea is that if the output is saved to a file, someone looking at that
file later could easily figure out where it came from, even without any
other context.

> > All of the format options start with "--output-" so we can reserve
> > those option names across all tools.
>
> Do you actually have a big matrix of all options available across the
> tools? I'd like to see!

Me too. :) Not yet, we just grep for a new option name we're thinking of
using. That's why we went with the "--output-" prefix, it was easy to make
them unique. :)
--
Ken Gaillot
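The suffix handling discussed above (a bare number meaning seconds, plus 's'/'m'/'h' unit suffixes) amounts to a small amount of parsing logic. A minimal sketch in Python, purely illustrative and not Pacemaker's actual C implementation:

```python
import re

# Multipliers for the unit suffixes discussed above; a bare number is
# treated as seconds. Larger units rarely make sense for a refresh interval.
SUFFIXES = {"": 1, "s": 1, "m": 60, "h": 3600}

def parse_interval(spec: str) -> int:
    """Parse an interval like '30', '30s', '2m', or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", spec.strip().lower())
    if not match or match.group(2) not in SUFFIXES:
        raise ValueError(f"invalid interval: {spec!r}")
    return int(match.group(1)) * SUFFIXES[match.group(2)]
```

For example, parse_interval("2m") yields 120 seconds, so "--interval=2m" style input reduces to the same value as "--interval=120".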
Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error
----- Original Message -----
> Hello List,
>
> I got a gfs2 file system consistency error from one user, who is using
> kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> The error message is as below:
> 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2: fsid=:work.3: fatal: filesystem consistency error
> 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]   inode = 280 342097926
> 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]   function = gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2: fsid=:work.3: about to withdraw this file system
>
> I looked at the super.c file; the related code is:
> 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> 1452 {
> 1453         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> 1454         struct gfs2_rgrpd *rgd;
> 1455         struct gfs2_holder gh;
> 1456         int error;
> 1457
> 1458         if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> 1459                 gfs2_consist_inode(ip);   <<== here
> 1460                 return -EIO;
> 1461         }
>
> It looks like upstream has fixed this bug? Who can help point out which
> patches need to be back-ported?
>
> Thanks
> Gang

Hi,

Yes, we have made lots of patches since the 4.12 kernel, some of which may
be relevant. However, that error often indicates file system corruption.
(It means the block count for a dinode became corrupt.)

I've been working on a set of problems caused whenever gfs2 replays one of
its journals during recovery, with a wide variety of symptoms, including
that one. So it might be one of those. Some of my resulting patches are
already pushed to upstream, but I'm not yet at the point where I can push
them all.

I recommend doing a fsck.gfs2 on the volume to ensure consistency.

Regards,

Bob Peterson
Re: [ClusterLabs] Antw: SBD fencing and crashkernel question
On 10/21/19 8:31 AM, Ulrich Windl wrote:
> >>> Strahil Nikolov wrote on 20.10.2019 at 01:03 in message
> >>> <1223585818.2655058.1571526232...@mail.yahoo.com>:
>> Hello Community,
>> I have a question about the stack in newer versions compared to our
>> SLES 11 openais stack. Can someone clarify whether a node with SBD will
>> invoke a crashkernel before killing itself?
>> According to my tests on SLES 11, when another node kills the
>> unresponsive one, crashkernel is invoked and a dump is present at
>> /var/crash; but if the node gets stuck for some reason (naughty admin),
>> there is no sign of a crash (checked on the iLO to be sure).

Can't help with SLES specifics here, but the difference between the two
cases you describe is probably that in one case the sbd daemon is still
alive enough to call a reboot, write to sysrq-trigger, or do whatever is
configured. (Using poison pill? You can configure what should happen when
the sbd daemon triggers the timeout-action - with current sbd even in a
consistent manner, as long as the sbd daemon is alive.) In the other case
it is probably a hardware watchdog kicking in.

Regards,
Klaus

>> I'm not sure if this behaviour is the same on newer software versions
>> (SLES 12/15) and if I can work around it, as we still struggle to find
>> the reason why our clusters fence in a very specific situation (the
>> clusters are using MDADM raid1-s in a dual-DC environment instead of
>> SAN replication) where the remote DC is unavailable for 20-30s until
>> SAN/network is rerouted. We have enabled crashdump on some of the
>> systems, but we are pending a reboot and then a real DC<->DC
>> connectivity outage to gather valuable info, as corosync is using dual
>> rings and is not affected. SBD is using "survive on pacemaker" and we
>> suspect that the nodes suicide.
>> Best Regards,
>> Strahil Nikolov

> So basically you want to know why your node is fenced? I couldn't quite
> understand the environment you set up, nor what types of problems you
> are seeing.
> Actually, in the time of many gigabytes of RAM I see little sense in
> crash dumps, because they will just consume a lot of time to get done.
>
> Regards,
> Ulrich
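As an illustration of the configurable timeout-action Klaus mentions: recent sbd versions accept a "crashdump" action, which crashes the kernel (and therefore triggers kdump/crashkernel) instead of plainly rebooting when sbd decides the node must die. A hypothetical /etc/sysconfig/sbd fragment; the device path is a placeholder, and option availability depends on your sbd version:

```ini
# /etc/sysconfig/sbd (illustrative fragment; check `man sbd` on your distro)
SBD_DEVICE="/dev/disk/by-id/my-shared-disk-part1"   # placeholder shared device
SBD_PACEMAKER="yes"               # "survive on pacemaker": stay up while quorate/healthy
SBD_WATCHDOG_DEV="/dev/watchdog"
# First part: flush or noflush; second part: reboot, crashdump, or off.
SBD_TIMEOUT_ACTION="flush,crashdump"   # crash instead of reboot, so kdump can write a dump
```

With crashdump configured, the sbd-initiated suicide path would leave a dump in /var/crash like the remote-fencing case does, provided kdump itself is set up.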
[ClusterLabs] Antw: Re: Ocassionally IPaddr2 resource fails to start
>>> Donat Zenichev wrote on 21.10.2019 at 09:12 in message:
> Hello, and sorry for the late response of mine; I somehow missed your
> answer.
>
> Sure, let me share a bit of useful information on that count.
> First of all, the system-specific things are:
> - Hypervisor is a usual VMware product - vSphere
> - VM OS is: Ubuntu 18.04 LTS
> - Pacemaker is of version: 1.1.18-0ubuntu1.1
>
> And yes, it's iproute, which has version 4.15.0-2ubuntu1.
>
> To be mentioned that after I moved to another way of handling this (with
> set failure-timeout) I haven't seen any errors so far; the on-fail
> action still remains "restart".
> But it's obvious, failure-timeout just clears all fail counters for me,
> so I don't see any fails now.

Failures should still be logged in the log files. failure-timeout also
does not prevent a restart on failure; it just extends the number of
restart attempts.

> Another thing to be mentioned: the monitor functionality for the IPaddr2
> resource was failing in past years as well, I just didn't pay much
> attention to it.
> At that time the VMs under my control were running Ubuntu 14.04 and the
> hypervisor was Proxmox of the 5+ branch (I cannot exactly remember the
> version; perhaps it was 5.4+).
>
> For one, this could be a critical case indeed, since sometimes an
> absence of an IP address (for a certain DB, e.g., with a load of
> hundreds of thousands of SQL requests) can lead to a huge outage.
> I don't have the first idea of how to investigate this further. But I
> have a staging setup where my hands are not tied, so let me know if we
> can research something.

We had a similar case with the NFS server, and I added a script that does
the same monitoring as the RA, but logs what the command outputs in case
the output changed. Unfortunately I have not seen the error since I added
the script ;-)

> And have a nice day!
Regards,
Ulrich

> On Mon, Oct 7, 2019 at 7:21 PM Jan Pokorný wrote:
>
>> Donat,
>>
>> On 07/10/19 09:24 -0500, Ken Gaillot wrote:
>>> If this always happens when the VM is being snapshotted, you can put
>>> the cluster in maintenance mode (or even unmanage just the IP
>>> resource) while the snapshotting is happening. I don't know of any
>>> reason why snapshotting would affect only an IP, though.
>>
>> it might be interesting if you could share the details to grow the
>> shared knowledge and experience in case there are some instances of
>> these problems reported in the future.
>>
>> In particular, it'd be interesting to hear:
>>
>> - hypervisor
>>
>> - VM OS + if plain oblivious to running virtualized,
>>   or "the optimal arrangement" (e.g., specialized drivers, virtio,
>>   "guest additions", etc.)
>>
>> (I think IPaddr2 is iproute2-only, hence in turn, VM OS must be Linux)
>>
>> Of course, there might be more specific things to look at if anyone
>> here is an expert with particular hypervisor technology and the way
>> the networking works with it (no, not me at all).
>>
>> --
>> Poki
>
> --
> Best regards,
> Donat Zenichev
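The wrapper script Ulrich describes (repeat the same check the RA does, but log the command's output whenever it changes) can be sketched like this. A minimal, hypothetical example in Python; the command and log path in the usage note are placeholders, and the real IPaddr2 monitor logic is more involved:

```python
import subprocess
import time
from datetime import datetime

def watch_output(cmd, logfile, interval=10, iterations=None):
    """Run `cmd` periodically; append a timestamped entry to `logfile`
    whenever its exit code or output differs from the previous run."""
    last = None
    count = 0
    while iterations is None or count < iterations:
        result = subprocess.run(cmd, capture_output=True, text=True)
        current = (result.returncode, result.stdout)
        if last is not None and current != last:
            # Only log on change, so the log stays small between failures.
            with open(logfile, "a") as log:
                log.write(f"{datetime.now().isoformat()} rc={result.returncode}\n")
                log.write(result.stdout)
        last = current
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return last
```

For the IPaddr2 case one might run something like watch_output(["ip", "-o", "addr", "show"], "/tmp/ip-monitor.log") alongside the cluster's own monitor, so that when the monitor fails there is a record of what the address list looked like around that time.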
Re: [ClusterLabs] Ocassionally IPaddr2 resource fails to start
Hello, and sorry for the late response of mine; I somehow missed your
answer.

Sure, let me share a bit of useful information on that count.
First of all, the system-specific things are:
- Hypervisor is a usual VMware product - vSphere
- VM OS is: Ubuntu 18.04 LTS
- Pacemaker is of version: 1.1.18-0ubuntu1.1

And yes, it's iproute, which has version 4.15.0-2ubuntu1.

To be mentioned that after I moved to another way of handling this (with
set failure-timeout) I haven't seen any errors so far; the on-fail action
still remains "restart".
But it's obvious, failure-timeout just clears all fail counters for me, so
I don't see any fails now.

Another thing to be mentioned: the monitor functionality for the IPaddr2
resource was failing in past years as well, I just didn't pay much
attention to it.
At that time the VMs under my control were running Ubuntu 14.04 and the
hypervisor was Proxmox of the 5+ branch (I cannot exactly remember the
version; perhaps it was 5.4+).

For one, this could be a critical case indeed, since sometimes an absence
of an IP address (for a certain DB, e.g., with a load of hundreds of
thousands of SQL requests) can lead to a huge outage.
I don't have the first idea of how to investigate this further. But I have
a staging setup where my hands are not tied, so let me know if we can
research something.

And have a nice day!

On Mon, Oct 7, 2019 at 7:21 PM Jan Pokorný wrote:

> Donat,
>
> On 07/10/19 09:24 -0500, Ken Gaillot wrote:
> > If this always happens when the VM is being snapshotted, you can put
> > the cluster in maintenance mode (or even unmanage just the IP
> > resource) while the snapshotting is happening. I don't know of any
> > reason why snapshotting would affect only an IP, though.
>
> it might be interesting if you could share the details to grow the
> shared knowledge and experience in case there are some instances of
> these problems reported in the future.
> In particular, it'd be interesting to hear:
>
> - hypervisor
>
> - VM OS + if plain oblivious to running virtualized,
>   or "the optimal arrangement" (e.g., specialized drivers, virtio,
>   "guest additions", etc.)
>
> (I think IPaddr2 is iproute2-only, hence in turn, VM OS must be Linux)
>
> Of course, there might be more specific things to look at if anyone
> here is an expert with particular hypervisor technology and the way
> the networking works with it (no, not me at all).
>
> --
> Poki

--
Best regards,
Donat Zenichev
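The failure-timeout workaround mentioned in this thread is a resource meta-attribute in the CIB. A hypothetical fragment for illustration; the resource name, IP address, and timeout values are placeholders, and on-fail="restart" is shown explicitly although it is the default for monitor failures:

```xml
<primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="vip-attrs">
    <nvpair id="vip-ip" name="ip" value="192.0.2.10"/>
  </instance_attributes>
  <meta_attributes id="vip-meta">
    <!-- expire recorded failures after 60s, clearing the fail counter -->
    <nvpair id="vip-failure-timeout" name="failure-timeout" value="60s"/>
  </meta_attributes>
  <operations>
    <op id="vip-monitor" name="monitor" interval="10s" on-fail="restart"/>
  </operations>
</primitive>
```

As noted above, this does not prevent the restarts; it only keeps the fail counter from accumulating, which is why the failures disappear from crm_mon but should still be visible in the logs.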
[ClusterLabs] Antw: Safe way to stop pacemaker on both nodes of a two node cluster
>>> "Dileep V Nair" wrote on 20.10.2019 at 17:54 in message:
> Hi,
>
> I am confused about the best way to stop pacemaker on both nodes of a
> two-node cluster. The options I know of are:
> 1. Put the cluster in maintenance mode, stop the applications manually,
> and then stop pacemaker on both nodes. For this I need the applications
> to be stopped manually.

I think stopping cluster resources that way is a bad idea, because when
the cluster is started again it thinks the apps are still up. There has
been a "stop cluster" thread here recently; AFAIR there was no perfect
solution. Maybe try to find that thread.

> 2. Stop pacemaker on one node, wait for all resources to come up on the
> second node, then stop pacemaker on the second node. This might cause a
> significant delay because all resources have to come up on the second
> node.

I think there is a "stop all resources" option somewhere that might avoid
that.

> Is there any other way to stop pacemaker on both nodes gracefully?

Not a perfect one, I'm afraid.

Regards,
Ulrich

> Thanks in advance.
>
> Thanks & Regards
>
> Dileep Nair
> Squad Lead - SAP Base
> Togaf Certified Enterprise Architect
> IBM Services for Managed Applications
> +91 98450 22258 Mobile
> dilen...@in.ibm.com
>
> IBM Services
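The "stop all resources" option Ulrich recalls is most likely the stop-all-resources cluster property. A hypothetical CIB fragment for illustration; set it to true, wait for all resources to stop cleanly, stop pacemaker on both nodes, and set it back to false after the cluster is up again:

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- ask pacemaker to stop every resource in ordered fashion -->
    <nvpair id="opt-stop-all" name="stop-all-resources" value="true"/>
  </cluster_property_set>
</crm_config>
```

Unlike maintenance mode, this stops the resources through the cluster itself, so after restart pacemaker knows they are down and starts them normally.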
[ClusterLabs] Antw: SBD fencing and crashkernel question
>>> Strahil Nikolov wrote on 20.10.2019 at 01:03 in message
<1223585818.2655058.1571526232...@mail.yahoo.com>:
> Hello Community,
> I have a question about the stack in newer versions compared to our SLES
> 11 openais stack. Can someone clarify whether a node with SBD will
> invoke a crashkernel before killing itself?
> According to my tests on SLES 11, when another node kills the
> unresponsive one, crashkernel is invoked and a dump is present at
> /var/crash; but if the node gets stuck for some reason (naughty admin),
> there is no sign of a crash (checked on the iLO to be sure).
> I'm not sure if this behaviour is the same on newer software versions
> (SLES 12/15) and if I can work around it, as we still struggle to find
> the reason why our clusters fence in a very specific situation (the
> clusters are using MDADM raid1-s in a dual-DC environment instead of SAN
> replication) where the remote DC is unavailable for 20-30s until
> SAN/network is rerouted. We have enabled crashdump on some of the
> systems, but we are pending a reboot and then a real DC<->DC
> connectivity outage to gather valuable info, as corosync is using dual
> rings and is not affected. SBD is using "survive on pacemaker" and we
> suspect that the nodes suicide.
> Best Regards,
> Strahil Nikolov

So basically you want to know why your node is fenced? I couldn't quite
understand the environment you set up, nor what types of problems you are
seeing.

Actually, in the time of many gigabytes of RAM I see little sense in crash
dumps, because they will just consume a lot of time to get done.

Regards,
Ulrich
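For completeness: whether any crash dump can be captured at all depends on a crashkernel memory reservation being present on the kernel command line, independent of who triggers the crash. An illustrative GRUB fragment; the reservation size is a placeholder and depends on RAM size and distribution defaults, and the existing options elided here must be kept:

```ini
# /etc/default/grub (illustrative; keep existing options, then
# regenerate grub.cfg and reboot for the reservation to take effect)
GRUB_CMDLINE_LINUX_DEFAULT="... crashkernel=256M"
```

Without this reservation, neither a remote fence nor an sbd-initiated crash can produce a dump in /var/crash, which is worth ruling out before suspecting the fencing path.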