Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error

2019-10-21 Thread Gang He
Hi Bob,

> -----Original Message-----
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Bob
> Peterson
> Sent: 21 October 2019 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
> 
> Subject: Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem
> consistency error
> 
> ----- Original Message -----
> > Hello List,
> >
> > I got a gfs2 file system consistency error from one user, who is using
> > kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> > The error message is below:
> > 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2:
> > fsid=xxxx:work.3: fatal: filesystem consistency error
> > 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]   inode = 280
> > 342097926
> > 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]   function =
> > gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> > 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2:
> > fsid=xxxx:work.3: about to withdraw this file system
> >
> > I checked the super.c file; the related code is:
> > 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> > 1452 {
> > 1453 struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> > 1454 struct gfs2_rgrpd *rgd;
> > 1455 struct gfs2_holder gh;
> > 1456 int error;
> > 1457
> > 1458 if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> > 1459 gfs2_consist_inode(ip);   <<== here
> > 1460 return -EIO;
> > 1461 }
> >
> >
> > It looks like upstream may have fixed this bug? Who can help point out
> > which patches need to be back-ported?
> >
> > Thanks
> > Gang
> 
> Hi,
> 
> Yes, we have made lots of patches since the 4.12 kernel, some of which may
> be relevant. However, that error often indicates file system corruption.
> (It means the block count for a dinode became corrupt.)
> 
> I've been working on a set of problems caused whenever gfs2 replays one of
> its journals during recovery, with a wide variety of symptoms, including that
> one. So it might be one of those. Some of my resulting patches are already
> pushed upstream, but I'm not yet at the point where I can push them all.
> 
> I recommend doing a fsck.gfs2 on the volume to ensure consistency.
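
(A minimal sketch of such a check, assuming the gfs2 volume sits on a
hypothetical /dev/vg_cluster/work and is managed by a cloned Filesystem
resource named work-fs; fsck.gfs2 needs the filesystem unmounted on every
node first:

  crm resource stop work-fs           # unmount cluster-wide first
  fsck.gfs2 -y /dev/vg_cluster/work   # check/repair from a single node
  crm resource start work-fs          # bring the filesystem back
)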

The customer has repaired it using fsck.gfs2; however, every time the
application workload starts (concurrent writing), the filesystem becomes
inaccessible. That also makes the stop operation of the app resource fail,
which in turn causes a fence.
Do you have any suggestions for this case? It looks like there is a serious
bug under concurrent writing with some stress.

Thanks
Gang 

> 
> Regards,
> 
> Bob Peterson
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Antw: Re: Antw: Coming in Pacemaker 2.0.3: crm_mon output changes

2019-10-21 Thread Ken Gaillot
On Wed, 2019-10-16 at 08:08 +0200, Ulrich Windl wrote:

> > > > Why not replace "--web-cgi" with "--output-format=cgi"?
> > 
> > CGI output is identical to HTML, with just a header change, so it was
> > logically more of an option to the HTML implementation rather than a
> > separate one.
> > 
> > With the new approach, each format type may define any additional
> > options to modify its behavior, and all tools automatically inherit
> > those options. These will be grouped together in the help/man page.
> > For example, the HTML option help is:
> > 
> > Output Options (html):
> >   --output-cgi                  Add text needed to use output in a CGI program
> >   --output-meta-refresh=SECONDS How often to refresh
> 
> God bless the long options, but considering that the only thing that is
> refreshed in crm_mon's output is... well, the output... why not just
> have --refresh or --refresh-interval?

One of the goals is to have options that are consistent across all
tools. We came up with the "--output-" prefix to make it easy to avoid
conflicts with existing/future tool options.

However, I think you're right that it's confusing. I'm thinking that
instead, we can reserve each of the format types as an option prefix.
For example, for html it would become:

Output Options (html):
  --html-cgi             Add text needed to use output in a CGI program
  --html-stylesheet=URI  Link to an external CSS stylesheet
  --html-title=TITLE     Page title

which I think is a little shorter and more intuitive. There are a few
existing --xml-* options we'd have to work around but I don't think
that's a problem.

Does that make more sense?

BTW we decided to get rid of --output-meta-refresh altogether, and just
continue using the existing --interval option for that purpose.
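
To make that concrete, a sketch of the invocations being discussed here
(the --html-* names are the proposal above, so the exact set that ships
may differ):

  crm_mon --as-xml          # legacy XML output, unchanged
  crm_mon --output-as=xml   # new-style output selection
  crm_mon --output-as=html --html-title="Cluster status" --interval=10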

> Also it wouldn't be too hard (if there's any demand) to allow suffixes
> like 's' for seconds and 'm' for minutes; most likely anything larger
> makes no sense for a refresh interval.

Actually it already does; it's just not in the help description. We'll
update the help.
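
For example, both of the following are already accepted, just not yet
documented in the help text:

  crm_mon --interval=90s
  crm_mon --interval=2m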


> > > > When called with --as-xml, crm_mon's XML output will be identical
> > > > to previous versions. When called with the new --output-as=xml
> > > > option, it will be slightly different: the outermost element will
> > > > be a <pacemaker-result> element, which will be consistent across
> > > > all tools. The old XML
> > > 
> > > Why not a simple "status" element? "-result" doesn't really add
> > > anything useful.
> > 
> > We wanted the design to allow for future flexibility in how users ask
> > pacemaker to do something. The XML output would be the same whether the
> > request came from a command-line tool, GUI, C API client application,
> > REST API client, or any other future interface. The idea is that a
> > <pacemaker-result> might be a response to a corresponding request.
> 
> But most likely any response will be a kind of result, so why have
> "result" explicitly? Also, as it's all about pacemaker, why have
> "pacemaker" in it? (Remember how easy it was to get rid of "heartbeat"?
> ;-))
> So my argument for "status" simply is that the data describes the status.

The idea is that if the output is saved to a file, someone looking at
that file later could easily figure out where it came from, even
without any other context.
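
As an illustration, saved output might then identify itself roughly like
this (a sketch only; the attribute names are guesses, not confirmed in
this thread):

  <pacemaker-result api-version="2.0" request="crm_mon --output-as=xml">
    ... tool-specific content ...
  </pacemaker-result>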

> > All of the format options start with "--output-" so we can reserve
> > those option names across all tools.
> 
> Do you actually have a big matrix of all options available across the
> tools? I'd like to see it!

Me too. :) Not yet; we just grep for a new option name we're thinking
of using. That's why we went with the "--output-" prefix: it was easy
to make them unique. :)
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error

2019-10-21 Thread Bob Peterson
----- Original Message -----
> Hello List,
> 
> I got a gfs2 file system consistency error from one user, who is using
> kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> The error message is below:
> 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2:
> fsid=xxxx:work.3: fatal: filesystem consistency error
> 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]   inode = 280
> 342097926
> 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]   function =
> gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2:
> fsid=xxxx:work.3: about to withdraw this file system
> 
> I checked the super.c file; the related code is:
> 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> 1452 {
> 1453 struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> 1454 struct gfs2_rgrpd *rgd;
> 1455 struct gfs2_holder gh;
> 1456 int error;
> 1457
> 1458 if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> 1459 gfs2_consist_inode(ip);   <<== here
> 1460 return -EIO;
> 1461 }
> 
> 
> It looks like upstream may have fixed this bug? Who can help point out
> which patches need to be back-ported?
> 
> Thanks
> Gang

Hi,

Yes, we have made lots of patches since the 4.12 kernel, some of which may be
relevant. However, that error often indicates file system corruption.
(It means the block count for a dinode became corrupt.)

I've been working on a set of problems caused whenever gfs2 replays one
of its journals during recovery, with a wide variety of symptoms, including
that one. So it might be one of those. Some of my resulting patches are already
pushed upstream, but I'm not yet at the point where I can push them all.

I recommend doing a fsck.gfs2 on the volume to ensure consistency.

Regards,

Bob Peterson

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: SBD fencing and crashkernel question

2019-10-21 Thread Klaus Wenninger
On 10/21/19 8:31 AM, Ulrich Windl wrote:
> >>> Strahil Nikolov wrote on 20.10.2019 at 01:03 in
> message <1223585818.2655058.1571526232...@mail.yahoo.com>:
>> Hello Community,
>> I have a question about the stack in newer versions compared to our SLES 11
>> openais stack. Can someone clarify whether a node with SBD will invoke the
>> crashkernel before killing itself?
>> According to my tests on SLES 11, when another node kills the unresponsive
>> one, the crashkernel is invoked and a dump is present at /var/crash; but if
>> the node gets stuck for some reason (naughty admin), there is no sign of a
>> crash (checked on the iLO to be sure).
Can't help with SLES specifics here, but the difference between the
two cases you describe is probably that in one case the sbd daemon is
still alive enough to call a reboot, write to sysrq-trigger, or do whatever
is configured. (Using poison pill? You can configure what should happen
when the sbd daemon triggers the timeout-action; with current sbd this is
even consistent, as long as the sbd daemon is alive.)
In the other case it is probably the hardware watchdog kicking in.
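
For concreteness, a sketch of where that is configured (key names as in
recent sbd releases, so availability depends on the shipped version, and
the device path is made up):

  # /etc/sysconfig/sbd
  SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-sbd"
  SBD_WATCHDOG_DEV="/dev/watchdog"
  # action when sbd itself hits its timeout; "crashdump" triggers the
  # crash kernel instead of a plain reboot
  SBD_TIMEOUT_ACTION="flush,crashdump"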

Regards,
Klaus


>> I'm not sure if this behaviour is the same on newer software versions (SLES
>> 12/15) and if I can work around it, as we still struggle to find the reason
>> why our clusters fence in a very specific situation (the clusters are using
>> MDADM raid1-s in a dual-DC environment instead of SAN replication) where the
>> remote DC is unavailable for 20-30s until SAN/network is rerouted. We have
>> enabled crashdump on some of the systems, but we are pending a reboot and
>> then a real DC<->DC connectivity outage to gather valuable info. As corosync
>> is using dual rings and is not affected, and SBD is set to survive on
>> pacemaker, we suspect that the nodes suicide.
>> Best regards, Strahil Nikolov
> So basically you want to know why your node is fenced? I couldn't quite
> understand the environment you set up, nor what types of problems you are
> seeing.
> Actually, in these times of many gigabytes of RAM I see little sense in
> crash dumps, because they just take a lot of time to get done.
>
> Regards,
> Ulrich
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Occasionally IPaddr2 resource fails to start

2019-10-21 Thread Ulrich Windl
>>> Donat Zenichev wrote on 21.10.2019 at 09:12 in
message:
> Hello, and sorry for my very late response; I somehow missed your
> answer.
> 
> Sure, let me share a bit of useful information on that count.
> First of all the system specific things are:
> - Hypervisor is a usual VMware product - VSphere
> - VMs OS is: Ubuntu 18.04 LTS
> - Pacemaker is of version: 1.1.18-0ubuntu1.1
> 
> And yes, it's iproute2, version 4.15.0-2ubuntu1.
> 
> Worth mentioning: after I moved to another way of handling this (with
> failure-timeout set) I haven't seen any errors so far; the on-fail action
> still remains "restart".
> But that's expected: failure-timeout just clears all fail counters for me,
> so I don't see any fails now.

Failures should be logged in logfiles still. failure-timeout also does not
prevent a restart on failure; it just extends the number of restart attempts.
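
(For reference, a sketch of that kind of setup with crmsh; the resource
name and address are made up:

  crm configure primitive vip ocf:heartbeat:IPaddr2 \
      params ip=192.0.2.10 cidr_netmask=24 \
      op monitor interval=10s on-fail=restart \
      meta failure-timeout=60s migration-threshold=3
)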

> 
> Another thing worth mentioning: the monitor operation of the IPaddr2
> resource was failing in years past as well; I just didn't pay much
> attention to it.
> At that time the VMs under my control were running Ubuntu 14.04 and the
> hypervisor was Proxmox of the 5+ branch (I cannot remember the exact
> version; perhaps it was 5.4+).
> 
> For one, this could be a critical case indeed, since an absence of the
> IP address (e.g. for a DB loaded with hundreds of thousands of SQL
> requests) can lead to a huge outage.
> I don't have the first idea how to investigate this further. But I have
> a staging setup where my hands are not tied, so let me know if we can
> research something.

We had a similar case with the NFS server, and I added a script that does the
same monitoring as the RA, but logs what the command outputs whenever that
output changes. Unfortunately I have not seen the error since I added the
script ;-)
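
In the same spirit, a minimal sketch of such a watcher for the IP case
(interface and log tag are made up; it re-reads the same state the RA's
monitor relies on and logs the raw output whenever it changes):

  #!/bin/sh
  DEV=eth0
  LAST=""
  while sleep 5; do
      OUT=$(ip -o addr show dev "$DEV" 2>&1)
      if [ "$OUT" != "$LAST" ]; then
          logger -t vip-watch "ip output changed: $OUT"
          LAST="$OUT"
      fi
  done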

> 
> And have a nice day!

Regards,
Ulrich

> 
> On Mon, Oct 7, 2019 at 7:21 PM Jan Pokorný  wrote:
> 
>> Donat,
>>
>> On 07/10/19 09:24 -0500, Ken Gaillot wrote:
>> > If this always happens when the VM is being snapshotted, you can put
>> > the cluster in maintenance mode (or even unmanage just the IP
>> > resource) while the snapshotting is happening. I don't know of any
>> > reason why snapshotting would affect only an IP, though.
>>
>> it might be interesting if you could share the details to grow the
>> shared knowledge and experience in case there are some instances of
>> these problems reported in the future.
>>
>> In particular, it'd be interesting to hear:
>>
>> - hypervisor
>>
>> - VM OS + if plain oblivious to running virtualized,
>>   or "the optimal arrangement" (e.g., specialized drivers, virtio,
>>   "guest additions", etc.)
>>
>> (I think IPaddr2 is iproute2-only, hence in turn, VM OS must be Linux)
>>
>> Of course, there might be more specific things to look at if anyone
>> here is an expert with particular hypervisor technology and the way
>> the networking works with it (no, not me at all).
>>
>> --
>> Poki
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> 
> -- 
> 
> Best regards,
> Donat Zenichev



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Occasionally IPaddr2 resource fails to start

2019-10-21 Thread Donat Zenichev
Hello, and sorry for my very late response; I somehow missed your answer.

Sure, let me share a bit of useful information on that count.
First of all the system specific things are:
- Hypervisor is a usual VMware product - VSphere
- VMs OS is: Ubuntu 18.04 LTS
- Pacemaker is of version: 1.1.18-0ubuntu1.1

And yes, it's iproute2, version 4.15.0-2ubuntu1.

Worth mentioning: after I moved to another way of handling this (with
failure-timeout set) I haven't seen any errors so far; the on-fail action
still remains "restart".
But that's expected: failure-timeout just clears all fail counters for me,
so I don't see any fails now.

Another thing worth mentioning: the monitor operation of the IPaddr2
resource was failing in years past as well; I just didn't pay much
attention to it.
At that time the VMs under my control were running Ubuntu 14.04 and the
hypervisor was Proxmox of the 5+ branch (I cannot remember the exact
version; perhaps it was 5.4+).

For one, this could be a critical case indeed, since an absence of the
IP address (e.g. for a DB loaded with hundreds of thousands of SQL
requests) can lead to a huge outage.
I don't have the first idea how to investigate this further. But I have
a staging setup where my hands are not tied, so let me know if we can
research something.

And have a nice day!

On Mon, Oct 7, 2019 at 7:21 PM Jan Pokorný  wrote:

> Donat,
>
> On 07/10/19 09:24 -0500, Ken Gaillot wrote:
> > If this always happens when the VM is being snapshotted, you can put
> > the cluster in maintenance mode (or even unmanage just the IP
> > resource) while the snapshotting is happening. I don't know of any
> > reason why snapshotting would affect only an IP, though.
>
> it might be interesting if you could share the details to grow the
> shared knowledge and experience in case there are some instances of
> these problems reported in the future.
>
> In particular, it'd be interesting to hear:
>
> - hypervisor
>
> - VM OS + if plain oblivious to running virtualized,
>   or "the optimal arrangement" (e.g., specialized drivers, virtio,
>   "guest additions", etc.)
>
> (I think IPaddr2 is iproute2-only, hence in turn, VM OS must be Linux)
>
> Of course, there might be more specific things to look at if anyone
> here is an expert with particular hypervisor technology and the way
> the networking works with it (no, not me at all).
>
> --
> Poki
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



-- 

Best regards,
Donat Zenichev
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: Safe way to stop pacemaker on both nodes of a two node cluster

2019-10-21 Thread Ulrich Windl
>>> "Dileep V Nair"  schrieb am 20.10.2019 um 17:54 in
Nachricht
:

> Hi,
> 
>   I am confused about the best way to stop pacemaker on both nodes of a
> two-node cluster. The options I know of are:
> 1. Put the cluster in maintenance mode, stop the applications manually and
> then stop pacemaker on both nodes. For this I need the applications to be
> stopped manually.

I think stopping cluster resources that way is a bad idea, because when the
cluster is started again it thinks the apps are up.

There has been a "stop cluster" thread recently. AFAIR there was no perfect
solution. Maybe try to find that thread.

> 2. Stop pacemaker on one node, wait for all resources to come up on the
> second node, then stop pacemaker on the second node. This might cause a
> significant delay because all resources have to come up on the second node.

I think there is a "stop all resources" option somewhere that might avoid that.
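
(The option meant here is most likely the stop-all-resources cluster
property; a sketch with crmsh:

  crm configure property stop-all-resources=true    # stop everything
  # wait until all resources are stopped, then on each node:
  systemctl stop pacemaker
  # after the next start, remember to flip it back:
  crm configure property stop-all-resources=false
)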

> 
>   Is there any other way to stop pacemaker on both nodes gracefully?

Not a perfect one, I'm afraid.

Regards,
Ulrich


> Thanks in advance.
> 
> Thanks & Regards
> 
> Dileep Nair
> Squad Lead - SAP Base
> TOGAF Certified Enterprise Architect
> IBM Services for Managed Applications
> +91 98450 22258 Mobile
> dilen...@in.ibm.com 
> 
> IBM Services



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: SBD fencing and crashkernel question

2019-10-21 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 20.10.2019 at 01:03 in
message <1223585818.2655058.1571526232...@mail.yahoo.com>:
> Hello Community,
> I have a question about the stack in newer versions compared to our SLES 11
> openais stack. Can someone clarify whether a node with SBD will invoke the
> crashkernel before killing itself?
> According to my tests on SLES 11, when another node kills the unresponsive
> one, the crashkernel is invoked and a dump is present at /var/crash; but if
> the node gets stuck for some reason (naughty admin), there is no sign of a
> crash (checked on the iLO to be sure).
> I'm not sure if this behaviour is the same on newer software versions (SLES
> 12/15) and if I can work around it, as we still struggle to find the reason
> why our clusters fence in a very specific situation (the clusters are using
> MDADM raid1-s in a dual-DC environment instead of SAN replication) where the
> remote DC is unavailable for 20-30s until SAN/network is rerouted. We have
> enabled crashdump on some of the systems, but we are pending a reboot and
> then a real DC<->DC connectivity outage to gather valuable info. As corosync
> is using dual rings and is not affected, and SBD is set to survive on
> pacemaker, we suspect that the nodes suicide.
> Best regards, Strahil Nikolov

So basically you want to know why your node is fenced? I couldn't quite
understand the environment you set up, nor what types of problems you are
seeing.
Actually, in these times of many gigabytes of RAM I see little sense in
crash dumps, because they just take a lot of time to get done.

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/