Re: [Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-08-01 Thread Kyle O'Donnell
I tried disabling fstrim on all but one server and had the exact same
issue as I did when cron enabled it on all servers.

----- Original Message -----
From: "Nick Stallman" <1681...@bugs.launchpad.net>
To: "Kyle O'Donnell" 
Sent: Tuesday, August 1, 2017 7:49:49 PM
Subject: [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

I think we've also had a related issue.
We haven't had any serious corruption, but we have had random locks that never
get released and require a server reboot to clear.

OCFS2 does support trim, as does our SAN. I think the issue may be related to
running fstrim in parallel, however.
I didn't realise fstrim was in cron.weekly on all 3 servers that had OCFS2
mounted, causing them to run it at basically the exact same time.

Since I noticed it running and disabled it, I haven't had any further issues
(mind you, it's only been a few days).

Running fstrim by default is probably a bad idea on these more advanced
filesystems, since it is likely to end up running on several nodes at once.
It's safer to assume that the sysadmin knows about their SAN's trim
capability and can schedule it in a more controlled manner.

-- 
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired

Bug description:
  Recently upgraded from trusty to xenial and found that our ocfs2
  filesystems, which are mounted across a number of nodes
  simultaneously, would become corrupt on the weekend:

  [Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-2): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
  [Sun Apr  9 06:46:35 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
  [Sun Apr  9 06:46:35 2017] OCFS2: File system is now read-only.
  [Sun Apr  9 06:46:35 2017] (fstrim,1080,8):ocfs2_trim_fs:7399 ERROR: status = -30
  [Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-3): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
  [Sun Apr  9 06:46:36 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
  [Sun Apr  9 06:46:36 2017] OCFS2: File system is now read-only.
  [Sun Apr  9 06:46:36 2017] (fstrim,1080,10):ocfs2_trim_fs:7399 ERROR: status = -30

  We found the cron.weekly job, whose schedule (06:47 on Sundays) is very
  close to the time of the errors:
  47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )

  # cat /etc/cron.weekly/fstrim 
  #!/bin/sh
  # trim all mounted file systems which support it
  /sbin/fstrim --all || true

  
  We have disabled this job across our servers running clustered ocfs2
  filesystems. I think either the utility or the cron job should ignore
  ocfs2 (gfs too?) filesystems.
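The bug description asks for the weekly job to skip ocfs2 (and possibly gfs) mounts. A minimal sketch of such a job, in the same /bin/sh style as the stock script, trims each mount individually instead of calling `fstrim --all`. The `CLUSTER_FSTYPES` list, the function names, and reading /proc/self/mounts are illustrative assumptions, not part of util-linux or Ubuntu's packaging.

```sh
#!/bin/sh
# Hypothetical replacement for /etc/cron.weekly/fstrim that trims each
# local mount individually and skips shared-disk cluster filesystems.
# gfs2 is listed only because the bug report asks about it ("gfs too?").
CLUSTER_FSTYPES="ocfs2 gfs2"

# Succeeds if $1 is one of the cluster filesystem types above.
is_cluster_fs() {
    for t in $CLUSTER_FSTYPES; do
        [ "$1" = "$t" ] && return 0
    done
    return 1
}

# Reads a mount table in /proc/mounts format on stdin
# ("device mountpoint fstype options dump pass") and prints the mount
# points that are backed by a /dev node and are not cluster filesystems.
# Mount points containing spaces (escaped as \040) are not decoded here.
list_trim_targets() {
    while read -r dev mnt fstype rest; do
        case "$dev" in
            /dev/*) ;;
            *) continue ;;
        esac
        is_cluster_fs "$fstype" || printf '%s\n' "$mnt"
    done
}

# Trims every selected mount point. Failures (for example, a device
# without discard support) are ignored, like the "|| true" in the stock job.
trim_local_filesystems() {
    list_trim_targets < /proc/self/mounts | while read -r mnt; do
        /sbin/fstrim "$mnt" || true
    done
}
```

Installed as /etc/cron.weekly/fstrim, the script would end with a call to `trim_local_filesystems`. Filtering in the cron job, rather than in fstrim itself, keeps the change local to the nodes that mount shared storage.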

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410/+subscriptions

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : 

[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-08-01 Thread Nick Stallman
I think we've also had a related issue.
We haven't had any serious corruption, but we have had random locks that never
get released and require a server reboot to clear.

OCFS2 does support trim, as does our SAN. I think the issue may be related to
running fstrim in parallel, however.
I didn't realise fstrim was in cron.weekly on all 3 servers that had OCFS2
mounted, causing them to run it at basically the exact same time.

Since I noticed it running and disabled it, I haven't had any further issues
(mind you, it's only been a few days).

Running fstrim by default is probably a bad idea on these more advanced
filesystems, since it is likely to end up running on several nodes at once.
It's safer to assume that the sysadmin knows about their SAN's trim
capability and can schedule it in a more controlled manner.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-06-19 Thread jdh239
Seeing the same thing with OCFS2 on Oracle VM Server 3.3.3: clustered
OCFS2, but on one device and only one node (it was set up as a cluster,
but the cluster portion wasn't used because of licensing).

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired


[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-06-09 Thread Launchpad Bug Tracker
[Expired for util-linux (Ubuntu) because there has been no activity for
60 days.]

** Changed in: util-linux (Ubuntu)
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired


[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-04-10 Thread Kyle O'Donnell
It is one device per filesystem.

We have 2 LUNs for 2 different ocfs2 filesystems, mounted on all 6 servers
in the cluster. They are presented via Fibre Channel from our SAN.

I think the issue is that running fstrim at the same time from all the
servers that mount the same ocfs2 filesystem corrupts it.
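Restricting the trim to a single node is one way to test that hypothesis. The sketch below assumes a designated node name (`TRIM_NODE`) and a shared mount point (`SHARED_MOUNT`); both values and the function names are hypothetical placeholders. Note that the 2017-08-01 message at the top of this thread reports the same corruption with fstrim enabled on only one server, so this is a diagnostic step, not a known fix.

```sh
#!/bin/sh
# Hypothetical guard: only one designated node trims the shared OCFS2
# mount; every other node skips it. Both values are placeholders.
TRIM_NODE="node1"
SHARED_MOUNT="/srv/ocfs2"

# Succeeds when the short host name given as $1 matches TRIM_NODE.
is_trim_node() {
    [ "$1" = "$TRIM_NODE" ]
}

# Runs fstrim on the shared mount only on the designated node.
# "hostname -s" prints the host name up to the first dot.
trim_shared_from_one_node() {
    if is_trim_node "$(hostname -s)"; then
        /sbin/fstrim -v "$SHARED_MOUNT" || true
    fi
}
```

Every node can then carry the same cron script calling `trim_shared_from_one_node`, which keeps the configuration identical across the cluster.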

We are using multipath:

WWPN-THINGEE-HERE  dm-3 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:0:16 sdb 8:16  active ready running
| `- 1:0:0:16 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 0:0:1:16 sdd 8:48  active ready running
  `- 1:0:1:16 sdh 8:112 active ready running
WWPN-THINGEE-HERE dm-2 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:1:15 sdc 8:32  active ready running
| `- 1:0:1:15 sdg 8:96  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 0:0:0:15 sda 8:0   active ready running
  `- 1:0:0:15 sde 8:64  active ready running
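The thread's participants say that both OCFS2 and their SANs support trim. One way to confirm that a multipath device such as dm-2 actually advertises discard support is to read its sysfs queue attributes: a `discard_max_bytes` of 0 means the kernel will not send discards to that device (`lsblk --discard` reports the same data). The helper name and its optional sysfs-root argument, which exists only so the check can be tested against a fake tree, are illustrative.

```sh
#!/bin/sh
# Succeeds if block device $1 (a name under /sys/block, e.g. dm-2)
# advertises discard support, i.e. queue/discard_max_bytes is non-zero.
# $2 optionally overrides the sysfs root (default /sys).
supports_discard() {
    f="${2:-/sys}/block/$1/queue/discard_max_bytes"
    # A missing attribute means the device is unknown: treat as unsupported.
    [ -r "$f" ] || return 1
    max=$(cat "$f")
    [ "$max" -gt 0 ]
}
```

For example, `supports_discard dm-2 && echo 'dm-2 accepts discards'`.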

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Incomplete


[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-04-10 Thread Phillip Susi
I'm thinking that the bug is in the OCFS2 filesystem driver. Since it
can span multiple disks, both local and remote, it cannot give a
sensible answer to the FIBMAP ioctl when fstrim asks which blocks a file
is located in. Please test this by creating a file, checking where
FIBMAP says it is located, and seeing whether the data is really there:

echo hello > foo
sync
hdparm --fibmap foo
dd if=/dev/dm-2 bs=512 skip=begin_LBA count=1 | hd

Where begin_LBA is the first sector hdparm reports for the file. hdparm
gives LBAs in sectors (its output states the sector size, typically 512
bytes), not in filesystem blocks, so bs must be that sector size rather
than the filesystem block size.

Am I correct in assuming this filesystem spans at least two devices?
dm-2 and dm-3?
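For context, fstrim(8) does not map files to blocks itself: it issues the FITRIM ioctl on each mount point and the filesystem walks its own free space, which is why the log shows the failure inside `ocfs2_trim_fs`. The FIBMAP check proposed above is still a reasonable sanity test of the driver and can be scripted. The function names are hypothetical, and the parser assumes hdparm prints a header line beginning with `byte_offset` followed by extent rows whose second column is `begin_LBA` in 512-byte sectors; check this against your hdparm version.

```sh
#!/bin/sh
# Reads "hdparm --fibmap" output on stdin and prints the begin_LBA of the
# first extent (second column of the row after the "byte_offset" header).
first_lba() {
    awk 'found { print $2; exit } $1 == "byte_offset" { found = 1 }'
}

# $1: test file on the OCFS2 mount, $2: block device (e.g. /dev/dm-2).
# Flushes the file, asks FIBMAP where it starts, and hex-dumps that
# sector so it can be compared with the file's contents.
dump_first_sector() {
    sync
    lba=$(hdparm --fibmap "$1" | first_lba)
    [ -n "$lba" ] || return 1
    # Assumes hdparm counts LBAs in 512-byte sectors, hence bs=512.
    dd if="$2" bs=512 skip="$lba" count=1 2>/dev/null | hd
}
```

For example, after `echo hello > foo`, running `dump_first_sector foo /dev/dm-2` as root should show `hello` at the start of the dump if FIBMAP's answer is correct.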


** Changed in: util-linux (Ubuntu)
   Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Incomplete


[Touch-packages] [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-04-10 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: util-linux (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Confirmed
