Re: [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-08-01 Thread Kyle O'Donnell
I tried disabling fstrim on all but one server and had the exact same
issue as I did when cron enabled it on all servers.

----- Original Message -----
From: "Nick Stallman" <1681...@bugs.launchpad.net>
To: "Kyle O'Donnell" 
Sent: Tuesday, August 1, 2017 7:49:49 PM
Subject: [Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

I think we've also had a related issue.
We haven't had any serious corruption but we have had random locks that never 
get released which requires a server reboot to clear.

OCFS2 does support trim, as does our SAN. I think the issue may be related to 
running fstrim in parallel however.
I didn't realise fstrim was in cron.weekly on all 3 servers that had OCFS2 
mounted, causing them to run it at basically the exact same time.

Since disabling it, once I finally noticed it running, I haven't had
any further issues (mind you, it's only been a few days).

Running fstrim by default is probably a bad idea on these more advanced 
filesystems, since it is likely to run on several nodes at once.
It's safer to assume that the sysadmin knows about their SAN's fstrim 
capability and can schedule it in a more controlled manner.
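
One way to schedule it "in a more controlled manner" might look like the sketch below: only one designated node trims, and it trims the mount points one after another. The names TRIM_NODE-style arguments, is_trim_node, trim_mounts and DRY_RUN are made up for illustration and are not part of util-linux. Note that the reply at the top of this thread reports the same corruption with fstrim enabled on a single server, so treat this as a way to control scheduling, not as a confirmed fix.

```sh
#!/bin/sh
# Sketch of a replacement for /etc/cron.weekly/fstrim on cluster nodes.
# All function and variable names here are hypothetical.

# Exit 0 only when this host ($2, defaulting to `hostname`) is the
# designated trim node ($1).
is_trim_node() {
    itn_current=${2:-$(hostname)}
    [ "$itn_current" = "$1" ]
}

# Trim each listed mount point in turn, never in parallel.
# With DRY_RUN=1 the commands are printed instead of executed.
trim_mounts() {
    for tm_mp in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "fstrim -v $tm_mp"
        else
            fstrim -v "$tm_mp" || true
        fi
    done
}

# Example wiring (commented out so sourcing this file has no side effects;
# the node name and mount points are placeholders):
# is_trim_node "node1" && trim_mounts /srv/ocfs2-a /srv/ocfs2-b
```

Running it with DRY_RUN=1 first shows which mount points would be trimmed without touching the SAN.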

-- 
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

Status in util-linux package in Ubuntu:
  Expired

Bug description:
  Recently upgraded from trusty to xenial and found that our ocfs2
  filesystems, which are mounted across a number of nodes
  simultaneously, would become corrupt on the weekend:

  [Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-2): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
  [Sun Apr  9 06:46:35 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
  [Sun Apr  9 06:46:35 2017] OCFS2: File system is now read-only.
  [Sun Apr  9 06:46:35 2017] (fstrim,1080,8):ocfs2_trim_fs:7399 ERROR: status = -30
  [Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-3): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
  [Sun Apr  9 06:46:36 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
  [Sun Apr  9 06:46:36 2017] OCFS2: File system is now read-only.
  [Sun Apr  9 06:46:36 2017] (fstrim,1080,10):ocfs2_trim_fs:7399 ERROR: status = -30

  We found the cron.weekly job which is pretty close to the timing:
  47 6 * * 7   root   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )

  # cat /etc/cron.weekly/fstrim 
  #!/bin/sh
  # trim all mounted file systems which support it
  /sbin/fstrim --all || true

  
  We have disabled this job across our servers running clustered ocfs2
  filesystems.  I think either the utility or the cronjob should ignore
  ocfs2 (gfs too?) filesystems.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410/+subscriptions

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1681410

Title:
  fstrim corrupts ocfs2 filesystems when clustered

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered

2017-04-10 Thread Kyle O'Donnell
It is one device.

We have 2 LUNs for 2 different ocfs2 filesystems mounted on all servers
(6) in the cluster.  They are presented via Fibre Channel from our SAN.

I think the issue is that if you run fstrim from all servers which are
mounting the same ocfs2 filesystem at the same time, bad stuff happens.

We are using multipath:

WWPN-THINGEE-HERE  dm-3 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:0:16 sdb 8:16  active ready running
| `- 1:0:0:16 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 0:0:1:16 sdd 8:48  active ready running
  `- 1:0:1:16 sdh 8:112 active ready running
WWPN-THINGEE-HERE dm-2 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:1:15 sdc 8:32  active ready running
| `- 1:0:1:15 sdg 8:96  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 0:0:0:15 sda 8:0   active ready running
  `- 1:0:0:15 sde 8:64  active ready running
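
To check whether the multipath devices above actually advertise discard (trim) to the kernel, something like the following sketch could be used. It reads the discard_max_bytes queue attribute from sysfs, where a value of 0 is generally taken to mean no discard support. The function name supports_discard and its second argument (an alternative sysfs root, there only to make testing easy) are my own invention.

```sh
# Exit 0 if block device $1 (e.g. dm-3) advertises discard support.
# $2 is the sysfs root, defaulting to /sys; it is a parameter only so the
# function can be tried against a fake directory tree.
supports_discard() {
    sd_file="${2:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$sd_file" ] || return 1
    sd_max=$(cat "$sd_file")
    [ -n "$sd_max" ] && [ "$sd_max" != 0 ]
}

# Usage:
# for dev in dm-2 dm-3; do
#     supports_discard "$dev" && echo "$dev: discard supported"
# done
```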



[Bug 1681410] [NEW] fstrim corrupts ocfs2 filesystems when clustered

2017-04-10 Thread Kyle O'Donnell
Public bug reported:

Recently upgraded from trusty to xenial and found that our ocfs2
filesystems, which are mounted across a number of nodes simultaneously,
would become corrupt on the weekend:

[Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-2): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
[Sun Apr  9 06:46:35 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
[Sun Apr  9 06:46:35 2017] OCFS2: File system is now read-only.
[Sun Apr  9 06:46:35 2017] (fstrim,1080,8):ocfs2_trim_fs:7399 ERROR: status = -30
[Sun Apr  9 06:46:35 2017] OCFS2: ERROR (device dm-3): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
[Sun Apr  9 06:46:36 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
[Sun Apr  9 06:46:36 2017] OCFS2: File system is now read-only.
[Sun Apr  9 06:46:36 2017] (fstrim,1080,10):ocfs2_trim_fs:7399 ERROR: status = -30
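
For reference, status = -30 is EROFS (read-only file system) on Linux, which matches the "File system is now read-only" lines. On a node with several ocfs2 mounts, a quick way to see which devices were hit might be the sketch below; the function name ocfs2_error_devices is made up, and the pattern only assumes the "OCFS2: ERROR (device ...)" wording seen in the log above.

```sh
# Read kernel log text on stdin and print each device named in an
# "OCFS2: ERROR (device ...)" line, once.
ocfs2_error_devices() {
    sed -n 's/.*OCFS2: ERROR (device \([^)]*\)).*/\1/p' | sort -u
}

# Usage:
# dmesg | ocfs2_error_devices
```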

We found the cron.weekly job which is pretty close to the timing:
47 6 * * 7   root   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )

# cat /etc/cron.weekly/fstrim 
#!/bin/sh
# trim all mounted file systems which support it
/sbin/fstrim --all || true


We have disabled this job across our servers running clustered ocfs2 
filesystems.  I think either the utility or the cronjob should ignore ocfs2 
(gfs too?) filesystems.
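
As a rough idea of what "the cronjob should ignore ocfs2" could look like, here is a sketch that trims only an allow-list of local filesystem types instead of calling fstrim --all, so ocfs2 and gfs2 mounts are never touched. LOCAL_FS_TYPES and list_trim_targets are hypothetical names, the list of types is an assumption to adjust per site, and mount points containing spaces are not handled.

```sh
#!/bin/sh
# Sketch: trim only local filesystem types, leaving cluster filesystems alone.
# LOCAL_FS_TYPES is a hypothetical allow-list; adjust it to the local setup.
LOCAL_FS_TYPES="ext4 xfs btrfs"

# Print the mount points listed in a /proc/mounts-style file ($1, default
# /proc/mounts) whose filesystem type (third field) is on the allow-list.
list_trim_targets() {
    awk -v types="$LOCAL_FS_TYPES" '
        BEGIN { n = split(types, t, " "); for (i = 1; i <= n; i++) ok[t[i]] = 1 }
        ($3 in ok) { print $2 }
    ' "${1:-/proc/mounts}"
}

# Intended use in place of "fstrim --all" (commented out: needs root):
# list_trim_targets | while read -r mp; do /sbin/fstrim "$mp" || true; done
```

The "|| true" mirrors the original cron script, so a local filesystem without discard support does not make the weekly job fail.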

** Affects: util-linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: fstrim ocfs2



[Bug 1358226] Re: kernel and lockd xprt_adjust_timeout rq_timeout

2015-03-17 Thread Kyle O'Donnell
Looks like this made it into 3.13.0-48.80:

https://launchpad.net/ubuntu/trusty/+source/linux/+changelog

  * LOCKD: Fix a race when initialising nlmsvc_timeout

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1427438

Upgraded a few days ago and haven't seen the errors since.
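
For anyone wanting to check a batch of trusty machines, a small sketch like this compares the running kernel against 3.13.0-48. The helper names kver_fields and kver_at_least are made up, and the comparison is only meaningful within the trusty 3.13 series: a newer series passing the check does not by itself prove it carries the same lockd fix.

```sh
# Normalise "3.13.0-48-generic" to "3 13 0 48"; prints nothing if the
# string does not look like an Ubuntu kernel version.
kver_fields() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)-\([0-9]*\).*/\1 \2 \3 \4/p'
}

# Exit 0 if kernel version $2 is at least version $1, 1 if it is older,
# 2 if either string could not be parsed.
kver_at_least() {
    kv_want=$(kver_fields "$1")
    kv_have=$(kver_fields "$2")
    [ -n "$kv_want" ] && [ -n "$kv_have" ] || return 2
    # Word splitting is intended: $1-$4 are the minimum, $5-$8 the kernel.
    set -- $kv_want $kv_have
    if [ "$5" -gt "$1" ]; then return 0; elif [ "$5" -lt "$1" ]; then return 1; fi
    if [ "$6" -gt "$2" ]; then return 0; elif [ "$6" -lt "$2" ]; then return 1; fi
    if [ "$7" -gt "$3" ]; then return 0; elif [ "$7" -lt "$3" ]; then return 1; fi
    [ "$8" -ge "$4" ]
}

# Usage:
# kver_at_least 3.13.0-48 "$(uname -r)" && echo "kernel is 3.13.0-48 or newer"
```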

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1358226

Title:
  kernel and lockd xprt_adjust_timeout rq_timeout

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1358226/+subscriptions



[Bug 1358226] Re: kernel and lockd xprt_adjust_timeout rq_timeout

2015-01-19 Thread Kyle O'Donnell
Does anyone know if/when this will make it into the trusty kernel?



[Bug 1358226] Re: kernel and lockd xprt_adjust_timeout rq_timeout

2015-01-05 Thread Kyle O'Donnell
Looks like there may be a patch

http://www.spinics.net/lists/linux-nfs/msg48575.html



[Bug 1376245] Re: remove_proc_entry+0x139/0x1b0() -- name 'fs/nfsfs'

2014-10-17 Thread Kyle O'Donnell
We have this same problem.  Just upgraded to -38, no change:


[184207.109594] ------------[ cut here ]------------
[184207.109604] WARNING: CPU: 15 PID: 10701 at /build/buildd/linux-3.13.0/fs/proc/generic.c:511 remove_proc_entry+0x139/0x1b0()
[184207.109606] name 'fs/nfsfs'
[184207.109608] Modules linked in: nfsv3 ipmi_devintf veth xt_conntrack 
ipt_REJECT ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables 
autofs4 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache bridge gpio_ich 
dcdbas intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd serio_raw joydev lpc_ich i7core_edac edac_core ipmi_si bonding wmi 8021q 
acpi_power_meter mac_hid garp stp mrp llc ses enclosure hid_generic usbhid 
psmouse hid megaraid_sas bnx2
[184207.109664] CPU: 15 PID: 10701 Comm: kworker/u66:1 Tainted: GW I   3.13.0-38-generic #65-Ubuntu
[184207.109666] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
[184207.109671] Workqueue: netns cleanup_net
[184207.109673]  0009 880bfaefbc80 8171ece7 880bfaefbcc8
[184207.109679]  880bfaefbcb8 8106773d  0005
[184207.109683]  a02f58a8 880c0f802db0 0840 880bfaefbd18
[184207.109687] Call Trace:
[184207.109693]  [] dump_stack+0x45/0x56
[184207.109698]  [] warn_slowpath_common+0x7d/0xa0
[184207.109702]  [] warn_slowpath_fmt+0x4c/0x50
[184207.109707]  [] remove_proc_entry+0x139/0x1b0
[184207.109721]  [] nfs_fs_proc_net_exit+0x62/0x70 [nfs]
[184207.109732]  [] nfs_net_exit+0x12/0x20 [nfs]
[184207.109735]  [] ops_exit_list.isra.1+0x39/0x60
[184207.109739]  [] cleanup_net+0x110/0x250
[184207.109745]  [] process_one_work+0x182/0x450
[184207.109749]  [] worker_thread+0x121/0x410
[184207.109753]  [] ? rescuer_thread+0x430/0x430
[184207.109757]  [] kthread+0xd2/0xf0
[184207.109761]  [] ? kthread_create_on_node+0x1c0/0x1c0
[184207.109765]  [] ret_from_fork+0x7c/0xb0
[184207.109769]  [] ? kthread_create_on_node+0x1c0/0x1c0
[184207.109771] ---[ end trace bad0608c86108035 ]---

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1376245

Title:
  remove_proc_entry+0x139/0x1b0() -- name 'fs/nfsfs'

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1376245/+subscriptions



[Bug 1356809] [NEW] unable to load nvidia kernel module in kernels newer than 3.13.0-30

2014-08-14 Thread Kyle O'Donnell
Public bug reported:

I've tried the -31, -32 and -33 revisions of the kernel, and each time I
am unable to load the kernel module. It errors with some kind of drm
message. When I have time to upgrade and provide debug info, I will
update this bug.

** Affects: nvidia-graphics-drivers-331-updates (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1356809

Title:
  unable to load nvidia kernel module in kernels newer than 3.13.0-30

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-331-updates/+bug/1356809/+subscriptions
