Public bug reported:

What Happened
-------------

I have a setup with my home NAS server mounted via CIFS to an Ubuntu
18.04 VPS over OpenVPN. The CIFS share is served by Nginx as an open
file directory, with fscache/cachefilesd being used to cache the files
locally, to reduce the load on my home connection.

I was using this setup for web development by editing files directly on
the CIFS share from another machine (mounted via LAN, independently of
the OpenVPN server). After making changes to an HTML file, I noticed
that the Nginx server was no longer responding.

I connected to the server via SSH to diagnose the issue and found that
trying to perform a file listing (using ls) on the fileshare mount froze
my shell, so I opened another SSH connection. I ran dmesg, expecting to
find a CIFS timeout error, but instead there was an fscache error trace.
Thinking it would help resolve the issue, I ran `systemd restart
cachefilesd` to restart cachefilesd. My SSH connection froze again, but
this time the server was no longer responding to SSH connections and
pings. I was forced to perform a hard reset to restore operation.

Upon further inspection of the logs, I noticed that the NAS reconnected
to the OpenVPN server between my first and second SSH connection, so
it's possible that a network issue caused the initial fscache error. I
have no way of confirming this, that's why I put "presumed" in the
title.

I suspect the issue here might be a race condition when invalidating a
cache object (the updated HTML file) during a network issue.

What I expected to happen
-------------------------

Accessing an updated file shouldn't cause an fscache error, even when a
network error occurs.

Restarting cachefilesd shouldn't cause a fatal panic, even when a
network error occurs.

Machine information
-------------------

/proc/version_signature: Ubuntu 4.15.0-20.21-generic 4.15.17

lsb_release: 
Description: Ubuntu 18.04.1 LTS 
Release:        18.04

apt-cache policy linux-image-4.15.0-20-generic:
linux-image-4.15.0-20-generic:
  Installed: 4.15.0-20.21
  Candidate: 4.15.0-20.21
  Version table:
 *** 4.15.0-20.21 500
        500 http://gb.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

apt-cache policy cachefilesd cachefilesd:                                       
                                                                                
                                                                                
                                                          
cachefilesd:
  Installed: 0.10.10-0.1
  Candidate: 0.10.10-0.1
  Version table:
 *** 0.10.10-0.1 500
        500 http://gb.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status

apt-cache policy libsmbclient:
libsmbclient:
  Installed: (none)
  Candidate: 2:4.7.6+dfsg~ubuntu-0ubuntu2.4
  Version table:
     2:4.7.6+dfsg~ubuntu-0ubuntu2.4 500
        500 http://gb.archive.ubuntu.com/ubuntu bionic-updates/main amd64 
Packages
     2:4.7.6+dfsg~ubuntu-0ubuntu2.2 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 
Packages
     2:4.7.6+dfsg~ubuntu-0ubuntu2 500
        500 http://gb.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

apt-cache policy cifs-utils:
cifs-utils:
  Installed: 2:6.8-1
  Candidate: 2:6.8-1
  Version table:
 *** 2:6.8-1 500
        500 http://gb.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status


The VPS provider is UK2, I believe they use KVM.

journalctl output
-----------------

Private and irrelevant information removed.

Nov 26 00:30:31 hostname kernel: [kworke] preemptive burial: OBJ66
[DROP_OBJECT] 000000004b2987a1

--- [snip] (CacheFiles error + kernel trace) ---

Nov 26 00:34:01 hostname kernel: CacheFiles: Error: Object already active
Nov 26 00:34:01 hostname kernel: CacheFiles: object: OBJ6d
Nov 26 00:34:01 hostname kernel: CacheFiles: objstate=LOOK_UP_OBJECT fl=18 
wbusy=2 ev=0[0]
Nov 26 00:34:01 hostname kernel: CacheFiles: ops=0 inp=0 exc=0
Nov 26 00:34:01 hostname kernel: CacheFiles: parent=00000000316fbe48
Nov 26 00:34:01 hostname kernel: CacheFiles: cookie=000000001fa898cd 
[pr=00000000f0382671 nd=00000000ba69f109 fl=22]
Nov 26 00:34:01 hostname kernel: CacheFiles: key=[8] 'dd05000000000000'
Nov 26 00:34:01 hostname kernel: ------------[ cut here ]------------
Nov 26 00:34:01 hostname kernel: kernel BUG at 
/build/linux-5s7Xkn/linux-4.15.0/fs/cachefiles/namei.c:164!
Nov 26 00:34:01 hostname kernel: invalid opcode: 0000 [#1] SMP PTI
Nov 26 00:34:01 hostname kernel: Modules linked in: binfmt_misc cmac arc4 md4 
nls_utf8 cifs ccm cachefiles fscache ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 
xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 
nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_addrtype ipt_MASQUERADE 
nf_nat_masquerade_ip
Nov 26 00:34:01 hostname kernel:  async_xor async_tx xor raid6_pq libcrc32c 
raid1 raid0 multipath linear hid_generic usbhid hid cirrus ttm psmouse 
drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops 
virtio_blk drm i2c_piix4 pata_acpi floppy
Nov 26 00:34:01 hostname kernel: CPU: 0 PID: 22679 Comm: kworker/u64:2 Not 
tainted 4.15.0-20-generic #21-Ubuntu
Nov 26 00:34:01 hostname kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 
01/01/2011
Nov 26 00:34:01 hostname kernel: Workqueue: fscache_object 
fscache_object_work_func [fscache]
Nov 26 00:34:01 hostname kernel: RIP: 
0010:cachefiles_walk_to_object+0x820/0x8b0 [cachefiles]
Nov 26 00:34:01 hostname kernel: RSP: 0018:ffffa36bc5a87d40 EFLAGS: 00010202
Nov 26 00:34:01 hostname kernel: RAX: ffff96b739444801 RBX: ffff96b732f5e3a8 
RCX: 0000000000847bda
Nov 26 00:34:01 hostname kernel: RDX: 0000000000847bd9 RSI: ffff96b73be27140 
RDI: ffff96b73b807000
Nov 26 00:34:01 hostname kernel: RBP: ffffa36bc5a87de8 R08: 0000000000027140 
R09: ffffffffc0709f4b
Nov 26 00:34:01 hostname kernel: R10: ffffd9cc40e51100 R11: 0000000000010101 
R12: ffff96b732f5d200
Nov 26 00:34:01 hostname kernel: R13: ffff96b732f5d310 R14: ffff96b70730ec00 
R15: ffff96b732f5e3a8
Nov 26 00:34:01 hostname kernel: FS:  0000000000000000(0000) 
GS:ffff96b73be00000(0000) knlGS:0000000000000000
Nov 26 00:34:01 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Nov 26 00:34:01 hostname kernel: CR2: 00007ff8ad9d9f60 CR3: 000000002d522000 
CR4: 00000000000006f0
Nov 26 00:34:01 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 
DR2: 0000000000000000
Nov 26 00:34:01 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 
DR7: 0000000000000400
Nov 26 00:34:01 hostname kernel: Call Trace:
Nov 26 00:34:01 hostname kernel:  ? wait_woken+0x80/0x80
Nov 26 00:34:01 hostname kernel:  cachefiles_lookup_object+0x4d/0xe0 
[cachefiles]
Nov 26 00:34:01 hostname kernel:  fscache_look_up_object+0xbd/0x1c0 [fscache]
Nov 26 00:34:01 hostname kernel:  fscache_object_work_func+0x63/0x1f0 [fscache]
Nov 26 00:34:01 hostname kernel:  process_one_work+0x1de/0x410
Nov 26 00:34:01 hostname kernel:  worker_thread+0x32/0x410
Nov 26 00:34:01 hostname kernel:  kthread+0x121/0x140
Nov 26 00:34:01 hostname kernel:  ? process_one_work+0x410/0x410
Nov 26 00:34:01 hostname kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Nov 26 00:34:01 hostname kernel:  ? do_syscall_64+0x73/0x130
Nov 26 00:34:01 hostname kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Nov 26 00:34:01 hostname kernel:  ret_from_fork+0x35/0x40
Nov 26 00:34:01 hostname kernel: Code: 1e ca 48 c7 c7 70 b1 70 c0 e8 ed 58 1e 
ca 0f 0b e8 d6 3a 18 ca 48 c7 c7 30 b2 70 c0 e8 da 58 1e ca 31 f6 4c 89 e7 e8 
e8 2a 00 00 <0f> 0b 80 bb 1c 01 00 00 00 0f 85 6c f9 ff ff 3d 00 00 20 00 0f 
Nov 26 00:34:01 hostname kernel: RIP: cachefiles_walk_to_object+0x820/0x8b0 
[cachefiles] RSP: ffffa36bc5a87d40
Nov 26 00:34:01 hostname kernel: ---[ end trace 788b2ad4e37538d5 ]---

--- [snip] (fileshare server reconnects to OpenVPN) ---

Nov 26 00:34:43 hostname openvpn[15768]: [snip]:[snip] TLS: tls_process: killed 
expiring key
Nov 26 00:34:46 hostname openvpn[15768]: [snip]:[snip] TLS: soft reset sec=0 
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY OK: depth=1, 
CN=[snip]
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY KU OK
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Validating certificate 
extended key usage
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] ++ Certificate has EKU 
(str) TLS Web Client Authentication, expects TLS Web Client Authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY EKU OK
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY OK: depth=0, 
CN=[snip]
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_VER=2.3.11
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_PLAT=linux
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_PROTO=2
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Outgoing Data Channel: 
Cipher 'AES-256-CBC' initialized with 256 bit key
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Outgoing Data Channel: 
Using 256 bit message hash 'SHA256' for HMAC authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Incoming Data Channel: 
Cipher 'AES-256-CBC' initialized with 256 bit key
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Incoming Data Channel: 
Using 256 bit message hash 'SHA256' for HMAC authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Control Channel: 
TLSv1.2, cipher TLSv1.2 DHE-RSA-AES256-GCM-SHA384, 2048 bit RSA

--- [snip] (i connect via ssh) ---

Nov 26 00:35:25 hostname sshd[23061]: Connection from [snip] port [snip] on 
[snip] port [snip]
Nov 26 00:35:25 hostname sshd[23061]: Postponed publickey for tjg from [snip] 
port [snip] ssh2 [preauth]
Nov 26 00:35:26 hostname sshd[23061]: Accepted publickey for tjg from [snip] 
port [snip] ssh2: RSA SHA256:/[snip]
Nov 26 00:35:26 hostname sshd[23061]: pam_unix(sshd:session): session opened 
for user tjg by (uid=0)
Nov 26 00:35:26 hostname systemd-logind[604]: New session 214 of user tjg.
Nov 26 00:35:26 hostname systemd[1]: Started Session 214 of user tjg.
Nov 26 00:35:27 hostname sshd[23061]: User child is on pid 23141
Nov 26 00:35:27 hostname sshd[23141]: Starting session: shell on pts/1 for tjg 
from [snip] port [snip] id 0

-- [snip] (i try to restart cachefilesd) --

Nov 26 00:35:51 hostname sudo[23179]:      tjg : TTY=pts/1 ; PWD=/home/tjg ; 
USER=root ; COMMAND=/bin/systemctl restart cachefilesd.service
Nov 26 00:35:51 hostname sudo[23179]: pam_unix(sudo:session): session opened 
for user root by tjg(uid=0)
Nov 26 00:35:51 hostname systemd[1]: Stopping LSB: CacheFiles daemon...
Nov 26 00:35:51 hostname cachefilesd[23183]:  * Stopping FilesCache daemon 
cachefilesd
Nov 26 00:35:51 hostname cachefilesd[903]: Daemon Terminated
Nov 26 00:35:51 hostname kernel: CacheFiles: File cache on vda1 unregistering
Nov 26 00:35:51 hostname kernel: FS-Cache: Withdrawing cache "mycache"

--- [crash] (no more logs after this point until hard reset 5 minutes
later) ---

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "output of lspci -vnvn"
   
https://bugs.launchpad.net/bugs/1805062/+attachment/5216296/+files/lspci-vnvn.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1805062

Title:
  FSCache kernel panic on presumed CIFS share disconnect

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805062/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to