[Group.of.nepali.translators] [Bug 1775165] Re: fanotify07 in LTP syscall test generates kernel trace with T/X kernel
Since the ubuntu_ltp_syscalls test has changed and we are no longer
blacklisting this test, I will set this bug back to Confirmed.

** Changed in: ubuntu-kernel-tests
       Status: Fix Released => Confirmed

** Changed in: ubuntu-kernel-tests
     Assignee: Po-Hsu Lin (cypressyew) => (unassigned)

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1775165

Title:
  fanotify07 in LTP syscall test generates kernel trace with T/X kernel

Status in ubuntu-kernel-tests:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Won't Fix

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1775165

  [Impact]

  When userspace tasks that process fanotify permission events
  misbehave, the fsnotify_mark_srcu SRCU read lock is held indefinitely,
  which causes the whole notification subsystem to hang.

  This has been seen in production, and it can also be reproduced by
  running the Linux Test Project test suite, specifically fanotify07.

  [Fix]

  Instead of holding the SRCU lock while waiting for userspace to
  respond, which may never happen or may not happen in the order we
  expect, we drop the fsnotify_mark_srcu SRCU lock before waiting for
  the userspace response and reacquire it when userspace responds.

  The fixes are from a series of upstream commits:

  05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
  9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
  abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)

  The following upstream commits are necessary for the fixes to
  function:

  35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
  0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)

  [Testcase]

  You can reproduce the problem quickly with the Linux Test Project.

  Steps (with root):

  1. sudo apt-get install git xfsprogs -y
  2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
  3. cd ltp
  4. make autotools
  5. ./configure
  6. make; make install
  7. cd /opt/ltp
  8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
  9. ./runltp -f /tmp/jobs

  On a stock Xenial kernel, the system will hang, and the test output
  will look like:

  <<>>
  tag=fanotify07 stime=1554326200
  cmdline="fanotify07 "
  contacts=""
  analysis=exit
  <<>>
  tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Cannot kill test processes!
  Congratulation, likely test hit a kernel bug.
  Exitting uncleanly...
  <<>>
  initiation_status="ok"
  duration=350 termination_type=exited termination_id=1 corefile=no
  cutime=0 cstime=0
  <<>>

  Looking at dmesg, we see the following call stack:

  [ 790.772792] LTP: starting fanotify07 (fanotify07 )
  [ 960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
  [ 960.140867] Not tainted 4.4.0-142-generic #168-Ubuntu
  [ 960.141185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 960.141498] fsnotify_mark D 8800b6703c98 036 2 0x
  [ 960.141516] 8800b6703c98 88013a558a00 8800b7797000 8800b66f8000
  [ 960.141524] 8800b6704000 7fff 8800b6703de0 8800b66f8000
  [ 960.141528] 8800b6703cb0 8185cb45 8800b6703de8
  [ 960.141532] Call Trace:
  [ 960.141580] [] schedule+0x35/0x80
  [ 960.141588] [] schedule_timeout+0x1b4/0x270
  [ 960.141617] [] ? mod_timer+0x10c/0x240
  [ 960.141621] [] ? __schedule+0x30d/0x810
  [ 960.141625] [] wait_for_completion+0xb2/0x190
  [ 960.141636] [] ? wake_up_q+0x70/0x70
  [ 960.141641] [] __synchronize_srcu+0x100/0x1a0
  [ 960.141645] [] ? trace_raw_output_rcu_utilization+0x60/0x60
  [ 960.141664] [] ? fsnotify_put_mark+0x40/0x40
  [ 960.141669] [] synchronize_srcu+0x24/0x30
  [ 960.141672] [] fsnotify_mark_destroy+0x84/0x130
  [ 960.141680] [] ? wake_atomic_t_function+0x60/0x60
  [ 960.141691] [] kthread+0xe7/0x100
  [ 960.141694] [] ? __schedule+0x301/0x810
  [ 960.141699] [] ? kthread_create_on_node+0x1e0/0x1e0
  [ 960.141703] [] ret_from_fork+0x55/0x80
  [ 960.141706] [] ? kthread_create_on_node+0x1e0/0x1e0

  The vanilla 4.4 kernel shows the same call stack.

  On a patched kernel, the test passes and there are no messages in
  dmesg.

  [Regression Potential]

  This makes modifications to how
[Group.of.nepali.translators] [Bug 1775165] Re: fanotify07 in LTP syscall test generates kernel trace with T/X kernel
Test blacklisted for older kernels:
https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/?id=1e99a9a95ba6e99440b46b9f0911fbd63d4dc95f

** Changed in: ubuntu-kernel-tests
       Status: New => Fix Released

** Changed in: ubuntu-kernel-tests
     Assignee: (unassigned) => Po-Hsu Lin (cypressyew)

-- 
https://bugs.launchpad.net/bugs/1775165

Title:
  fanotify07 in LTP syscall test generates kernel trace with T/X kernel

Status in ubuntu-kernel-tests:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Won't Fix
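
When checking whether a run of the testcase hit the hang, it can help to look for the hung-task report ("blocked for more than 120 seconds") in saved dmesg output. The following is only a minimal sketch: the function name count_hung_tasks and the idea of saving dmesg to a file first are assumptions for illustration and are not part of LTP or of this bug report.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP or the bug report): count hung-task
# reports for a given task name in saved kernel log text.
# Usage: count_hung_tasks <task-name> <log-file>
count_hung_tasks() {
    task="$1"
    log="$2"
    # Matches lines such as:
    #   INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c "INFO: task ${task}:[0-9]* blocked for more than [0-9]* seconds" "$log" || true
}
```

For example, after running the testcase one might save the log with "dmesg > /tmp/dmesg.txt" and call "count_hung_tasks fsnotify_mark /tmp/dmesg.txt"; a nonzero count suggests the kernel is still affected.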