[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-08-12 Thread Sean Feole
** Tags added: linux-oracle oracle sru-20190722

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1775165

Title:
  fanotify07/fanotify08 in LTP syscall test generates kernel trace with
  T/X/X-AWS kernel

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Won't Fix

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1775165

  [Impact]

  When userspace tasks which are processing fanotify permission events act 
  incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
  the whole notification subsystem to hang. 

  This has been seen in production, and it can also be seen when running the 
  Linux Test Project testsuite, specifically fanotify07. 

  [Fix]

  Instead of holding the SRCU lock while waiting for userspace to respond,
  which may never happen or may happen in an unexpected order, we drop the
  fsnotify_mark_srcu SRCU lock before waiting for the userspace response and
  reacquire it when userspace responds.

  The fixes are from a series of upstream commits:

  05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
  9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
  abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)

  The following are upstream commits necessary for the fixes to
  function:

  35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
  0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)

  [Testcase]

  You can reproduce the problem pretty quickly with the Linux Test
  Project:

  Steps (with root):
1. sudo apt-get install git xfsprogs -y
2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
3. cd ltp
4. make autotools
5. ./configure
6. make; make install
7. cd /opt/ltp
8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
9. ./runltp -f /tmp/jobs
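
As a rough sketch of step 8, the job file can also be written with printf,
which avoids depending on how a particular shell's echo treats -e. The helper
name write_fanotify_jobs and its optional path argument are illustrative
assumptions and are not part of LTP:

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): write a job file that lists the
# two fanotify tests, one "<tag> <command>" pair per line, as in step 8.
write_fanotify_jobs() {
    # Default to the path used in the steps above.
    jobs_file="${1:-/tmp/jobs}"
    printf '%s\n' 'fanotify07 fanotify07' 'fanotify08 fanotify08' > "$jobs_file"
}
```

It could then be combined with step 9, for example:
write_fanotify_jobs /tmp/jobs && cd /opt/ltp && ./runltp -f /tmp/jobs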

  On a stock Xenial kernel, the system will hang, and the testcase will look like:

  <<>>
  tag=fanotify07 stime=1554326200
  cmdline="fanotify07 "
  contacts=""
  analysis=exit
  <<>>
  tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Cannot kill test processes!
  Congratulation, likely test hit a kernel bug.
  Exitting uncleanly...
  <<>>
  initiation_status="ok"
  duration=350 termination_type=exited termination_id=1 corefile=no
  cutime=0 cstime=0
  <<>>

  Looking at dmesg, we see the following call stack

  [  790.772792] LTP: starting fanotify07 (fanotify07 )
  [  960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
  [  960.140867]   Not tainted 4.4.0-142-generic #168-Ubuntu
  [  960.141185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  960.141498] fsnotify_mark   D 8800b6703c98 036  2 0x
  [  960.141516]  8800b6703c98 88013a558a00 8800b7797000 8800b66f8000
  [  960.141524]  8800b6704000 7fff 8800b6703de0 8800b66f8000
  [  960.141528]   8800b6703cb0 8185cb45 8800b6703de8
  [  960.141532] Call Trace:
  [  960.141580]  [] schedule+0x35/0x80
  [  960.141588]  [] schedule_timeout+0x1b4/0x270
  [  960.141617]  [] ? mod_timer+0x10c/0x240
  [  960.141621]  [] ? __schedule+0x30d/0x810
  [  960.141625]  [] wait_for_completion+0xb2/0x190
  [  960.141636]  [] ? wake_up_q+0x70/0x70
  [  960.141641]  [] __synchronize_srcu+0x100/0x1a0
  [  960.141645]  [] ? trace_raw_output_rcu_utilization+0x60/0x60
  [  960.141664]  [] ? fsnotify_put_mark+0x40/0x40
  [  960.141669]  [] synchronize_srcu+0x24/0x30
  [  960.141672]  [] fsnotify_mark_destroy+0x84/0x130
  [  960.141680]  [] ? wake_atomic_t_function+0x60/0x60
  [  960.141691]  [] kthread+0xe7/0x100
  [  960.141694]  [] ? __schedule+0x301/0x810
  [  960.141699]  [] ? kthread_create_on_node+0x1e0/0x1e0
  [  960.141703]  [] ret_from_fork+0x55/0x80
  [  960.141706]  [] ? kthread_create_on_node+0x1e0/0x1e0

  The vanilla 4.4 kernel also shows the same call stack.

  On a patched kernel, the test will pass successfully, and there will be no
  messages in dmesg. 
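
  One way to check for the hung-task report shown above is to search the
  kernel log for its "blocked for more than N seconds" line. This is only a
  minimal sketch: the helper name has_hung_task is an assumption, and it
  matches nothing beyond that one message format.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed (exit 0)
# if it contains a hung-task report like the one in the trace above.
has_hung_task() {
    grep -q 'blocked for more than [0-9]* seconds'
}
```

  For example: if dmesg | has_hung_task; then echo "hung task reported"; fi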

  [Regression Potential]

  This makes modifications to how locking is performed in fsnotify / fanotify,
  and there may be some cause for regression. Running all fanotify Linux Test
  Project tests shows that there are no extra failures caused by the patches,
  and instead fewer failures are seen due to the bugfix.

  Running the entire Linux Test Project testsuite

[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-07-24 Thread Brad Figg
** Tags added: cscc


[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-07-14 Thread Po-Hsu Lin
Hi Matthew,

I think that's a valid reason to mark this bug as Won't Fix for X
(the same applies to T).

Thanks for working on this one.
I will raise this with the LTP project to see if we can skip this test on older kernels.

** Changed in: linux (Ubuntu Trusty)
   Status: Triaged => Won't Fix

** Changed in: linux (Ubuntu)
   Status: Triaged => Fix Released

** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Won't Fix


[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-07-10 Thread Matthew Ruffell
Hi Po-Hsu Lin,

Sorry for not updating this bug earlier.

Upstream 4.4 and 4.9 are also affected by this bug, so I went and posted the
patches to be considered for upstream -stable.

Upstream 4.4 thread: [1]
Upstream 4.9 thread: [2]

Now, I got some feedback from the original author of the upstream commits that
the scenario is more complicated than I thought, and I ended up missing a lot 
of commits required to fix the problem completely.

It turns out that the fixes prevent the system from crashing, but that some
data structures silently get corrupted over time, meaning the system will
eventually require a reboot anyway.

If you are interested, you can read all about it here: [3] [4] [5]

For now, it seems the list of commits required to actually fix the problem [6]
is a little too large to include in -stable, since it changes a lot of that
subsystem dramatically, and it might introduce regressions, which I want to
avoid.

In the end Greg K-H agreed with me [7], the patches were dropped, and this
won't be getting fixed upstream.

That is the status of this bug, and it likely won't be fixed. I did try, but
there is just too much code to backport and support; it's easier to tell people
to use HWE kernels if they are hitting the problem.

[1] https://www.spinics.net/lists/stable/msg296857.html
[2] https://www.spinics.net/lists/stable/msg296895.html
[3] https://www.spinics.net/lists/stable/msg296992.html
[4] https://www.spinics.net/lists/stable/msg297024.html
[5] https://www.spinics.net/lists/stable/msg297027.html
[6] https://www.spinics.net/lists/stable/msg297476.html
[7] https://www.spinics.net/lists/stable/msg297485.html

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Trusty:
  Triaged
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-07-10 Thread Po-Hsu Lin
It turns out the culprit is fanotify07.

On Xenial, if you run fanotify08 after a clean reboot it will be fine.

@Matthew
Thanks for the work. Would you mind submitting your patches to the mailing list
"kernel-t...@lists.ubuntu.com"? (The title of your cover letter needs some fixing.)


[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-06-04 Thread Po-Hsu Lin
** Tags removed: ppc64le
** Tags added: ppc64el


[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-04-10 Thread Po-Hsu Lin
** Tags added: xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1775165

Title:
  fanotify07/fanotify08 in LTP syscall test generates kernel trace with
  T/X/X-AWS kernel

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Trusty:
  Triaged
Status in linux source package in Xenial:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1775165

  [Impact]

  When userspace tasks which are processing fanotify permission events act 
  incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
  the whole notification subsystem to hang. 

  This has been seen in production, and it can also be seen when running the 
  Linux Test Project testsuite, specifically fanotify07. 

  [Fix]

  Instead of holding the SRCU lock while waiting for userspace to respond, 
  which may never happen, or not in the order we are expecting, we drop the 
  fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then 
  reacquire the lock again when userspace responds.

  The fixes are from a series of upstream commits:

  05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
  9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
  abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)

  The following are upstream commits necessary for the fixes to
  function:

  35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
  0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)

  [Testcase]

  You can reproduce the problem pretty quickly with the Linux Test
  Project:

  Steps (with root):
1. sudo apt-get install git xfsprogs -y
2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
3. cd ltp
4. make autotools
5. ./configure
6. make; make install
7. cd /opt/ltp
8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
9. ./runltp -f /tmp/jobs
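
  Step 8 relies on `echo -e`, which some shells (dash, for example) do not
  treat as an option and print literally. A minimal sketch of the same step
  using printf, assuming a POSIX shell. JOBS is a hypothetical variable name
  introduced here; it defaults to the /tmp/jobs path used in the steps above.

```shell
#!/bin/sh
# Write an LTP command file with one "<tag> <command>" pair per line.
# JOBS is an illustrative name, not part of LTP; override it if needed.
JOBS="${JOBS:-/tmp/jobs}"

# printf reuses the format string until all arguments are consumed,
# so four arguments produce two lines.
printf '%s %s\n' fanotify07 fanotify07 fanotify08 fanotify08 > "$JOBS"

# Show what was written so it can be checked before running the tests.
cat "$JOBS"
```

  The resulting file can then be passed to step 9 unchanged, as
  ./runltp -f "$JOBS" from /opt/ltp.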

  On a stock Xenial kernel, the system will hang, and the testcase will look like:

  <<<test_start>>>
  tag=fanotify07 stime=1554326200
  cmdline="fanotify07 "
  contacts=""
  analysis=exit
  <<<test_output>>>
  tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Cannot kill test processes!
  Congratulation, likely test hit a kernel bug.
  Exitting uncleanly...
  <<<execution_status>>>
  initiation_status="ok"
  duration=350 termination_type=exited termination_id=1 corefile=no
  cutime=0 cstime=0
  <<<test_end>>>

  Looking at dmesg, we see the following call stack:

  [  790.772792] LTP: starting fanotify07 (fanotify07 )
  [  960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
  [  960.140867]   Not tainted 4.4.0-142-generic #168-Ubuntu
  [  960.141185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  960.141498] fsnotify_mark   D 8800b6703c98 036  2 0x
  [  960.141516]  8800b6703c98 88013a558a00 8800b7797000 8800b66f8000
  [  960.141524]  8800b6704000 7fff 8800b6703de0 8800b66f8000
  [  960.141528]   8800b6703cb0 8185cb45 8800b6703de8
  [  960.141532] Call Trace:
  [  960.141580]  [] schedule+0x35/0x80
  [  960.141588]  [] schedule_timeout+0x1b4/0x270
  [  960.141617]  [] ? mod_timer+0x10c/0x240
  [  960.141621]  [] ? __schedule+0x30d/0x810
  [  960.141625]  [] wait_for_completion+0xb2/0x190
  [  960.141636]  [] ? wake_up_q+0x70/0x70
  [  960.141641]  [] __synchronize_srcu+0x100/0x1a0
  [  960.141645]  [] ? trace_raw_output_rcu_utilization+0x60/0x60
  [  960.141664]  [] ? fsnotify_put_mark+0x40/0x40
  [  960.141669]  [] synchronize_srcu+0x24/0x30
  [  960.141672]  [] fsnotify_mark_destroy+0x84/0x130
  [  960.141680]  [] ? wake_atomic_t_function+0x60/0x60
  [  960.141691]  [] kthread+0xe7/0x100
  [  960.141694]  [] ? __schedule+0x301/0x810
  [  960.141699]  [] ? kthread_create_on_node+0x1e0/0x1e0
  [  960.141703]  [] ret_from_fork+0x55/0x80
  [  960.141706]  [] ? kthread_create_on_node+0x1e0/0x1e0

  The vanilla 4.4 kernel also shows the same call stack.

  On a patched kernel, the test will pass successfully, and there will be no
  messages in dmesg. 
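
  One way to check for the hung-task report shown above is to filter the
  kernel log for it. This is a minimal sketch; hung_tasks is a hypothetical
  helper name, and the pattern is based only on the message format seen in
  the trace above.

```shell
#!/bin/sh
# Print hung-task reports found in kernel log text read from stdin.
# hung_tasks is an illustrative helper, not an existing tool.
hung_tasks() {
    grep -E 'INFO: task .+ blocked for more than [0-9]+ seconds'
}

# Example input: the report line from the trace in this description.
printf '%s\n' \
    '[  960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.' \
    | hung_tasks
```

  On a live system the same filter can be applied as dmesg | hung_tasks.
  An empty result after running fanotify07 is consistent with the patched
  behaviour described above.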

  [Regression Potential]

  This changes how locking is performed in fsnotify / fanotify, so there is
  some potential for regression. Running all fanotify Linux Test Project
  tests shows that there are no extra failures caused by the patches, and
  instead fewer failures are seen due to the bugfix.

  Running the entire Linux Test Project testsuite 

[Kernel-packages] [Bug 1775165] Re: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel

2019-04-09 Thread Matthew Ruffell
I have gone ahead and backported the fixes for xenial's 4.4 kernel.

This patch series is for ubuntu-xenial 4.4:
https://paste.ubuntu.com/p/Kj43J6H3Hm/

This patch series is for vanilla upstream 4.4:
https://paste.ubuntu.com/p/MzdjcHCqbz/

Both patch series apply, compile, and fix the problem.

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

** Description changed:

- The "fanotify07" and "fanotify08" from the LTP syscall tests has failed
- on a testing node with Trusty kernel installed.
+ BugLink: https://bugs.launchpad.net/bugs/1775165
+ 
+ [Impact]
+ 
+ When userspace tasks which are processing fanotify permission events act 
+ incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
+ the whole notification subsystem to hang. 
+ 
+ This has been seen in production, and it can also be seen when running the 
+ Linux Test Project testsuite, specifically fanotify07. 
+ 
+ [Fix]
+ 
+ Instead of holding the SRCU lock while waiting for userspace to respond, 
+ which may never happen, or not in the order we are expecting, we drop the 
+ fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then 
+ reacquire the lock again when userspace responds.
+ 
+ The fixes are from a series of upstream commits:
+ 
+ 05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
+ 9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
+ abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)
+ 
+ The following are upstream commits necessary for the fixes to function:
+ 
+ 35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
+ 0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)
+ 
+ [Testcase]
+ 
+ You can reproduce the problem pretty quickly with the Linux Test
+ Project:
  
  Steps (with root):
-   1. sudo apt-get install git xfsprogs -y
-   2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
-   3. cd ltp
-   4. make autotools
-   5. ./configure
-   6. make; make install
-   7. cd /opt/ltp
-   8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
-   9. ./runltp -f /tmp/jobs
+   1. sudo apt-get install git xfsprogs -y
+   2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
+   3. cd ltp
+   4. make autotools
+   5. ./configure
+   6. make; make install
+   7. cd /opt/ltp
+   8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
+   9. ./runltp -f /tmp/jobs
+   
+ On a stock Xenial kernel, the system will hang, and the testcase will look like:
  
  <<<test_start>>>
- tag=fanotify07 stime=1528197132
- cmdline="fanotify07"
+ tag=fanotify07 stime=1554326200
+ cmdline="fanotify07 "
  contacts=""
  analysis=exit
  <<<test_output>>>
- incrementing stop
- tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s
+ tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Test timeouted, sending SIGKILL!
  Cannot kill test processes!
  Congratulation, likely test hit a kernel bug.
  Exitting uncleanly...
  <<<execution_status>>>
  initiation_status="ok"
  duration=350 termination_type=exited termination_id=1 corefile=no
  cutime=0 cstime=0
  <<<test_end>>>
- INFO: ltp-pan reported some tests FAIL
- LTP Version: 20180515
  
- [  841.063676] INFO: task fanotify07:3660 blocked for more than 120 seconds.
- [  841.063692]   Not tainted 3.13.0-149-generic #199-Ubuntu
- [  841.063705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- [  841.063723] fanotify07  D 8804584742f0 0  3660   3652 0x
- [  841.063724]  880459e9bd00 0086 88045556b000 00013b00
- [  841.063726]  880459e9bfd8 00013b00 88045556b000 880459b4e690
- [  841.063728]   880458474200 8804584742f0 0002
- [  841.063730] Call Trace:
- [  841.063731]  [] schedule+0x29/0x70
- [  841.063733]  [] fanotify_handle_event+0x110/0x1d0
- [  841.063735]  [] ? prepare_to_wait_event+0x100/0x100
- [  841.063737]  [] send_to_group+0x166/0x240
- [  841.063738]  [] ? touch_atime+0x71/0x140
- [  841.063740]  [] fsnotify+0x2e5/0x320
- [  841.063742]  [] security_file_permission+0x94/0xb0
- [  841.063743]  [] rw_verify_area+0x52/0xd0
- [  841.063745]  [] vfs_read+0x6a/0x160
- [  841.063746]  [] SyS_read+0x49/0xa0
- [  841.063748]  [] system_call_fastpath+0x1a/0x1f
- [ 1304.848642] ltp-pan[3809]: segfault at 0 ip 7f07c8aafdfa sp 7ffc1da92078 error 4 in libc-2.19.so[7f07c8a27000+1be000]
+ Looking at