[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-06-04 Thread Launchpad Bug Tracker
[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
   Status: Incomplete => Expired


[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-04-05 Thread Rafael David Tinoco
** Changed in: linux (Ubuntu)
   Status: In Progress => Incomplete

** Changed in: linux (Ubuntu)
 Assignee: Rafael David Tinoco (inaddy) => (unassigned)


[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-02-01 Thread Rafael David Tinoco
Unfortunately, I received two other kdumps from the same user, and it looks like
those two kdumps were caused by a NULL dereference (3.2.0-79). I asked them to
test the latest 3.2 kernel plus the fix I mentioned above.

The user still faced kdumps with the latest 3.2, but only with khungtaskd
complaining about hung tasks. A first look into the dumps showed lock contention:

crash> bt 883f70c78000 
PID: 41940 TASK: 883f70c78000 CPU: 13 COMMAND: "java" 
#0 [883f69093d98] __schedule at 81666f4a 
#1 [883f69093e20] schedule at 816675cf 
#2 [883f69093e30] rwsem_down_failed_common at 8166940d 
#3 [883f69093ea0] rwsem_down_write_failed at 816694c3 
#4 [883f69093eb0] call_rwsem_down_write_failed at 8131fb43 
#5 [883f69093f10] sys_mprotect at 811488f0 
#6 [883f69093f80] system_call_fastpath at 81671c62 
RIP: 7f1adc172cc7 RSP: 7f1adcc89c60 RFLAGS: 00010297 
RAX: 000a RBX: 81671c62 RCX: ef038c60 
RDX: 0003 RSI: 2000 RDI: 7f1ad49bc000 
RBP: 2010 R8: 009be000 R9:  
R10: 009bc000 R11: 0206 R12: 0ac0 
R13: 7f1ad49bb540 R14: 2000 R15: 7f1ad478 
ORIG_RAX: 000a CS: 0033 SS: 002b 

on the task monitored by the khungtaskd thread.
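
The syscall the blocked task sits in here is mprotect. Assuming the saved
registers follow the usual x86-64 syscall convention (RDI = addr, RSI = len,
RDX = prot), the thread was changing an 8 KiB region to PROT_READ|PROT_WRITE
and got stuck waiting for the mm's rwsem, presumably held by another thread of
the same process. A minimal userspace sketch of that kind of call (illustrative
only, not the customer's code):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 0x2000;        /* 8 KiB, matching RSI above */
        char *p = mmap(NULL, len, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* mprotect() needs the mm's rwsem for writing; if another thread
         * of the process holds it, the caller sleeps in
         * rwsem_down_write_failed(), as in frame #3 of the backtrace. */
        if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0) {
                perror("mprotect");
                return 1;
        }
        memset(p, 0, len);
        munmap(p, len);
        return 0;
}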


[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-01-25 Thread Christopher M. Penalver
** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu)
   Importance: Critical => Medium


[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-01-14 Thread Rafael David Tinoco
Disconsider my previous analysis please. After disassembling carefully
the java process dump I could see that we had the instruction pointer
at:

if (iter.task)
        put_task_struct(iter.task);

This is a few instructions before the loop that I was previously pointing
to as a potential deadlock candidate.

   0x811df242 <+34>:je 0x811df24f 
   0x811df244 <+36>:lock decl 0x10(%rdx)
   0x811df248 <+40>:sete   %al
   0x811df24b <+43>:test   %al,%al

The combination of those three instructions (decl, sete & test) shows that
this is actually the implementation of the atomic_dec_and_test function. It
serves to atomically decrement a reference counter in the task_struct of
the given task.
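
As a userspace illustration of the same semantics (a sketch using GCC atomic
builtins, not the kernel implementation): atomic_dec_and_test() decrements the
counter atomically and returns true only when the new value is zero, which is
exactly what the lock decl / sete pair above encodes.

#include <stdbool.h>
#include <stdio.h>

/* Simplified model of atomic_dec_and_test(): atomically decrement *v
 * and return true iff the new value is zero (last reference dropped). */
static bool atomic_dec_and_test_model(int *v)
{
        return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST) == 0;
}

int main(void)
{
        int usage = 2;          /* think of a task_struct reference count */
        printf("%d\n", atomic_dec_and_test_model(&usage));  /* 0: still referenced */
        printf("%d\n", atomic_dec_and_test_model(&usage));  /* 1: last reference dropped */
        return 0;
}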

Deadlock is probably NOT there.

I went to the khungtaskd stack to check which task it was complaining about
being blocked, and I could not find it (the task id was probably in registers
and not on the stack), BUT I did find an output message telling me which task
was blocked:

[15956.371298] INFO: task jsvc:57263 blocked for more than 120 seconds.

This message comes exactly from the khungtaskd code. It means that PID 57263
was NOT scheduled for more than 120 seconds. Checking the task:

PID: 57263
COMMAND: "jsvc"
   TASK: 883e0c4e2e00  [THREAD_INFO: 883e849c2000]
CPU: 1
  STATE: TASK_UNINTERRUPTIBLE|TASK_TRACED|EXIT_DEAD

I can see that it is in the UNINTERRUPTIBLE state (it can't receive signals,
for example). Checking its stack:

crash> bt
PID: 57263  TASK: 883e0c4e2e00  CPU: 1   COMMAND: "jsvc"
 #0 [883e849c3c88] __schedule at 8166256a
 #1 [883e849c3d10] schedule at 81662c0f
 #2 [883e849c3d20] schedule_timeout at 8166324d
 #3 [883e849c3dd0] wait_for_common at 81662a4f
 #4 [883e849c3e70] wait_for_completion at 81662bcd
 #5 [883e849c3e80] flush_work at 8108757e
 #6 [883e849c3ed0] schedule_on_each_cpu at 81087813
 #7 [883e849c3f10] lru_add_drain_all at 81127365
 #8 [883e849c3f20] sys_fadvise64_64 at 8111e189
 #9 [883e849c3f70] sys_fadvise64 at 8111e27e
#10 [883e849c3f80] system_call_fastpath at 8166d2c2

## sys_fadvise64_64

The task entered kernel mode through the fadvise syscall. fadvise basically
tells the kernel how a file is going to be accessed so it can prepare itself
with different caching techniques (expect page references in random order,
expect page references in sequential order, expect a delay before the file
contents are accessed, etc.).

It turns out that the JSVC process basically told the kernel that it would
NOT use the opened file (flag POSIX_FADV_DONTNEED) in the near future,
starting at a certain offset. With that, the kernel calculated the start/end
page indexes (offset, offset+len) for the given file range and tried to
invalidate all unlocked pages (page cache) in that area. Dirty, locked,
under-writeback, or mapped pages are NOT invalidated.

After the first attempt at invalidating page-cache pages for that particular
file, the kernel found that fewer pages than expected (expected being the
range given to fadvise in the form of offset/len) were invalidated. In other
words, it could not invalidate all pages in the range, so it resorted to
calling the LRU drain logic.
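
To make the trigger concrete, the call the JSVC process issued is essentially
posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED). A minimal sketch follows
(the file name and range are hypothetical; the real ones come from the
application):

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/tmp/somefile", O_RDONLY);   /* hypothetical file */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Advise the kernel that the first 1 MiB will not be needed soon,
         * so it may drop those pages from the page cache.  On 3.2 this
         * lands in sys_fadvise64_64(), and if not all pages could be
         * invalidated it calls lru_add_drain_all(), as in the backtrace. */
        int err = posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_DONTNEED);
        if (err != 0)
                fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        close(fd);
        return 0;
}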

## lru_add_drain_all

LRU stands for Least Recently Used; the algorithm is used both for user-mode
address space pages and for the page cache (this case). It maintains two
lists of memory pages: active and inactive. The LRU drain logic schedules a
drain function on every CPU in the system in order to drain the per-CPU page
vectors (pagevecs).

Pagevecs are small, temporary, fixed-size data structures containing page
descriptor pointers. Instead of changing the LRU lists (active/inactive)
directly, the LRU code accumulates pages in per-CPU pagevecs until they are
full, avoiding having to lock/unlock the LRU lists more often than necessary.
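
A rough userspace analogy of that batching idea (this is not the kernel's
code; struct pagevec_model, PVEC_SIZE and the function names below are made
up for illustration): pages are staged in a small per-CPU array and only
moved to the locked LRU list when the array fills up or when it is explicitly
drained.

#include <pthread.h>
#include <stdio.h>

#define PVEC_SIZE 14    /* a small, fixed batch size, like the kernel's pagevec */

/* Global "LRU list", protected by a lock. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static long lru_list_len;

/* Per-CPU (here: per-thread) batch of page pointers. */
struct pagevec_model {
        int nr;
        void *pages[PVEC_SIZE];
};

/* "Draining" a pagevec: take the lock once and move the whole batch. */
static void pagevec_drain(struct pagevec_model *pvec)
{
        pthread_mutex_lock(&lru_lock);
        lru_list_len += pvec->nr;
        pthread_mutex_unlock(&lru_lock);
        pvec->nr = 0;
}

/* Adding a page: no lock is taken unless the batch is full. */
static void lru_cache_add_model(struct pagevec_model *pvec, void *page)
{
        pvec->pages[pvec->nr++] = page;
        if (pvec->nr == PVEC_SIZE)
                pagevec_drain(pvec);
}

int main(void)
{
        struct pagevec_model pvec = { 0 };
        static int dummy_pages[100];

        for (int i = 0; i < 100; i++)
                lru_cache_add_model(&pvec, &dummy_pages[i]);
        pagevec_drain(&pvec);   /* lru_add_drain_all() forces this on every CPU */

        printf("pages on LRU list: %ld\n", lru_list_len);
        return 0;
}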

## schedule_on_each_cpu

Task 883e0c4e2e00 continued the LRU drain logic by scheduling its callback
function using schedule_on_each_cpu:

schedule_on_each_cpu(drain_cpu_pagevecs) scheduled drain_cpu_pagevecs on all
CPUs and kept waiting, for more than 120 seconds, for all CPUs to execute the
drain.

While it was waiting, task 883e0c4e2e00 was made UNINTERRUPTIBLE and remained
stuck in the loop where it waits for drain_cpu_pagevecs to finish on all CPUs
(which it never did).
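
The waiting pattern that hangs here can be illustrated with an ordinary
userspace sketch (pthreads standing in for per-CPU work items; all names
below are made up): the caller queues one job per worker and then waits for
each one's completion in turn, so if any single worker never finishes, the
caller blocks forever, exactly the situation khungtaskd reported for the
jsvc task.

#include <pthread.h>
#include <stdio.h>

#define NR_WORKERS 4                 /* stand-in for "one work item per CPU" */

/* Minimal "completion": done flag + condvar, one per worker. */
struct completion_model {
        pthread_mutex_t lock;
        pthread_cond_t cond;
        int done;
};

static struct completion_model work_done[NR_WORKERS];

static void *worker(void *arg)
{
        struct completion_model *c = arg;
        /* ... the per-CPU drain work would happen here ... */
        pthread_mutex_lock(&c->lock);
        c->done = 1;
        pthread_cond_signal(&c->cond);
        pthread_mutex_unlock(&c->lock);
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_WORKERS];

        for (int i = 0; i < NR_WORKERS; i++) {
                pthread_mutex_init(&work_done[i].lock, NULL);
                pthread_cond_init(&work_done[i].cond, NULL);
                pthread_create(&tid[i], NULL, worker, &work_done[i]);
        }

        /* Like schedule_on_each_cpu(): wait for every worker, one by one.
         * If any worker never signals completion, this loop never returns. */
        for (int i = 0; i < NR_WORKERS; i++) {
                struct completion_model *c = &work_done[i];
                pthread_mutex_lock(&c->lock);
                while (!c->done)
                        pthread_cond_wait(&c->cond, &c->lock);
                pthread_mutex_unlock(&c->lock);
                pthread_join(tid[i], NULL);
        }

        printf("all workers drained\n");
        return 0;
}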

## precise commit (cherry-picked from upstream) adding that logic:

commit a68692ea42825466b0ade8a94a98afc924d2564e
Author: Mel Gorman 
Date:   Fri Feb 22 16:35:59 2013 -0800

mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to
discard all pages

BugLink: http://bugs.launchpad.net/bugs/1150557

commit 67d46b296a1ba1477c0df8ff3bc5e0167a0b0732 upstream.

It was added to Precise (3.2.0-40.64).

I'm removing that logic and providing a kernel without it to see if it
mitigates the issue.

Per original commit description, this was supposed 

[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-01-14 Thread Rafael David Tinoco
Because Precise 3.2.0-79 is missing debug symbols on ddebs.ubuntu.com, I had
to compile a 3.2.0-79 kernel in a PPA and hope that the symbols would be
close to what they were in that version.

That led me to a wrong initial analysis that I document here for
historical purposes:

>   178  2  18  881f716bdc00  RU   0.0   0  0  [khungtaskd]
>  3680   2808  38  881f71a5c500  RU   1.6 6629520 4303188  java
> 50279  49370  31  883f0c7e8000  RU   0.0 4121160 20  java
> 50757  50322  23  881ef27eae00  RU   0.3 4149720 870892  java

crash> bt 881ef27eae00

PID: 50757  TASK: 881ef27eae00  CPU: 23  COMMAND: "java"
 #0 [881fbfba6ee0] crash_nmi_callback at 81031ac9
 #1 [881fbfba6ef0] default_do_nmi at 81666079
 #2 [881fbfba6f30] do_nmi at 816662b0
 #3 [881fbfba6f50] nmi at 81665620
[exception RIP: next_tgid+40]
RIP: 811df248  RSP: 881cd4b67da8  RFLAGS: 0202
RAX:   RBX: 81c281a0  RCX: 
RDX: 881f7283  RSI: 0074  RDI: 81c281a0
RBP: 881cd4b67df8   R8: a88d   R9: 0004
R10: 883f70922540  R11: 0001f8f579768213  R12: 0074
R13: 881f0073  R14: 81c281a0  R15: 881f73138000
ORIG_RAX:   CS: 0010  SS: 0018
---  ---
 #4 [881cd4b67da8] next_tgid at 811df248
 #5 [881cd4b67e00] proc_pid_readdir at 811e117e
 #6 [881cd4b67eb0] proc_root_readdir at 811dbe0a
 #7 [881cd4b67ee0] vfs_readdir at 8118e1d0
 #8 [881cd4b67f30] sys_getdents at 8118e4a9
 #9 [881cd4b67f80] system_call_fastpath at 8166d2c2
RIP: 7f855e94d605  RSP: 7f853a457bf0  RFLAGS: 0283
RAX: 004e  RBX: 8166d2c2  RCX: 0010
RDX: 8000  RSI: 7f8548057980  RDI: 00fb
RBP: 7f854846b140   R8: 7f8548057980   R9: 0008
R10: 0010  R11: 0246  R12: 0016
R13: ffa0  R14: 7f853a45a4d0  R15: 7f8548057950
ORIG_RAX: 004e  CS: 0033  SS: 002b

1) khungtaskd is complaining about a hung task

2) task 881ef27eae00 is in "next_tgid", part of the procfs VFS subsystem:

We are probably stuck here:

        rcu_read_lock();
retry:
        iter.task = NULL;
        pid = find_ge_pid(iter.tgid, ns);
        if (pid) {
                iter.tgid = pid_nr_ns(pid, ns);
                iter.task = pid_task(pid, PIDTYPE_PID);
                /* What we to know is if the pid we have find is the
                 * pid of a thread_group_leader.  Testing for task
                 * being a thread_group_leader is the obvious thing
                 * todo but there is a window when it fails, due to
                 * the pid transfer logic in de_thread.
                 *
                 * So we perform the straight forward test of seeing
                 * if the pid we have found is the pid of a thread
                 * group leader, and don't worry if the task we have
                 * found doesn't happen to be a thread group leader.
                 * As we don't care in the case of readdir.
                 */
                if (!iter.task || !has_group_leader_pid(iter.task)) {
                        iter.tgid += 1;
                        goto retry;
                }
                get_task_struct(iter.task);
        }
        rcu_read_unlock();

It is trying to find a thread group leader while the procfs directory
structure is being read (by the JVM process).
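
For context, this code path is reached whenever something enumerates /proc:
the sys_getdents in the backtrace corresponds to something as simple as the
sketch below (any process listing /proc, e.g. the JVM or a monitoring tool,
would do it).

#include <dirent.h>
#include <stdio.h>

int main(void)
{
        /* Listing /proc goes through sys_getdents -> proc_root_readdir ->
         * proc_pid_readdir -> next_tgid for the numeric (PID) entries,
         * which is exactly the path in the backtrace above. */
        DIR *d = opendir("/proc");
        if (!d) {
                perror("opendir");
                return 1;
        }

        struct dirent *de;
        while ((de = readdir(d)) != NULL)
                printf("%s\n", de->d_name);

        closedir(d);
        return 0;
}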

I'm still analysing code.


[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED

2016-01-14 Thread Rafael David Tinoco
Available PPA for testing purposes:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1534413
