Re: bio linked list corruption.

2016-12-06 Thread Vegard Nossum
On 5 December 2016 at 22:33, Vegard Nossum <vegard.nos...@gmail.com> wrote:
> On 5 December 2016 at 21:35, Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>> Note for Ingo and Peter: this patch has not been tested at all. But
>> Vegard did test an earlier patch of mine that just verified that yes,
>> the issue really was that wait queue entries remained on the wait
>> queue head just as we were about to return and free it.
>
> The second patch has been running for 1h+ without any problems of any
> kind. I should typically have seen 2 crashes by now. I'll let it run
> overnight to be sure.

Alright, so nearly 12 hours later I don't see either the new warning
or the original crash at all, so feel free to add:

Tested-by: Vegard Nossum <vegard.nos...@oracle.com>.

That said, my 8 VMs had all panicked in some way due to OOMs (which is
new since v4.8): some got page allocation stalls for >20s and died
because "khugepaged blocked for more than 120 seconds", while others
got "Out of memory and no killable processes".


Vegard


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 21:35, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> Note for Ingo and Peter: this patch has not been tested at all. But
> Vegard did test an earlier patch of mine that just verified that yes,
> the issue really was that wait queue entries remained on the wait
> queue head just as we were about to return and free it.

The second patch has been running for 1h+ without any problems of any
kind. I should typically have seen 2 crashes by now. I'll let it run
overnight to be sure.


Vegard


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 20:11, Vegard Nossum <vegard.nos...@gmail.com> wrote:
> On 5 December 2016 at 18:55, Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>> On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum <vegard.nos...@gmail.com> 
>> wrote:
>> Since you apparently can recreate this fairly easily, how about trying
>> this stupid patch?
>>
>> NOTE! This is entirely untested. I may have screwed this up entirely.
>> You get the idea, though - just remove the wait queue head from the
>> list - the list entries stay around, but nothing points to the stack
>> entry (that we're going to free) any more.
>>
>> And add the warning to see if this actually ever triggers (and because
>> I'd like to see the callchain when it does, to see if it's another
>> waitqueue somewhere or what..)
>
> [ cut here ]
> WARNING: CPU: 22 PID: 14012 at mm/shmem.c:2668 shmem_fallocate+0x9a7/0xac0
> Kernel panic - not syncing: panic_on_warn set ...

So I noticed that panic_on_warn was set just after sending the email
and I've been waiting for it to trigger again.

The warning has triggered twice more without panic_on_warn set and I
haven't seen any crash yet.


Vegard


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 18:55, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum <vegard.nos...@gmail.com> wrote:
>>
>> The warning shows that it made it past the list_empty_careful() check
>> in finish_wait() but then bugs out on the wait->task_list
>> dereference.
>>
>> Anything stick out?
>
> I hate that shmem waitqueue garbage. It's really subtle.
>
> I think the problem is that "wake_up_all()" in shmem_fallocate()
> doesn't necessarily wake up everything. It wakes up TASK_NORMAL -
> which does include TASK_UNINTERRUPTIBLE, but doesn't actually mean
> "everything on the list".
>
> I think that what happens is that the waiters somehow move from
> TASK_UNINTERRUPTIBLE to TASK_RUNNING early, and this means that
> wake_up_all() will ignore them, leave them on the list, and now that
> list on stack is no longer empty at the end.
>
> And the way *THAT* can happen is that the task is on some *other*
> waitqueue as well, and that other waiqueue wakes it up. That's not
> impossible, you can certainly have people on wait-queues that still
> take faults.
>
> Or somebody just uses a directed wake_up_process() or something.
>
> Since you apparently can recreate this fairly easily, how about trying
> this stupid patch?
>
> NOTE! This is entirely untested. I may have screwed this up entirely.
> You get the idea, though - just remove the wait queue head from the
> list - the list entries stay around, but nothing points to the stack
> entry (that we're going to free) any more.
>
> And add the warning to see if this actually ever triggers (and because
> I'd like to see the callchain when it does, to see if it's another
> waitqueue somewhere or what..)
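
(To make the idea concrete: a minimal sketch of what the end of
shmem_fallocate() would do, untested and reconstructed from the
description above rather than from the actual patch Linus sent.)

	/* Untested sketch -- not the actual patch posted in this thread.
	 * After waking everyone, detach the on-stack waitqueue head: any
	 * stale entries keep pointing at each other, but nothing points
	 * at the stack frame we are about to free. */
	wake_up_all(&shmem_falloc_waitq);
	spin_lock(&shmem_falloc_waitq.lock);
	/* Warn if anything is still queued, so we get a callchain. */
	WARN_ON(!list_empty(&shmem_falloc_waitq.task_list));
	list_del_init(&shmem_falloc_waitq.task_list);
	spin_unlock(&shmem_falloc_waitq.lock);

With Linus's patch applied, the warning fired: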

[ cut here ]
WARNING: CPU: 22 PID: 14012 at mm/shmem.c:2668 shmem_fallocate+0x9a7/0xac0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 22 PID: 14012 Comm: trinity-c73 Not tainted 4.9.0-rc7+ #220
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
8801e32af970 81fb08c1 83e74b60 8801e32afa48
83ed7600 847103e0 8801e32afa38 81515244
41b58ab3 844e21da 81515061 8151591e
Call Trace:
[] dump_stack+0x83/0xb2
[] panic+0x1e3/0x3ad
[] __warn+0x1bf/0x1e0
[] warn_slowpath_null+0x2c/0x40
[] shmem_fallocate+0x9a7/0xac0
[] vfs_fallocate+0x350/0x620
[] SyS_madvise+0x432/0x1290
[] do_syscall_64+0x1af/0x4d0
[] entry_SYSCALL64_slow_path+0x25/0x25
[ cut here ]

Attached a full log.


Vegard


[Attachment: 0.txt.gz (GNU Zip compressed data)]


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 19:11, Andy Lutomirski <l...@kernel.org> wrote:
> On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum <vegard.nos...@gmail.com> wrote:
>> On 23 November 2016 at 20:58, Dave Jones <da...@codemonkey.org.uk> wrote:
>>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>>
>>>  > [  317.689216] BUG: Bad page state in process kworker/u8:8  pfn:4d8fd4
>>>  > trace from just before this happened. Does this shed any light ?
>>>  >
>>>  > https://codemonkey.org.uk/junk/trace.txt
>>>
>>> crap, I just noticed the timestamps in the trace come from quite a bit
>>> later.  I'll tweak the code to do the taint checking/ftrace stop after
>>> every syscall, that should narrow the window some more.
>>
>> FWIW I hit this as well:
>>
>> BUG: unable to handle kernel paging request at 81ff08b7
>
> We really ought to improve this message.  If nothing else, it should
> say whether it was a read, a write, or an instruction fetch.
>
>> IP: [] __lock_acquire.isra.32+0xda/0x1a30
>> PGD 461e067 PUD 461f063
>> PMD 1e001e1
>
> Too lazy to manually decode this right now, but I don't think it matters.
>
>> Oops: 0003 [#1] PREEMPT SMP KASAN
>
> Note this is SMP, but that just means CONFIG_SMP=y.  Vegard, how many
> CPUs does your kernel think you have?

My first crash was running on a 1-CPU guest (not intentionally, but
because of a badly configured qemu). I'm honestly surprised it
triggered at all with 1 CPU, but I guess it shows that it's not a true
concurrency issue at least!

>
>> RIP: 0010:[]  []
>> __lock_acquire.isra.32+0xda/0x1a30
>> RSP: 0018:8801bab8f730  EFLAGS: 00010082
>> RAX: 81ff071f RBX:  RCX: 
>
> RAX points to kernel text.

Yes, it's somewhere in the middle of iov_iter_init() -- other crashes
also had put_prev_entity(), a null pointer, and some garbage values I
couldn't identify.

>
>> Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
>> c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e>
>> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00
>
>   2b:   3e ff 80 98 01 00 00    incl   %ds:0x198(%rax)   <-- trapping instruction
>
> That's very strange.  I think this is:
>
> atomic_inc((atomic_t *)&class->ops);
>
> but my kernel contains:
>
> 3cb4:   f0 ff 80 98 01 00 00lock incl 0x198(%rax)
>
> So your kernel has been smp-alternatived.  That 3e comes from
> alternatives_smp_unlock.  If you're running on SMP with UP
> alternatives, things will break.

Yes, indeed. It was running on 1 CPU by mistake and still triggered the bug.
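
For reference, the rewrite Andy describes is done by
alternatives_smp_unlock() in arch/x86/kernel/alternative.c; roughly
(hedged, quoted from memory of the v4.9-era code, so treat it as a
sketch):

	/* Each s32 in .smp_locks is a relative offset to a lock prefix;
	 * when switching to UP, the 0xf0 (lock) byte is replaced with the
	 * harmless 0x3e (DS segment override) seen in the Code: line. */
	static void alternatives_smp_unlock(const s32 *start, const s32 *end,
					    u8 *text, u8 *text_end)
	{
		const s32 *poff;

		for (poff = start; poff < end; poff++) {
			u8 *ptr = (u8 *)poff + *poff;

			if (ptr < text || ptr >= text_end)
				continue;
			/* turn lock prefix into DS segment override prefix */
			if (*ptr == 0xf0)
				text_poke(ptr, ((unsigned char []){0x3E}), 1);
		}
	}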

The crashes started really pouring in once I got my qemu fixed. Just
to reassure you, here's another crash which shows it's using the
correct instruction on an actual multicore guest:

BUG: unable to handle kernel paging request at 0003030001de
IP: [] __lock_acquire.isra.32+0xda/0x1a30
PGD 183fd2067 PUD 0

Oops: 0002 [#1] PREEMPT SMP KASAN
Dumping ftrace buffer:
  (ftrace buffer empty)
CPU: 23 PID: 9584 Comm: trinity-c104 Not tainted 4.9.0-rc7+ #219
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: 880189f68000 task.stack: 88017fe5
RIP: 0010:[]  []
__lock_acquire.isra.32+0xda/0x1a30
RSP: 0018:88017fe575e0  EFLAGS: 00010002
RAX: 00030346 RBX:  RCX: 
[...]
Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <f0>
ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00
RIP  [] __lock_acquire.isra.32+0xda/0x1a30
RSP 
CR2: 0003030001de
---[ end trace 2846425104eb6141 ]---
Kernel panic - not syncing: Fatal exception
[ cut here ]

  2b:*  f0 ff 80 98 01 00 00lock incl 0x198(%rax)   <--
trapping instruction

> What's your kernel command line?  Can we have your entire kernel log from 
> boot?

Just in case you still want this, I've attached the boot log for the
"true SMP" guest above.


Vegard


[Attachment: 5.txt.gz (GNU Zip compressed data)]


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 12:10, Vegard Nossum <vegard.nos...@gmail.com> wrote:
> On 5 December 2016 at 00:04, Vegard Nossum <vegard.nos...@gmail.com> wrote:
>> FWIW I hit this as well:
>>
>> BUG: unable to handle kernel paging request at 81ff08b7
>> IP: [] __lock_acquire.isra.32+0xda/0x1a30
>> CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: GB   4.9.0-rc7+ #217
> [...]
>
>> I think you can rule out btrfs in any case, probably block layer as
>> well, since it looks like this comes from shmem.
>
> I should rather say that the VM runs on a 9p root filesystem and it
> doesn't use/mount any block devices or disk-based filesystems.
>
> I have all the trinity logs for the crash if that's useful. I tried a
> couple of runs with just the (at the time) in-progress syscalls but it
> didn't turn up anything interesting. Otherwise it seems like a lot of
> data to go through by hand.

I've hit this another 7 times in the past ~3 hours.

Three times the address being dereferenced has pointed to
iov_iter_init+0xaf (even across a kernel rebuild), three times it has
pointed to put_prev_entity+0x55, once to 0x80008, and twice to
0x292. The fact that it would hit even one of those more than once
across runs is pretty suspicious to me, although the ones that point
to iov_iter_init and put_prev_entity point to "random" instructions in
the sense that they are neither entry points nor return addresses.

shmem_fault() was always on the stack, but it came from different
syscalls: add_key(), newuname(), pipe2(), newstat(), fstat(),
clock_settime(), mount(), etc.

I also got this warning which is related:

[ cut here ]
WARNING: CPU: 9 PID: 25045 at lib/list_debug.c:59 __list_del_entry+0x14f/0x1d0
list_del corruption. prev->next should be 88014bdc79e8, but was
88014bfbfc60
Kernel panic - not syncing: panic_on_warn set ...

CPU: 9 PID: 25045 Comm: trinity-c22 Not tainted 4.9.0-rc7+ #219
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
88014bdc7700 81fb0861 83e74b60 88014bdc77d8
84006c00 847103e0 88014bdc77c8 81515244
41b58ab3 844e21c2 81515061 0054
Call Trace:
[] dump_stack+0x83/0xb2
[] panic+0x1e3/0x3ad
[] ? percpu_up_read_preempt_enable.constprop.45+0xcb/0xcb
[] ? __list_del_entry+0x14f/0x1d0
[] __warn+0x1bf/0x1e0
[] ? __lock_acquire.isra.32+0xc2/0x1a30
[] warn_slowpath_fmt+0xac/0xd0
[] ? __warn+0x1e0/0x1e0
[] ? finish_wait+0xb0/0x180
[] __list_del_entry+0x14f/0x1d0
[] ? finish_wait+0xb0/0x180
[] finish_wait+0xbb/0x180
[] shmem_fault+0x4c7/0x6b0
[] ? shmem_getpage_gfp+0x673/0x1c90
[] ? shmem_getpage_gfp+0x1c90/0x1c90
[] ? wake_atomic_t_function+0x210/0x210
[] __do_fault+0x206/0x410
[] ? do_page_mkwrite+0x320/0x320
[] ? handle_mm_fault+0x1cc/0x2a60
[] handle_mm_fault+0x10f7/0x2a60
[] ? handle_mm_fault+0x132/0x2a60
[] ? thread_group_cputime+0x49f/0x6e0
[] ? __pmd_alloc+0x370/0x370
[] ? thread_group_cputime+0x4bc/0x6e0
[] ? thread_group_cputime_adjusted+0x6d/0xe0
[] ? __do_page_fault+0x220/0x9f0
[] ? find_vma+0x30/0x150
[] __do_page_fault+0x452/0x9f0
[] trace_do_page_fault+0x1e5/0x3a0
[] do_async_page_fault+0x27/0xa0
[] async_page_fault+0x28/0x30
[] ? copy_user_generic_string+0x2c/0x40
[] ? SyS_times+0x93/0x110
[] ? do_sys_times+0x2b0/0x2b0
[] ? do_sys_times+0x2b0/0x2b0
[] do_syscall_64+0x1af/0x4d0
[] entry_SYSCALL64_slow_path+0x25/0x25
[ cut here ]

The warning shows that it made it past the list_empty_careful() check
in finish_wait() but then bugs out on the wait->task_list
dereference.
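
For reference, finish_wait() as of v4.9 (kernel/sched/wait.c, comments
abridged):

	void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
	{
		unsigned long flags;

		__set_current_state(TASK_RUNNING);
		/* Lockless check; this is the wait->task_list dereference
		 * that blows up when the entry has been corrupted. */
		if (!list_empty_careful(&wait->task_list)) {
			spin_lock_irqsave(&q->lock, flags);
			list_del_init(&wait->task_list);
			spin_unlock_irqrestore(&q->lock, flags);
		}
	}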

Anything stick out?


Vegard


Re: bio linked list corruption.

2016-12-05 Thread Vegard Nossum
On 5 December 2016 at 00:04, Vegard Nossum <vegard.nos...@gmail.com> wrote:
> FWIW I hit this as well:
>
> BUG: unable to handle kernel paging request at 81ff08b7
> IP: [] __lock_acquire.isra.32+0xda/0x1a30
> CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: GB   4.9.0-rc7+ #217
[...]

> I think you can rule out btrfs in any case, probably block layer as
> well, since it looks like this comes from shmem.

I should rather say that the VM runs on a 9p root filesystem and it
doesn't use/mount any block devices or disk-based filesystems.

I have all the trinity logs for the crash if that's useful. I tried a
couple of runs with just the (at the time) in-progress syscalls but it
didn't turn up anything interesting. Otherwise it seems like a lot of
data to go through by hand.

The crashing child seems to have just started, though, if that's relevant:

[child56:21744] [0]
open(filename="/sys/block/loop2/power/runtime_active_time",
flags=0x777b01, mode=666) = -1 (Not a directory)
[child56:21744] [1] [32BIT] sched_getattr(pid=1, param=0x7f37ec26c000,
size=3415) = -1 (Invalid argument)
[child56:21744] [2] [32BIT]
access(filename="/proc/2/task/2/net/stat/arp_cache", mode=2000) = -1
(Invalid argument)
[child56:21744] [3] getegid() = 0xfffe
[child56:21744] [4]
swapoff(path="/proc/721/task/721/net/dev_snmp6/tunl0") = -1 (Operation
not permitted)
[child56:21744] [5] timerfd_create(clockid=0x0, flags=0x0) = 439
[child56:21744] [6] pkey_mprotect(start=0x7f37ee656000, len=0,
prot=0x108, key=0x600) = 0
[child56:21744] [7] msync(start=0x7f37ee657000, len=0, flags=0x6) = 0
[child56:21744] [8] flock(fd=168, cmd=0xffc191f30b0c) = -1
(Invalid argument)
[child56:21744] [9] add_key(_type=0x437a15,
_description=0x7f37ec06c000, _payload=0x7f37ec06c000, plen=0,
ringid=0xfff8)

The other logfiles end thusly:

==> trinity-child0.log <==
[child0:21593] [311] faccessat(dfd=246,
filename="/proc/983/task/983/net/protocols", mode=2000) = -1 (Invalid
argument)
[child0:21593] [312] renameat(olddfd=246,
oldname="/proc/13/task/13/attr/sockcreate", newdfd=377,
newname="/proc/16/task/16/net/stat/rt_cache") = -1 (Permission denied)
[child0:21593] [313] [32BIT] readv(fd=289, vec=0x2e1a3d0, vlen=215) = 0

==> trinity-child100.log <==
[child100:21536] [439] setgid(gid=0x2a000200) = -1 (Operation not permitted)
[child100:21536] [440] waitid(which=175, upid=21587, infop=0x4,
options=3542, ru=0x7f37ec76c000) = -1 (Invalid argument)
[child100:21536] [441]
getxattr(pathname="/proc/980/task/980/net/ptype", name=0x7f37ee466000,
value=0x7f37ec26c000, size=49) = -1 (Operation not supported)

==> trinity-child101.log <==
[child101:21537] [55] getcwd(buf=0x7f37ee466000, size=4096) = 39
[child101:21537] [56] [32BIT] munlock(addr=0x7f37ee658000, len=0) = 0
[child101:21537] [57] semctl(semid=0xbd851e2b40e7df,
semnum=0x1b1b1b1b1b, cmd=0x20, arg=0xcacacacaca) = -1 (Invalid
argument)

==> trinity-child102.log <==
[child102:21542] [11] readahead(fd=353, offset=2, count=249) = -1
(Invalid argument)
[child102:21542] [12] add_key(_type=0x43793f,
_description=0x7f37ec46c000, _payload=0x7f37ee658000, plen=32,
ringid=0xfffa) = -1 (Invalid argument)
[child102:21542] [13] time(tloc=0x7f37ee466000) = 0x584474e0

==> trinity-child103.log <==
[child103:21543] [45] dup(fildes=183) = 512
[child103:21543] [46] rt_sigpending(set=0x7f37ec86c000, sigsetsize=32)
= -1 (Invalid argument)
[child103:21543] [47] newstat(filename="/proc/587/task/587/gid_map",
statbuf=0x7f37ee466000) = 0

==> trinity-child104.log <==
[child104:21546] [49] getdents(fd=162, dirent=0x0, count=127) = -1
(Not a directory)
[child104:21546] [50] [32BIT] clock_adjtime(which_clock=0, utx=0x4) =
-1 (Bad address)
[child104:21546] [51] setsid() = 0x542a

==> trinity-child105.log <==
[child105:21547] [523] epoll_wait(epfd=244, events=0x8, maxevents=246,
timeout=-1) = -1 (Invalid argument)
[child105:21547] [524] dup2(oldfd=244, newfd=244) = 244
[child105:21547] [525] acct(name=0x7f37ec26c000) = -1 (Operation not permitted)

==> trinity-child106.log <==
[child106:19910] [136] getegid() = 0xfffe
[child106:19910] [137] munmap(addr=0x7f37ee65a000, len=4096) = 0
[child106:19910] [138] clock_nanosleep(which_clock=0x1, flags=0x1,
rqtp=0x7f37ec06c000, rmtp=0x7f37ee466000)

==> trinity-child107.log <==
[child107:21224] [994] copy_file_range(fd_in=373, off_in=0x2400e210,
fd_out=373, off_out=8, len=8, flags=0x0) = -1 (Bad file descriptor)
[child107:21224] [995] kcmp(pid1=1, pid2=21453, type=0x5,
idx1=0x787878787878, idx2=0xff6060606060) = -1 (Operation not
permitted)
[child107:21224] [996] [32BIT] readv(fd=365, vec=0x2e27e10, vlen=36) = 0

==> trinity-child108.log <==
[child108:21226] [759] recvfrom(fd=219, ubuf=0x7f37ec26c000, size=8,
flags=0x0, addr=0x2e1ed80, addr_len=110) = -1 (Bad file descriptor)



Re: bio linked list corruption.

2016-12-04 Thread Vegard Nossum
On 23 November 2016 at 20:58, Dave Jones <da...@codemonkey.org.uk> wrote:
> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>
>  > [  317.689216] BUG: Bad page state in process kworker/u8:8  pfn:4d8fd4
>  > trace from just before this happened. Does this shed any light ?
>  >
>  > https://codemonkey.org.uk/junk/trace.txt
>
> crap, I just noticed the timestamps in the trace come from quite a bit
> later.  I'll tweak the code to do the taint checking/ftrace stop after
> every syscall, that should narrow the window some more.

FWIW I hit this as well:

BUG: unable to handle kernel paging request at 81ff08b7
IP: [] __lock_acquire.isra.32+0xda/0x1a30
PGD 461e067 PUD 461f063
PMD 1e001e1
Oops: 0003 [#1] PREEMPT SMP KASAN
Dumping ftrace buffer:
  (ftrace buffer empty)
CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: GB   4.9.0-rc7+ #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
task: 8801ee924080 task.stack: 8801bab88000
RIP: 0010:[]  []
__lock_acquire.isra.32+0xda/0x1a30
RSP: 0018:8801bab8f730  EFLAGS: 00010082
RAX: 81ff071f RBX:  RCX: 
RDX: 0003 RSI:  RDI: 85ae7d00
RBP: 8801bab8f7b0 R08: 0001 R09: 
R10: 8801e727fc40 R11: fbfff0b54ced R12: 85ae7d00
R13: 84912920 R14: 8801ee924080 R15: 
FS:  7f37ee653700() GS:8801f6a0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 81ff08b7 CR3: 0001daa7 CR4: 06f0
DR0: 7f37ee465000 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
8801ee9247d0  0001 8801ee924080
8801f6a201c0 8801f6a201c0  0001
8801 8801 8801e727fc40 8801ee924080
Call Trace:
[] lock_acquire+0x141/0x2b0
[] ? finish_wait+0xb0/0x180
[] _raw_spin_lock_irqsave+0x49/0x60
[] ? finish_wait+0xb0/0x180
[] finish_wait+0xb0/0x180
[] shmem_fault+0x4c7/0x6b0
[] ? p9_client_rpc+0x13d/0xf40
[] ? shmem_getpage_gfp+0x1c90/0x1c90
[] ? radix_tree_next_chunk+0x4f7/0x840
[] ? wake_atomic_t_function+0x210/0x210
[] __do_fault+0x206/0x410
[] ? do_page_mkwrite+0x320/0x320
[] handle_mm_fault+0x1cef/0x2a60
[] ? handle_mm_fault+0x132/0x2a60
[] ? __pmd_alloc+0x370/0x370
[] ? inode_add_bytes+0x10e/0x160
[] ? memset+0x31/0x40
[] ? find_vma+0x30/0x150
[] __do_page_fault+0x452/0x9f0
[] ? iov_iter_init+0xaf/0x1d0
[] trace_do_page_fault+0x1e5/0x3a0
[] do_async_page_fault+0x27/0xa0
[] async_page_fault+0x28/0x30
[] ? strnlen_user+0x91/0x1a0
[] ? strnlen_user+0x6e/0x1a0
[] strndup_user+0x28/0xb0
[] SyS_add_key+0xc7/0x370
[] ? key_get_type_from_user.constprop.6+0xd0/0xd0
[] ? __context_tracking_exit.part.4+0x3a/0x1e0
[] ? key_get_type_from_user.constprop.6+0xd0/0xd0
[] do_syscall_64+0x1af/0x4d0
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e>
ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00
RIP  [] __lock_acquire.isra.32+0xda/0x1a30

I didn't read all the emails in this thread, but the crash site looks
identical to one of the earlier traces, although the caller may be
different.

I think you can rule out btrfs in any case, probably block layer as
well, since it looks like this comes from shmem.


Vegard


divide error in __btrfs_map_block (div64_u64(stripe_nr, stripe_len))

2015-12-01 Thread Vegard Nossum

Hi,

With the attached (fuzzed) disk image I get this crash on latest 
linus/master when mounting:


BTRFS: device fsid de80ced1-18ac-490c-9afb-cf0a7d66cc7e devid 1 transid 7 /dev/loop0
BTRFS info (device loop0): disk space caching is enabled
divide error:  [#1] SMP KASAN
CPU: 0 PID: 955 Comm: mount Not tainted 4.4.0-rc3+ #244
task: 880015231c00 ti: 8800156f task.ti: 8800156f
RIP: 0010:[]  [] __btrfs_map_block+0x175/0x1b30
RSP: 0018:8800156f6f18  EFLAGS: 00010246
RAX: 00021000 RBX: 00021000 RCX: 880015fce000
RDX:  RSI: 00021000 RDI: 8800152cb9f0
RBP: 8800156f70c8 R08: 0040 R09: 
R10: 880015fde1c0 R11: 0001 R12: 0040
R13: 00021000 R14:  R15: 8800156f7170
FS:  7f387f51c880() GS:880016e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f387eca3540 CR3: 0005d000 CR4: 001406b0
Stack:
 81224300 8800156f6f48 8101af95 8800156f6f48
 810ad7fe  8800156f6fa8 8100b5cb
 8800156f7ff8 8800156f c0ed0001 8800156f71b0
Call Trace:
 [] btrfs_map_bio+0x128/0x600
 [] btree_submit_bio_hook+0x179/0x190
 [] submit_one_bio+0xee/0x120
 [] read_extent_buffer_pages+0x2cf/0x4a0
 [] btree_read_extent_buffer_pages.constprop.51+0x12e/0x190
 [] read_tree_block+0x4b/0x80
 [] open_ctree+0x2489/0x3770
 [] btrfs_mount+0xf43/0x10c0
 [] mount_fs+0x56/0x1b0
 [] vfs_kern_mount+0x66/0x190
 [] btrfs_mount+0x27e/0x10c0
 [] mount_fs+0x56/0x1b0
 [] vfs_kern_mount+0x66/0x190
 [] do_mount+0x362/0x16b0
 [] SyS_mount+0xf6/0x160
 [] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 29 da 48 89 d3 49 89 c6 48 89 85 e8 fe ff ff 48 83 c0 10 48 89 c7 
48 89 85 30 ff ff ff e8 54 95 d5 ff 4d 63 76 10 31 d2 48 89 d8 <49> f7 
f6 44 89 b5 20 ff ff ff 48 89 85 10 ff ff ff 49 0f af c6

RIP  [] __btrfs_map_block+0x175/0x1b30
 RSP 
---[ end trace c42185c4a2495b9c ]---
mount (955) used greatest stack depth: 25200 bytes left
Segmentation fault

It seems to be this line:

/*
 * stripe_nr counts the total number of stripes we have to stride
 * to get to this block
 */
stripe_nr = div64_u64(stripe_nr, stripe_len);
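
stripe_len presumably comes back as 0 from the fuzzed chunk item here.
A hypothetical guard (a sketch only, not an actual patch;
check_chunk_stripe_len is a made-up helper name) would reject such a
chunk when it is first read, before it can be used as a divisor:

	/* Hypothetical sketch: validate stripe_len at chunk-read time. */
	static int check_chunk_stripe_len(struct extent_buffer *leaf,
					  struct btrfs_chunk *chunk)
	{
		u64 stripe_len = btrfs_chunk_stripe_len(leaf, chunk);

		/* 0 would divide-by-zero in __btrfs_map_block(); values
		 * that aren't a power of two are likewise bogus on disk. */
		if (!stripe_len || !is_power_of_2(stripe_len))
			return -EIO;
		return 0;
	}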

I can test patches. Thanks,


Vegard


[Attachment: btrfs.2.bz2 (application/bzip)]


Re: BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!

2015-11-30 Thread Vegard Nossum

On 11/30/2015 05:34 PM, David Sterba wrote:
> On Mon, Nov 30, 2015 at 02:48:51PM +0100, David Sterba wrote:
>> On Sun, Nov 15, 2015 at 07:21:17PM +0100, Vegard Nossum wrote:
>>> With the attached btrfs image, I get the following splat when mounting:
>>>
>>> """
>>> # mount -o loop -t btrfs ./btrfs.0 /mnt/0/
>>> BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid
>>> 7 /dev/loop0
>>> BTRFS info (device loop0): disk space caching is enabled
>>> BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
>
> The fix, worked for me on the provided image:
> https://patchwork.kernel.org/patch/7728051/
>
> I'll add the attached image to btrfs-progs testsuite as it triggers
> crashes in other tools.

Thanks, that seems to fix the problem.

With your patch and a new image, I ran into a second issue (which is
probably unrelated):


BTRFS: device fsid de80ced1-18ac-490c-9afb-cf0a7d66cc7e devid 1 transid 7 /dev/loop0
BTRFS critical (device loop0): unable to find logical 4294963200 len 4096
BTRFS: dev_item UUID does not match fsid: de80ced1-18ac-490c-9afb-cf0a7d66cc7e != de8...
BUG: failure at fs/btrfs/inode.c:1805/btrfs_merge_bio_hook()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 913 Comm: mount Not tainted 4.2.5 #1
Stack:
 e0147430 60075412 600bd457 603f5080
 606dfc7a 605e2b14 e0147440 605e595f
 e0147560 605e291e e00cb2e8 1000
Call Trace:
  [<60029f3b>] show_stack+0xdb/0x1a0
  [<605e595f>] dump_stack+0x2a/0x2c
  [<605e291e>] panic+0x137/0x2ac
  [<602b8d0c>] btrfs_merge_bio_hook+0xfc/0x100
  [<602e3923>] submit_extent_page+0x223/0x330
  [<602e51a6>] __do_readpage+0x476/0xb60
  [<602e59bf>] __extent_read_full_page+0x12f/0x140
  [<602e8264>] read_extent_buffer_pages+0x1e4/0x410
  [<602a9d46>] btree_read_extent_buffer_pages.constprop.24+0x106/0x1a0
  [<602aa67e>] read_tree_block+0x5e/0xa0
  [<602b0064>] open_ctree+0x1814/0x2db0
  [<60273a3b>] btrfs_mount+0xf3b/0x1010
  [<601054d3>] mount_fs+0x33/0x210
  [<60124b94>] vfs_kern_mount+0x74/0x170
  [<60272df3>] btrfs_mount+0x2f3/0x1010
  [<601054d3>] mount_fs+0x33/0x210
  [<60124b94>] vfs_kern_mount+0x74/0x170
  [<601264dc>] do_mount+0x26c/0xf30
  [<6012768b>] SyS_mount+0xab/0x120

Should I start a new thread? I've attached the new image. Thanks,


Vegard


[Attachment: btrfs.1.bz2 (application/bzip)]


BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!

2015-11-15 Thread Vegard Nossum

Hi,

With the attached btrfs image, I get the following splat when mounting:

"""
# mount -o loop -t btrfs ./btrfs.0 /mnt/0/
BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 
7 /dev/loop0

BTRFS info (device loop0): disk space caching is enabled
BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
Stack:
 637af890 60062489 602aeb2e 604192ba
 60387961 0011 637af8a0 6038a835
 637af9c0 6038776b 634ef32b 
Call Trace:
 [<6001c86d>] show_stack+0xfe/0x15b
 [<6038a835>] dump_stack+0x2a/0x2c
 [<6038776b>] panic+0x13e/0x2b3
 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
 [<601cfbbe>] open_ctree+0x192d/0x27af
 [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<600d710b>] do_mount+0xa35/0xbc9
 [<600d7557>] SyS_mount+0x95/0xc8
 [<6001e884>] handle_syscall+0x6b/0x8e

Aborted
"""

I can test patches. Thanks,


Vegard


[Attachment: btrfs.0.bz2 (application/bzip)]