Re: BUG at net/sctp/socket.c:7425

2017-01-30 Thread Alexander Popov
On 29.01.2017 13:40, Marcelo Ricardo Leitner wrote:
> On Sun, Jan 29, 2017 at 03:35:31AM +0300, Alexander Popov wrote:
>> Hello,
>>
>> I'm running the syzkaller fuzzer on v4.10-rc4
>> (0aa0313f9d576affd7747cc3f179feb097d28990)
>> and hit the following crash in the sctp code:
>>
> ...
>>
>> Unfortunately, I haven't managed to get a C program that reproduces the
>> crash (it looks like a race).
>> However, I hit it reliably on my setup, so I can help with fixing the issue.
>>
>> The crash happens here:
>> 	/* Let another process have a go.  Since we are going
>> 	 * to sleep anyway.
>> 	 */
>> 	release_sock(sk);
>> 	current_timeo = schedule_timeout(current_timeo);
>> >	BUG_ON(sk != asoc->base.sk);
>> 	lock_sock(sk);
>>
>> I've added some debugging output and see that the original value of
>> asoc->base.sk is changed to the address of another struct sock, which
>> appeared in sctp_endpoint_init() shortly before the crash.
> 
> You need some threading for this to happen.  asoc->base.sk will change
> if you peeloff the association.
> It seems you had one thread waiting for some sndbuf to be available on a
> sendmsg() call and another thread did a peeloff on the association that
> the first thread was using.
> Yeah I think this will reproduce it.
> And in this case, it's probably better if we just return -EPIPE as the
> association doesn't exist in that socket anymore instead of the BUG_ON.

Thanks for your reply and patch, Marcelo.
I've checked your explanation and agree with it. The situation looks like this:
...
[   55.719561] sctp_endpoint_init: sk 88006718c8c0
[   55.721158] sctp_association_init: asoc 880059e96818, base.sk = 88006718c8c0
...
[   56.144070] sctp_wait_for_sndbuf: asoc:880059e96818, timeo:9223372036854775807, msg_len:24
[   56.148650] sctp_endpoint_init: sk 880068bca480
[   56.149216] sctp_sock_migrate: asoc 880059e96818 from oldsk 88006718c8c0 to newsk 880068bca480
[   56.150442] sctp_assoc_migrate: asoc 880059e96818 to newsk 880068bca480
[   56.168827] crash!!! asoc 880059e96818: sk 88006718c8c0 != base.sk 880068bca480
[   56.169801] ------------[ cut here ]------------
[   56.170151] kernel BUG at net/sctp/socket.c:7433!
...


> ---8<---
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 26a514269b92..e9870aead88b 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -6838,7 +6838,8 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
>  		 */
>  		sctp_release_sock(sk);
>  		current_timeo = schedule_timeout(current_timeo);
> -		BUG_ON(sk != asoc->base.sk);
> +		if (sk != asoc->base.sk)
> +			goto do_error;
>  		sctp_lock_sock(sk);
>  
>  		*timeo_p = current_timeo;

Tested your fix.
Acked-by: Alexander Popov 



Re: BUG at net/sctp/socket.c:7425

2017-01-29 Thread Marcelo Ricardo Leitner
On Sun, Jan 29, 2017 at 03:35:31AM +0300, Alexander Popov wrote:
> Hello,
> 
> I'm running the syzkaller fuzzer on v4.10-rc4
> (0aa0313f9d576affd7747cc3f179feb097d28990)
> and hit the following crash in the sctp code:
> 
...
> 
> Unfortunately, I haven't managed to get a C program that reproduces the
> crash (it looks like a race).
> However, I hit it reliably on my setup, so I can help with fixing the issue.
> 
> The crash happens here:
> 	/* Let another process have a go.  Since we are going
> 	 * to sleep anyway.
> 	 */
> 	release_sock(sk);
> 	current_timeo = schedule_timeout(current_timeo);
> >	BUG_ON(sk != asoc->base.sk);
> 	lock_sock(sk);
> 
> I've added some debugging output and see that the original value of
> asoc->base.sk is changed to the address of another struct sock, which
> appeared in sctp_endpoint_init() shortly before the crash.

You need some threading for this to happen.  asoc->base.sk will change
if you peeloff the association.
It seems you had one thread waiting for some sndbuf to be available on a
sendmsg() call and another thread did a peeloff on the association that
the first thread was using.
Yeah I think this will reproduce it.
And in this case, it's probably better if we just return -EPIPE as the
association doesn't exist in that socket anymore instead of the BUG_ON.

  Marcelo

---8<---

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 26a514269b92..e9870aead88b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6838,7 +6838,8 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 		 */
 		sctp_release_sock(sk);
 		current_timeo = schedule_timeout(current_timeo);
-		BUG_ON(sk != asoc->base.sk);
+		if (sk != asoc->base.sk)
+			goto do_error;
 		sctp_lock_sock(sk);
 
 		*timeo_p = current_timeo;
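
For readers following the thread, the race described above can be modelled in
userspace. The following is only a minimal sketch under stated assumptions, not
kernel code and not a reproducer: struct model_sock, struct model_assoc and all
model_* functions are hypothetical stand-ins, a pthread mutex and condition
variable play the role of the socket lock and wait queue, and the -EPIPE return
value mirrors what the do_error path is expected to yield. One thread waits for
send buffer space while another "peels off" the association, and the waiter
rechecks ownership after every wakeup instead of asserting.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct sock: lock, wait queue, free space. */
struct model_sock {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	int sndbuf_free;
};

/* Hypothetical stand-in for struct sctp_association. */
struct model_assoc {
	struct model_sock *sk;	/* plays the role of asoc->base.sk */
};

struct waiter_args {
	struct model_assoc *asoc;
	struct model_sock *sk;
	int result;
};

static void model_sock_init(struct model_sock *sk)
{
	int rc = pthread_mutex_init(&sk->lock, NULL);

	assert(rc == 0);
	rc = pthread_cond_init(&sk->wake, NULL);
	assert(rc == 0);
	(void)rc;
	sk->sndbuf_free = 0;	/* no space, so the waiter has to sleep */
}

/* Wait for space on sk, but give up if the association has moved away. */
static int model_wait_for_sndbuf(struct model_assoc *asoc,
				 struct model_sock *sk)
{
	int err = 0;

	pthread_mutex_lock(&sk->lock);
	for (;;) {
		if (asoc->sk != sk) {	/* the check that replaces BUG_ON() */
			err = -EPIPE;
			break;
		}
		if (sk->sndbuf_free > 0)
			break;
		/* Drops the lock while sleeping, retakes it on wakeup. */
		pthread_cond_wait(&sk->wake, &sk->lock);
	}
	pthread_mutex_unlock(&sk->lock);
	return err;
}

/* Move the association to newsk and wake anyone sleeping on oldsk. */
static void model_peeloff(struct model_assoc *asoc,
			  struct model_sock *oldsk, struct model_sock *newsk)
{
	pthread_mutex_lock(&oldsk->lock);
	asoc->sk = newsk;
	pthread_cond_broadcast(&oldsk->wake);
	pthread_mutex_unlock(&oldsk->lock);
}

static void *waiter_thread(void *p)
{
	struct waiter_args *args = p;

	args->result = model_wait_for_sndbuf(args->asoc, args->sk);
	return NULL;
}

/* Run one waiter against one peeloff and return the waiter's result. */
int model_run_race(void)
{
	struct model_sock oldsk, newsk;
	struct model_assoc asoc = { &oldsk };
	struct waiter_args args = { &asoc, &oldsk, 0 };
	pthread_t tid;

	model_sock_init(&oldsk);
	model_sock_init(&newsk);
	if (pthread_create(&tid, NULL, waiter_thread, &args) != 0)
		return -EAGAIN;
	model_peeloff(&asoc, &oldsk, &newsk);
	pthread_join(tid, NULL);
	return args.result;
}
```

In this model, model_run_race() returns -EPIPE whether the peeloff happens
before or after the waiter goes to sleep, because the ownership check runs
under the old socket's lock each time the loop is entered. Unlike the kernel
code, the model rechecks with the lock already retaken, which is a
simplification of the release/sleep/check/lock order shown in the patch.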


BUG at net/sctp/socket.c:7425

2017-01-28 Thread Alexander Popov
Hello,

I'm running the syzkaller fuzzer on v4.10-rc4
(0aa0313f9d576affd7747cc3f179feb097d28990)
and hit the following crash in the sctp code:

[   38.423932] ------------[ cut here ]------------
[   38.424298] kernel BUG at net/sctp/socket.c:7425!
[   38.424583] invalid opcode:  [#1] SMP KASAN
[   38.424839] Dumping ftrace buffer:
[   38.425031]    (ftrace buffer empty)
[   38.425232] Modules linked in: sctp libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_intel8x0 snd_ens1370 snd_ac97_codec gameport snd_rawmidi snd_hwdep snd_seq_device ac97_bus snd_pcm hid_generic joydev usbmouse snd_timer psmouse usbhid e1000 snd hid parport_pc i2c_piix4 soundcore serio_raw parport input_leds pcspkr floppy evbug mac_hid
[   38.427058] CPU: 0 PID: 1930 Comm: syz-executor12 Not tainted 4.10.0-rc4+ #2
[   38.427457] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   38.427999] task: 88006945ca00 task.stack: 880053e4
[   38.428364] RIP: 0010:sctp_sendmsg+0x29b3/0x3030 [sctp]
[   38.428719] RSP: 0018:880053e478f8 EFLAGS: 00010297
[   38.429062] RAX: 88006945ca00 RBX: 880048d148c0 RCX: 
[   38.429636] RDX:  RSI:  RDI: 88006d022c88
[   38.430051] RBP: 880053e47b70 R08: 0560 R09: 88007ffda680
[   38.430473] R10: 000a R11: 1d400032be05 R12: dc00
[   38.430915] R13: 880048d148c0 R14:  R15: 880059ad9160
[   38.431390] FS:  7f984a645700() GS:88006d00() knlGS:
[   38.431979] CS:  0010 DS:  ES:  CR0: 80050033
[   38.432405] CR2: 20005fe0 CR3: 6400a000 CR4: 06f0
[   38.432827] DR0:  DR1:  DR2: 
[   38.433253] DR3:  DR6: fffe0ff0 DR7: 0400
[   38.433765] Call Trace:
[   38.433938]  ? sctp_id2assoc+0x330/0x330 [sctp]
[   38.434245]  ? wake_atomic_t_function+0x2b0/0x2b0
[   38.434545]  inet_sendmsg+0x128/0x3a0
[   38.434758]  ? inet_recvmsg+0x420/0x420
[   38.434983]  sock_sendmsg+0xcf/0x110
[   38.435192]  sock_write_iter+0x222/0x3c0
[   38.435421]  ? sock_sendmsg+0x110/0x110
[   38.435644]  ? iov_iter_init+0xaf/0x1d0
[   38.435867]  __vfs_write+0x3cb/0x640
[   38.436075]  ? do_iter_readv_writev+0x4c0/0x4c0
[   38.436338]  ? apparmor_file_permission+0x27/0x30
[   38.436618]  ? rw_verify_area+0xea/0x2b0
[   38.436853]  vfs_write+0x175/0x4e0
[   38.437053]  SyS_write+0xd8/0x1b0
[   38.437283]  ? SyS_read+0x1b0/0x1b0
[   38.437522]  entry_SYSCALL_64_fastpath+0x1e/0xad
[   38.437820] RIP: 0033:0x44f869
[   38.438013] RSP: 002b:7f984a644b58 EFLAGS: 0212 ORIG_RAX: 0001
[   38.438464] RAX: ffda RBX: 7f984a645700 RCX: 0044f869
[   38.438886] RDX: 0018 RSI: 20ac4fe8 RDI: 0004
[   38.439305] RBP: 7ffe1d7be490 R08:  R09: 
[   38.439712] R10:  R11: 0212 R12: 
[   38.440145] R13: 7ffe1d7be40f R14: 7f984a6459c0 R15: 
[   38.440563] Code: c7 c7 10 1a 5c a0 e8 4d fb 76 e1 c6 44 24 68 01 e9 a2 f2 ff ff e8 be 34 e1 e0 8b 9c 24 98 00 00 00 e9 06 fd ff ff e8 ad 34 e1 e0 <0f> 0b e8 a6 34 e1 e0 4c 8b 4c 24 78 4c 8b 44 24 68 4c 89 f9 48
[   38.441881] RIP: sctp_sendmsg+0x29b3/0x3030 [sctp] RSP: 880053e478f8
[   38.442341] ---[ end trace c704b04c884389c0 ]---
[   38.442634] Kernel panic - not syncing: Fatal exception
[   38.443084] Dumping ftrace buffer:
[   38.443335]    (ftrace buffer empty)
[   38.443590] Kernel Offset: disabled


Unfortunately, I haven't managed to get a C program that reproduces the crash
(it looks like a race).
However, I hit it reliably on my setup, so I can help with fixing the issue.

The crash happens here:
	/* Let another process have a go.  Since we are going
	 * to sleep anyway.
	 */
	release_sock(sk);
	current_timeo = schedule_timeout(current_timeo);
>	BUG_ON(sk != asoc->base.sk);
	lock_sock(sk);

I've added some debugging output and see that the original value of
asoc->base.sk is changed to the address of another struct sock, which
appeared in sctp_endpoint_init() shortly before the crash.

I would appreciate any assistance.
Best regards,
Alexander