Re: strange crashes in tcp_poll() via epoll_wait
Eric Dumazet wrote:
> On Fri, 2013-07-19 at 23:50 +, Eric Wong wrote:
> > Eric Dumazet wrote:
> > > Hi Al
> > >
> > > I tried to debug strange crashes in tcp_poll() called from
> > > sys_epoll_wait() -> sock_poll()
> > >
> > > The symptom is that sock->sk is NULL and we therefore dereference a NULL
> > > pointer.
> > >
> > > It's really rare crashes but still, it would be nice to understand where
> > > is the bug. Presumably latest kernels would crash in sock_poll() because
> > > of the sk_can_busy_loop(sock->sk) call.
> > >
> > > We do test sock->sk being NULL in sock_fasync(), but epoll should be
> > > safe because of existing synchronization (epmutex) ?
> >
> > It should be safe because of ep->mtx, actually, as epmutex is not taken
> > in sys_epoll_wait.
>
> Hmm, it might be more complex than that for multi threaded programs :
>
> eventpoll_release_file()
>
> The problem might be because a thread closes a socket while an event
> was queued for it.

But ep->mtx is also held when traversing the ready list with
ep_send_events_proc. Can sock->sk somehow be NULL before hitting
eventpoll_release_file?

> > I took a look at this but have not found anything. I've yet to see
> > this on my machines.
> >
> > When did you start noticing this?
>
> Hard to say, but we have these crashes on a 3.3+ based kernel.

So I don't think any of my epoll changes caused it. Phew!

> Probability of said crashes is very very low.

This still worries me since I rely heavily on multi-threaded epoll.
I don't have a lot of cores/CPUs, though, so maybe it's harder to
trigger any potential race as a result...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: strange crashes in tcp_poll() via epoll_wait
On Fri, 2013-07-19 at 23:50 +, Eric Wong wrote:
> Eric Dumazet wrote:
> > Hi Al
> >
> > I tried to debug strange crashes in tcp_poll() called from
> > sys_epoll_wait() -> sock_poll()
> >
> > The symptom is that sock->sk is NULL and we therefore dereference a NULL
> > pointer.
> >
> > It's really rare crashes but still, it would be nice to understand where
> > is the bug. Presumably latest kernels would crash in sock_poll() because
> > of the sk_can_busy_loop(sock->sk) call.
> >
> > We do test sock->sk being NULL in sock_fasync(), but epoll should be
> > safe because of existing synchronization (epmutex) ?
>
> It should be safe because of ep->mtx, actually, as epmutex is not taken
> in sys_epoll_wait.

Hmm, it might be more complex than that for multi threaded programs :

eventpoll_release_file()

The problem might be because a thread closes a socket while an event
was queued for it.

> I took a look at this but have not found anything. I've yet to see
> this on my machines.
>
> When did you start noticing this?

Hard to say, but we have these crashes on a 3.3+ based kernel.

Probability of said crashes is very very low.
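
The close-while-queued hypothesis above can be illustrated from user space. What follows is only a minimal sketch, not a reproducer for the reported crash and not code from anyone in this thread. It runs the steps sequentially in a single thread, while the suspected race would need the close to happen concurrently with epoll_wait() in another thread. The helper name close_while_event_queued() is hypothetical, and an AF_UNIX socketpair stands in for a TCP socket to keep the example self-contained, so tcp_poll() itself is not exercised.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue an event for a socket, then close the socket
 * and check that epoll no longer reports it.
 * Returns 0 if the observed behaviour matches, -1 otherwise.
 */
int close_while_event_queued(void)
{
    int sv[2];
    int epfd, n_before, n_after;
    struct epoll_event ev, out;

    /* Assumption: a socketpair stands in for the TCP socket in the report. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;        /* level-triggered readability */
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0 ||
        write(sv[1], "x", 1) != 1) {   /* makes sv[0] readable */
        close(sv[0]);
        close(sv[1]);
        close(epfd);
        return -1;
    }

    /* The event is queued now, so a non-blocking wait should report it. */
    n_before = epoll_wait(epfd, &out, 1, 0);
    if (n_before == 1)
        assert(out.data.fd == sv[0]);

    /*
     * Closing the only descriptor for this socket drops the last reference
     * to the file; the kernel then removes it from the epoll set (the
     * eventpoll_release_file() path mentioned above).
     */
    close(sv[0]);

    /* With the file gone from the set, nothing should be reported. */
    n_after = epoll_wait(epfd, &out, 1, 0);

    close(sv[1]);
    close(epfd);
    return (n_before == 1 && n_after == 0) ? 0 : -1;
}
```

In the multi-threaded case the close() would come from a second thread while the first is inside epoll_wait(), which is where the locking questions raised in this thread come in.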
Re: strange crashes in tcp_poll() via epoll_wait
Eric Dumazet wrote:
> Hi Al
>
> I tried to debug strange crashes in tcp_poll() called from
> sys_epoll_wait() -> sock_poll()
>
> The symptom is that sock->sk is NULL and we therefore dereference a NULL
> pointer.
>
> It's really rare crashes but still, it would be nice to understand where
> is the bug. Presumably latest kernels would crash in sock_poll() because
> of the sk_can_busy_loop(sock->sk) call.
>
> We do test sock->sk being NULL in sock_fasync(), but epoll should be
> safe because of existing synchronization (epmutex) ?

It should be safe because of ep->mtx, actually, as epmutex is not taken
in sys_epoll_wait.

I took a look at this but have not found anything. I've yet to see
this on my machines.

When did you start noticing this?
strange crashes in tcp_poll() via epoll_wait
Hi Al

I tried to debug strange crashes in tcp_poll() called from
sys_epoll_wait() -> sock_poll()

The symptom is that sock->sk is NULL and we therefore dereference a NULL
pointer.

It's really rare crashes but still, it would be nice to understand where
is the bug. Presumably latest kernels would crash in sock_poll() because
of the sk_can_busy_loop(sock->sk) call.

We do test sock->sk being NULL in sock_fasync(), but epoll should be
safe because of existing synchronization (epmutex) ?

Any idea?

Thanks !