Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-21 Thread Rainer Weikusat
Rainer Weikusat  writes:
> Jason Baron  writes:
>> On 10/18/2015 04:58 PM, Rainer Weikusat wrote:

[...]

>> 1)
>>
>> In unix_peer_wake_relay() function, 'sk_wq' is an __rcu pointer and thus
>> it requires proper dereferencing. Something like:
>>
>> struct unix_sock *u;
>> struct socket_wq *wq;
>>
>> u = container_of(wait, struct unix_sock, wait);
>> rcu_read_lock();
>> wq = rcu_dereference(u->sk.sk_wq);
>> if (wq_has_sleeper(wq))
>>  wake_up_interruptible_sync_poll(>wait, key);
>> rcu_read_unlock();
>
> I think this may be unecessary

I consider this unnecessary now.

Rationale: The wait queue is allocated and freed in tandem with the
socket inode which means it will remain allocated until after the
protocol release function (unix_release with the bulk of the
implementation being in unix_release_sock) returned. As the connection
with the other socket is broken in unix_release_sock, any relayed wake
up must have completed before this time (since both operations take
place while holding the same wait queue lock).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-20 Thread Rainer Weikusat
Jason Baron  writes:
> On 10/18/2015 04:58 PM, Rainer Weikusat wrote:
>
> [...]
>
>> 
>> The idea behind 'the wait queue' (insofar I'm aware of it) is that it
>> will be used as list of threads who need to be notified when the
>> associated event occurs. Since you seem to argue that the run-of-the-mill
>> algorithm is too slow for this particular case, is there anything to
>> back this up?
>>
>
> Generally the poll() routines only add to a wait queue once at the
> beginning, and all subsequent calls to poll() simply check the wakeup
> conditions. So here you are proposing to add/remove to the wait queue on
> subsequent invocations of poll(). So the initial patch I did, continued
> in the usual pattern and only added once on registration or connect().

The code uses the private member of a wait_queue_t to record if it the
add_wait_queue was already executed so the add/remove will only happen
if the wakeup condition changed in the meantime (which usually ought to
be the case, though). As far as I understand this, this really only
makes a difference for epoll as only epoll will keep everything on the
wait queues managed by it between 'polling calls'. In order to support
epoll-style wait queue management outside of epoll, the poll management
code would need to execute a cleanup callback instead of just the setup
callback it already executes.

> 1)
>
> In unix_peer_wake_relay() function, 'sk_wq' is an __rcu pointer and thus
> it requires proper dereferencing. Something like:
>
> struct unix_sock *u;
> struct socket_wq *wq;
>
> u = container_of(wait, struct unix_sock, wait);
> rcu_read_lock();
> wq = rcu_dereference(u->sk.sk_wq);
> if (wq_has_sleeper(wq))
>   wake_up_interruptible_sync_poll(>wait, key);
> rcu_read_unlock();

I think this may be unecessary but I need more time to check this than
the odd "half an hour after work after 11pm [UK time]" I could put into
this today.

> 2)
>
> For the case of epoll() in edge triggered mode we need to ensure that
> when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
> is true, we need to add a unix_peer_wake_connect() call to guarantee a
> wakeup. Otherwise, we are going to potentially hang there.

I consider this necessary.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-19 Thread Jason Baron
On 10/18/2015 04:58 PM, Rainer Weikusat wrote:

[...]

> 
> The idea behind 'the wait queue' (insofar I'm aware of it) is that it
> will be used as list of threads who need to be notified when the
> associated event occurs. Since you seem to argue that the run-of-the-mill
> algorithm is too slow for this particular case, is there anything to
> back this up?
>

Generally the poll() routines only add to a wait queue once at the
beginning, and all subsequent calls to poll() simply check the wakeup
conditions. So here you are proposing to add/remove to the wait queue on
subsequent invocations of poll(). So the initial patch I did, continued
in the usual pattern and only added once on registration or connect().

However, I do think that this is a special case since the registration
is on a shared wait queue, and thus having a long list of registered
waiters is going to affect all waiters. So I am fine with doing the
add/removes in the poll() routine and I agree that the patch below is
more compact that what I initially posted.

A couple of notes on your patch:

1)

In unix_peer_wake_relay() function, 'sk_wq' is an __rcu pointer and thus
it requires proper dereferencing. Something like:

struct unix_sock *u;
struct socket_wq *wq;

u = container_of(wait, struct unix_sock, wait);
rcu_read_lock();
wq = rcu_dereference(u->sk.sk_wq);
if (wq_has_sleeper(wq))
wake_up_interruptible_sync_poll(>wait, key);
rcu_read_unlock();


2)

For the case of epoll() in edge triggered mode we need to ensure that
when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
is true, we need to add a unix_peer_wake_connect() call to guarantee a
wakeup. Otherwise, we are going to potentially hang there.

With these changes (or tell me why they are not needed), I'm happy to
ack this patch.

Thanks,

-Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-18 Thread Rainer Weikusat
Jason Baron  writes:
> 
>> 
>> X-Signed-Off-By: Rainer Weikusat 
>> 

Sorry for the delayed reply but I had to do some catching up on what the
people who pay me consider 'useful work' :-).

> So the patches I've posted and yours both use the idea of a relaying
> the remote peer wakeup via callbacks that are internal to the net/unix,
> such that we avoid exposing the remote peer wakeup to the external
> poll()/select()/epoll(). They differ in when and how those callbacks
> are registered/unregistered.

Yes. In my opinion, that's the next best idea considering that the need
to handle this particular situation is presumably specfcific to the
af_unix.c code and thus, doesn't warrant changes to all I/O multiplexing
implementations.

> So I think your approach here will generally keep the peer wait wakeup
> queue to its absolute minimum, by removing from that queue when
> we set POLLOUT, however it requires taking the peer waitqueue lock on
> every poll() call.
>
> So I think there are tradeoffs here vs. what I've
> posted. So for example, if there are a lot of writers against one 'server'
> socket, there is going to be a lot of lock contention with your approach
> here. So I think the performance is going to depend on the workload that
> is tested.

This approach is really completely run-of-the-mill "possibly sleep until
some event occurs" code, eg, you'll find a description of this exact
procedure on p. 157 of chapter 6 of

https://lwn.net/Kernel/LDD3/

The idea behind 'the wait queue' (insofar I'm aware of it) is that it
will be used as list of threads who need to be notified when the
associated event occurs. Since you seem to argue that the run-of-the-mill
algorithm is too slow for this particular case, is there anything to
back this up?

This is also sort-of a zero-sum game as the idea to enqueue on a wait
queue because the event could possibly become interesting in future
really just shifts (for n:1 connected sockets) work from the (possibly
many) clients to the single server. And considering that 'the wait'
became necessary (if it became necessary) because flow-control kicked in
to stop clients from sending more data to the server until it had time
to catch up with the already sent data, this doesn't seem like a good
idea to me.

Using a flag to signal that at least some of the members of this queue
actually want to be woken up will also only save work if it is assumed
that most threads won't ever really be waiting for this, ie, won't
execute the corresponding unix_dgram_poll code. But if that code isn't
going to be executed most of the time, anyway, why optimize it?

It's also not really necessary to take the waitqueue lock on every poll,
eg, the code in unix_dgram_poll could be changed like this:

need_wakeup = 0;
unix_state_lock(sk);

other = unix_peer(sk);
if (other && unix_peer(other) != sk) {
if (unix_recvq_full(other)) {
need_wakeup = !unix_peer_wake_connect(sk, 
other);

if (unix_recvq_full(other)) {
writable = 0;
need_wakeup = 0;
} else
unix_peer_wake_disconnect(sk, other);
} else
need_wakeup = unix_peer_wake_disconnect(sk, 
other);
}

unix_state_unlock(sk);
if (need_wakeup)
wake_up_interruptible_poll(sk_sleep(sk), POLL_OUT_ALL);

I'm no big fan of adding complications like this based on the conjecture
that they might be useful, but this one seems justified to me. IMHO,
there are two interesting cases here:

1. The current thread won't have to wait. It should detect this quickly
   so that it can get back to work, hence, check for this before
   'preparing to wait'.

2. Nothing to be done here right now, hence, performing an additional
   check for 'still not writeable' shouldn't matter much.

Yet another 3.2.54 patch with the change above

---
--- linux-2-6.b/net/unix/af_unix.c  2015-10-18 20:41:10.829404899 +0100
+++ linux-2-6/net/unix/af_unix.c2015-10-18 20:17:34.819134482 +0100
@@ -115,6 +115,8 @@
 #include 
 #include 
 
+#define POLL_OUT_ALL   (POLLOUT | POLLWRNORM | POLLWRBAND)
+
 static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 static DEFINE_SPINLOCK(unix_table_lock);
 static atomic_long_t unix_nr_socks;
@@ -303,6 +305,53 @@ found:
return s;
 }
 
+static int unix_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+   void *key)
+{
+   struct unix_sock *u;
+   wait_queue_head_t *u_sleep;
+
+   u = container_of(q, struct unix_sock, peer_wake);
+
+   u_sleep = sk_sleep(>sk);
+   if (u_sleep)
+   

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-14 Thread Rainer Weikusat
Jason Baron  writes:
> On 10/12/2015 04:41 PM, Rainer Weikusat wrote:
>> Jason Baron  writes:
>>> On 10/05/2015 12:31 PM, Rainer Weikusat wrote:

[...]

>> OTOH, something I seriously dislike about your relaying implementation
>> is the unconditional add_wait_queue upon connect as this builds up a
>> possibly large wait queue of entities entirely uninterested in the
>> event which will need to be traversed even if peer_wait wakeups will
>> only happen if at least someone actually wants to be notified. This
>> could be changed such that the struct unix_sock member is only put onto
>> the peer's wait queue in poll and only if it hasn't already been put
>> onto it. The connection could then be severed if some kind of
>> 'disconnect' occurs.

[...]


> What about the following race?
>
> 1) thread A does poll() on f, finds the wakeup condition low, and adds
> itself to the remote peer_wait queue.
>
> 2) thread B sets the wake up condition in dgram_recvmsg(), but does not
> execute the wakeup of threads yet.
>
> 3) thread C also does a poll() on f, finds now that the wakeup condition
> is set, and hence removes f from the remote peer_wait queue.
>
> Then, thread A misses the POLLOUT, and hangs.

Indeed. This starts to turn into an interesting, little problem. Based
on the code I posted, there are (AFAICT) four possible situations a
thread looking for 'peer will accept data' can find itself in:

A - supposed to mean 'it connected to the peer_wait queue'
B - the peer socket is ready to accept messages

A && B
--

Since A, no other thread can be waiting for this event and since B, A
was undone. Fine to return.


!A && B
---

!A means 'was enqueued to peer_wait during an earlier call'. This means
any number of threads may be waiting. Because of this, do an additional
wake up after disconnecting since the current thread may never have
slept, ie, possibly, no wake up occurred yet.


 A && !B
 ---

 Now connected to receive peer wake ups. Go to sleep.


 !A && !B
 

 Someone else connected. Go to sleep.

> I understand your concern about POLLIN only waiters-I think we
> could add the 'relay callback' in dgram_poll() only for those who are
> looking for POLLOUT, and simply avoid the de-registration,

Whether or not the currently running thread wanted to write or to read
can only be determined (AIUI) if a poll_table was passed. If this was
the case and the thread didn't look for 'write space', it will never
execute the code in question, cf

/* No write status requested, avoid expensive OUT tests. */
if (wait && !(wait->key & POLL_OUT_ALL))
return mask;

As to 'concerns', I'll try to word that somewhat differently: The usual
way to handle a 'may sleep' situation is

1. Enqueue on the associated wait_queue.

2. Test the condition

2a) No sleep, dequeue and return

2b) Sleep until woken up

This means the wait queue is used to 'register' threads which want to be
woken up if a certain event happens. It's not used such that a thread
first enqueues itself on all wait queues it could ever need to enqueued
on and that an associated flag is used to inform the code on the way to
doing a wake up whether or not any of the entries on the list actually
corresponds to someone waiting for one. Even if such a flag exists, the
time needed to do the wake up is still proportional to size of the list.

For the given case, take a usual socketpair: It's absolutely pointless
to put these on the peer_wait queues of each other because all datagrams
send by one go to the other, ie, a situation where a thread goes to
sleep because it has to wait until data written via unrelated sockets
has been consumed can never happen.

I'll include the patch I'm currently using below, including an
X-Signed-Off by line to signify that this is the result of kernel
development sponsored by my employer and made available under the usual
terms. Changes since the last version:

- fixed the inverted (rather accidentally uninverted) test in
  _connect

- simplified the peer-checking logic: Instead of getting the
  peer with the state locked and then doing all kinds of checks
  which will hopefully enable avoiding to lock the state again,
  then, possibly, lock it again, lock the socket, do the peer
  checks followed by whatever else is necessary, then unlock it

- addressed the race described above by means of doing a wake_up
  on sk_sleep(sk) if a thread breaks the connection which
  didn't also establish it

- a comment (yohoho) explaining all of this -- I don't usually
  use comments in my own code because I consider them rather an
  annoyance than helpful

Unrelated change:

- #defined POLL_OUT_ALL (POLLOUT | POLLWRNORM | POLLWRBAND)
  and use that instead of the explicit triple

X-Signed-Off-By: Rainer Weikusat 


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-14 Thread Jason Baron

> 
> X-Signed-Off-By: Rainer Weikusat 
> 

Hi,

So the patches I've posted and yours both use the idea of a relaying
the remote peer wakeup via callbacks that are internal to the net/unix,
such that we avoid exposing the remote peer wakeup to the external
poll()/select()/epoll(). They differ in when and how those callbacks
are registered/unregistered.

So I think your approach here will generally keep the peer wait wakeup
queue to its absolute minimum, by removing from that queue when
we set POLLOUT, however it requires taking the peer waitqueue lock on
every poll() call. So I think there are tradeoffs here vs. what I've
posted. So for example, if there are a lot of writers against one 'server'
socket, there is going to be a lot of lock contention with your approach
here. So I think the performance is going to depend on the workload that
is tested.

I don't have a specific workload that I am trying to solve here, but
since you introduced the waiting on the remote peer queue, perhaps you
can post numbers comparing the patches that have been posted for the
workload that this was developed for? I will send you the latest version
of what I have privately - so as not to overly spam the list.

Thanks,

-Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-13 Thread Jason Baron
On 10/12/2015 04:41 PM, Rainer Weikusat wrote:
> Jason Baron  writes:
>> On 10/05/2015 12:31 PM, Rainer Weikusat wrote:
> 
> [...]
> 
>>> Here's a more simple idea which _might_ work. The underlying problem
>>> seems to be that the second sock_poll_wait introduces a covert reference
>>> to the peer socket which isn't accounted for. The basic idea behind this
>>> is to execute an additional sock_hold for the peer whenever the
>>> sock_poll_wait is called for it and store it (the struct sock *) in a
>>> new struct unix_sock member.
> 
> [...]
> 
>> Interesting - will this work for the test case you supplied where we are in
>> epoll_wait() and another thread does a connect() to a new target? In that
>> case, even if we issue a wakeup to the epoll thread, its not going to have
>> a non-NULL poll_table, so it wouldn't be added to the new target. IE
>> the first test case here:
>>
>> https://lkml.org/lkml/2015/10/4/154
> 
> "Indeed it would not." I've also meanwhile found the time to check what
> is and isn't locked here and found that Eric's "this looks racy" was
> also justified. In theory, a clean solution could be based on modifying
> the various polling implementations to keep a piece of data for a polled
> something and provided that again on each subsequent poll call. This
> could then be used to keep the peer socket alive for as long as
> necessary were it possible to change the set of wait queues with every
> poll call. Since this also isn't the case, the idea to increment the
> reference count of the peer socket won't fly.
> 
> OTOH, something I seriously dislike about your relaying implementation
> is the unconditional add_wait_queue upon connect as this builds up a
> possibly large wait queue of entities entirely uninterested in the
> event which will need to be traversed even if peer_wait wakeups will
> only happen if at least someone actually wants to be notified. This
> could be changed such that the struct unix_sock member is only put onto
> the peer's wait queue in poll and only if it hasn't already been put
> onto it. The connection could then be severed if some kind of
> 'disconnect' occurs.
> 
> The code below (again based on 3.2.54) is what I'm currently running and
> it has survived testing during the day (without trying the exercise in
> hexadecimal as that doesn't cause failure for me, anyway). The wakeup
> relaying function checks that a socket wait queue actually still exists
> because I used to get null pointers oopses without every now and then
> (I've also tested this with an additional printk printing 'no q' in case
> the pointer was actually null to verify that this really occurs here).
> 

Hi,

What about the following race?

1) thread A does poll() on f, finds the wakeup condition low, and adds
itself to the remote peer_wait queue.

2) thread B sets the wake up condition in dgram_recvmsg(), but does not
execute the wakeup of threads yet.

3) thread C also does a poll() on f, finds now that the wakeup condition
is set, and hence removes f from the remote peer_wait queue.

Then, thread A misses the POLLOUT, and hangs.

I understand your concern about POLLIN only waiters-I think we
could add the 'relay callback' in dgram_poll() only for those who are
looking for POLLOUT, and simply avoid the de-registration, as in practice
I think its unlikely we are going to have a socket switching from
POLLOUT to *only* POLLIN. I suspect that will cover most of the cases
that concern you?

Thanks,

-Jason

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-12 Thread Rainer Weikusat
Jason Baron  writes:
> On 10/05/2015 12:31 PM, Rainer Weikusat wrote:

[...]

>> Here's a more simple idea which _might_ work. The underlying problem
>> seems to be that the second sock_poll_wait introduces a covert reference
>> to the peer socket which isn't accounted for. The basic idea behind this
>> is to execute an additional sock_hold for the peer whenever the
>> sock_poll_wait is called for it and store it (the struct sock *) in a
>> new struct unix_sock member.

[...]

> Interesting - will this work for the test case you supplied where we are in
> epoll_wait() and another thread does a connect() to a new target? In that
> case, even if we issue a wakeup to the epoll thread, its not going to have
> a non-NULL poll_table, so it wouldn't be added to the new target. IE
> the first test case here:
>
> https://lkml.org/lkml/2015/10/4/154

"Indeed it would not." I've also meanwhile found the time to check what
is and isn't locked here and found that Eric's "this looks racy" was
also justified. In theory, a clean solution could be based on modifying
the various polling implementations to keep a piece of data for a polled
something and provided that again on each subsequent poll call. This
could then be used to keep the peer socket alive for as long as
necessary were it possible to change the set of wait queues with every
poll call. Since this also isn't the case, the idea to increment the
reference count of the peer socket won't fly.

OTOH, something I seriously dislike about your relaying implementation
is the unconditional add_wait_queue upon connect as this builds up a
possibly large wait queue of entities entirely uninterested in the
event which will need to be traversed even if peer_wait wakeups will
only happen if at least someone actually wants to be notified. This
could be changed such that the struct unix_sock member is only put onto
the peer's wait queue in poll and only if it hasn't already been put
onto it. The connection could then be severed if some kind of
'disconnect' occurs.

The code below (again based on 3.2.54) is what I'm currently running and
it has survived testing during the day (without trying the exercise in
hexadecimal as that doesn't cause failure for me, anyway). The wakeup
relaying function checks that a socket wait queue actually still exists
because I used to get null pointers oopses without every now and then
(I've also tested this with an additional printk printing 'no q' in case
the pointer was actually null to verify that this really occurs here).

-
--- linux-2-6.b/net/unix/af_unix.c  2015-10-12 21:07:27.237102277 +0100
+++ linux-2-6/net/unix/af_unix.c2015-10-12 21:10:26.319384298 +0100
@@ -303,6 +303,51 @@ found:
return s;
 }
 
+static int unix_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+   void *key)
+{
+   struct unix_sock *u;
+   wait_queue_head_t *u_sleep;
+
+   u = container_of(q, struct unix_sock, peer_wake);
+
+   u_sleep = sk_sleep(>sk);
+   if (u_sleep)
+   wake_up_interruptible_poll(u_sleep, key);
+
+   return 0;
+}
+
+static void unix_peer_wake_connect(struct sock *me, struct sock *peer)
+{
+   struct unix_sock *u, *u_peer;
+
+   u = unix_sk(me);
+   u_peer = unix_sk(peer);
+
+   if (!u->peer_wake.private)
+   return;
+
+   u->peer_wake.private = peer;
+   add_wait_queue(_peer->peer_wait, >peer_wake);
+}
+
+static int unix_peer_wake_disconnect(struct sock *me, struct sock *peer)
+{
+   struct unix_sock *u, *u_peer;
+
+   u = unix_sk(me);
+   u_peer = unix_sk(peer);
+
+   if (u->peer_wake.private != peer)
+   return 0;
+
+   remove_wait_queue(_peer->peer_wait, >peer_wake);
+   u->peer_wake.private = NULL;
+
+   return 1;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
return (atomic_read(>sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -409,6 +454,8 @@ static void unix_release_sock(struct soc
skpair->sk_state_change(skpair);
sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
}
+
+   unix_peer_wake_disconnect(sk, skpair);
sock_put(skpair); /* It may now die */
unix_peer(sk) = NULL;
}
@@ -630,6 +677,7 @@ static struct sock *unix_create1(struct
INIT_LIST_HEAD(>link);
mutex_init(>readlock); /* single task reading lock */
init_waitqueue_head(>peer_wait);
+   init_waitqueue_func_entry(>peer_wake, unix_peer_wake_relay);
unix_insert_socket(unix_sockets_unbound, sk);
 out:
if (sk == NULL)
@@ -953,7 +1001,7 @@ static int unix_dgram_connect(struct soc
struct sockaddr_un *sunaddr = (struct sockaddr_un *)addr;
struct sock *other;
unsigned hash;
-   int err;
+   int err, disconned;
 
if (addr->sa_family != AF_UNSPEC) {
err = 

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-05 Thread Jason Baron
On 10/05/2015 12:31 PM, Rainer Weikusat wrote:
> Jason Baron  writes:
>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>> queue associated with the socket s that we are poll'ing against, but also 
>> calls
>> sock_poll_wait() for a remote peer socket p, if it is connected. Thus,
>> if we call poll()/select()/epoll() for the socket s, there are then
>> a couple of code paths in which the remote peer socket p and its associated
>> peer_wait queue can be freed before poll()/select()/epoll() have a chance
>> to remove themselves from the remote peer socket.
>>
>> The way that remote peer socket can be freed are:
>>
>> 1. If s calls connect() to a connect to a new socket other than p, it will
>> drop its reference on p, and thus a close() on p will free it.
>>
>> 2. If we call close on p(), then a subsequent sendmsg() from s, will drop
>> the final reference to p, allowing it to be freed.
> 
> Here's a more simple idea which _might_ work. The underlying problem
> seems to be that the second sock_poll_wait introduces a covert reference
> to the peer socket which isn't accounted for. The basic idea behind this
> is to execute an additional sock_hold for the peer whenever the
> sock_poll_wait is called for it and store it (the struct sock *) in a
> new struct unix_sock member. Upon entering unix_dgram_poll, if the
> member is not NULL, it's cleared and a sock_put for its former value is
> done. The 'poll peer not NULL -> sock_put it' code is also added to the
> destructor, although I'm unsure if this is really necessary. The patch
> below also includes the additional SOCK_DEAD test suggested by Martin as
> that seems generally sensible to me.
> 
> NB: This has survived both Martin's and my test programs for a number
> of executions/ longer periods of time than was common before without
> generating list corruption warnings. The patch below is against 'my'
> 3.2.54 and is here provided as a suggestion in the hope that it will be
> useful for someting, not as patch submission, as I spent less time
> thinking through this than I should ideally have but despite of this,
> it's another 2.5 hours of my life spent on something completely
> different than what I should be working on at the moment.
> 
> --
> diff -pru linux-2-6/include/net/af_unix.h linux-2-6.p/include/net/af_unix.h
> --- linux-2-6/include/net/af_unix.h   2014-01-20 21:52:53.0 +
> +++ linux-2-6.p/include/net/af_unix.h 2015-10-05 15:11:20.270958297 +0100
> @@ -50,6 +50,7 @@ struct unix_sock {
>   struct vfsmount *mnt;
>   struct mutexreadlock;
>   struct sock *peer;
> + struct sock *poll_peer;
>   struct sock *other;
>   struct list_headlink;
>   atomic_long_t   inflight;
> diff -pru linux-2-6/net/unix/af_unix.c linux-2-6.p/net/unix/af_unix.c
> --- linux-2-6/net/unix/af_unix.c  2014-01-22 16:51:52.0 +
> +++ linux-2-6.p/net/unix/af_unix.c2015-10-05 17:05:28.358273567 +0100
> @@ -361,6 +361,9 @@ static void unix_sock_destructor(struct
>   if (u->addr)
>   unix_release_addr(u->addr);
>  
> + if (u->poll_peer)
> + sock_put(u->poll_peer);
> +
>   atomic_long_dec(_nr_socks);
>   local_bh_disable();
>   sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> @@ -625,6 +628,7 @@ static struct sock *unix_create1(struct
>   u = unix_sk(sk);
>   u->dentry = NULL;
>   u->mnt= NULL;
> + u->poll_peer = NULL;
>   spin_lock_init(>lock);
>   atomic_long_set(>inflight, 0);
>   INIT_LIST_HEAD(>link);
> @@ -2135,8 +2139,16 @@ static unsigned int unix_poll(struct fil
>  static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>   poll_table *wait)
>  {
> - struct sock *sk = sock->sk, *other;
> - unsigned int mask, writable;
> + struct sock *sk = sock->sk, *other, *pp;
> + struct unix_sock *u;
> + unsigned int mask, writable, dead;
> +
> + u = unix_sk(sk);
> + pp = u->poll_peer;
> + if (pp) {
> + u->poll_peer = NULL;
> + sock_put(pp);
> + }
>  
>   sock_poll_wait(file, sk_sleep(sk), wait);
>   mask = 0;
> @@ -2170,7 +2182,20 @@ static unsigned int unix_dgram_poll(stru
>   other = unix_peer_get(sk);
>   if (other) {
>   if (unix_peer(other) != sk) {
> - sock_poll_wait(file, _sk(other)->peer_wait, wait);
> + unix_state_lock(other);
> +
> + dead = sock_flag(other, SOCK_DEAD);
> + if (!dead)
> + sock_poll_wait(file, _sk(other)->peer_wait,
> +wait);
> +
> + unix_state_unlock(other);
> +
> + if (!dead) {
> + u->poll_peer = other;
> + 

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-05 Thread Eric Dumazet
On Mon, 2015-10-05 at 17:31 +0100, Rainer Weikusat wrote:

>   atomic_long_set(>inflight, 0);
>   INIT_LIST_HEAD(>link);
> @@ -2135,8 +2139,16 @@ static unsigned int unix_poll(struct fil
>  static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>   poll_table *wait)
>  {
> - struct sock *sk = sock->sk, *other;
> - unsigned int mask, writable;
> + struct sock *sk = sock->sk, *other, *pp;
> + struct unix_sock *u;
> + unsigned int mask, writable, dead;
> +
> + u = unix_sk(sk);
> + pp = u->poll_peer;
> + if (pp) {
> + u->poll_peer = NULL;
> + sock_put(pp);
> + }


This looks racy.
Multiple threads could use poll() at the same time,
and you would have too many sock_put()



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-05 Thread Rainer Weikusat
Eric Dumazet  writes:
> On Mon, 2015-10-05 at 17:31 +0100, Rainer Weikusat wrote:
>
>>  atomic_long_set(>inflight, 0);
>>  INIT_LIST_HEAD(>link);
>> @@ -2135,8 +2139,16 @@ static unsigned int unix_poll(struct fil
>>  static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>>  poll_table *wait)
>>  {
>> -struct sock *sk = sock->sk, *other;
>> -unsigned int mask, writable;
>> +struct sock *sk = sock->sk, *other, *pp;
>> +struct unix_sock *u;
>> +unsigned int mask, writable, dead;
>> +
>> +u = unix_sk(sk);
>> +pp = u->poll_peer;
>> +if (pp) {
>> +u->poll_peer = NULL;
>> +sock_put(pp);
>> +}
>
>
> This looks racy.
> Multiple threads could use poll() at the same time,
> and you would have too many sock_put()

That's one of the reasons why I wrote "might work": The use of a single
structure member without any locking for the sock_poll_wait suggests
that this is taken care of in some other way, as does the absence of any
comment about that in the 'public' LDDs ("Linux Device Drivers"),
however, I don't really know if this is true. If not, this simple idea
can't work.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-04 Thread Rainer Weikusat
Rainer Weikusat  writes:

> Mathias Krause  writes:
>> On 2 October 2015 at 22:43, Jason Baron  wrote:
>>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>>> queue associated with the socket s that we are poll'ing against, but also 
>>> calls
>
> [useless full-quote removed]
>
>> My reproducer runs on this patch for more than 3 days now without
>> triggering anything anymore.
>
> Since the behaviour of your program is random, using it to "test"
> anything doesn't really provide any insight: It could have been
> executing the same codepath which doesn't happen to trigger any problems
> for all of these three days. Nobody can tell.

Since this "strangely" seems to have been lost in the thread: Here's the
test program showing that the reconnect while in epoll actually causes a
problem (at least I think so):


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static int sk, tg0, tg1;

static void *epoller(void *unused)
{
struct epoll_event epev;
int epfd;

epfd = epoll_create(1);
if (epfd == -1) exit(0);

epev.events = EPOLLOUT;
epoll_ctl(epfd, EPOLL_CTL_ADD, sk, );
epoll_wait(epfd, , 1, 5000);

close(sk);

execl("./a.out", "./a.out", (void *)0);

return NULL;
}

int main(void)
{
struct sockaddr_un sun;
pthread_t tid;
int rc;

sun.sun_family = AF_UNIX;

tg0 = socket(AF_UNIX, SOCK_DGRAM, 0);
strncpy(sun.sun_path, "/tmp/tg0", sizeof(sun.sun_path));
unlink(sun.sun_path);
bind(tg0, (struct sockaddr *), sizeof(sun));

tg1 = socket(AF_UNIX, SOCK_DGRAM, 0);
strncpy(sun.sun_path, "/tmp/tg1", sizeof(sun.sun_path));
unlink(sun.sun_path);
bind(tg1, (struct sockaddr *), sizeof(sun));

sk = socket(AF_UNIX, SOCK_DGRAM, 0);
connect(sk, (struct sockaddr *), sizeof(sun));

fcntl(sk, F_SETFL, fcntl(sk, F_GETFL) | O_NONBLOCK);

while ((rc = write(sk, "bla", 3)) != -1);

pthread_create(, NULL, epoller, NULL);

usleep(5);

strncpy(sun.sun_path, "/tmp/tg0", sizeof(sun.sun_path));
connect(sk, (struct sockaddr *), sizeof(sun));
close(tg1);

pause();

return 0;
}
--

And here the other demonstrating the poller not being woken up despite
it could write something:

--
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int main(void)
{
struct sockaddr_un sun;
struct pollfd pfd;
int tg, sk0, sk1, rc;
char buf[16];

sun.sun_family = AF_UNIX;

tg = socket(AF_UNIX, SOCK_DGRAM, 0);
strncpy(sun.sun_path, "/tmp/tg", sizeof(sun.sun_path));
unlink(sun.sun_path);
bind(tg, (struct sockaddr *), sizeof(sun));

sk0 = socket(AF_UNIX, SOCK_DGRAM, 0);
connect(sk0, (struct sockaddr *), sizeof(sun));

sk1 = socket(AF_UNIX, SOCK_DGRAM, 0);
connect(sk1, (struct sockaddr *), sizeof(sun));

fcntl(sk0, F_SETFL, fcntl(sk0, F_GETFL) | O_NONBLOCK);
fcntl(sk1, F_SETFL, fcntl(sk1, F_GETFL) | O_NONBLOCK);

while (write(sk0, "bla", 3) != -1);

if (fork() == 0) {
pfd.fd = sk1;
pfd.events = POLLOUT;
rc = poll(, 1, -1);

_exit(0);
}

sleep(3);
read(tg, buf, sizeof(buf));
wait();

return 0;
}
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-03 Thread Rainer Weikusat
Mathias Krause  writes:
> On 2 October 2015 at 22:43, Jason Baron  wrote:
>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>> queue associated with the socket s that we are poll'ing against, but also 
>> calls

[useless full-quote removed]

> My reproducer runs on this patch for more than 3 days now without
> triggering anything anymore.

Since the behaviour of your program is random, using it to "test"
anything doesn't really provide any insight: It could have been
executing the same codepath which doesn't happen to trigger any problems
for all of these three days. Nobody can tell.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-02 Thread Jason Baron
The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
queue associated with the socket s that we are poll'ing against, but also calls
sock_poll_wait() for a remote peer socket p, if it is connected. Thus,
if we call poll()/select()/epoll() for the socket s, there are then
a couple of code paths in which the remote peer socket p and its associated
peer_wait queue can be freed before poll()/select()/epoll() have a chance
to remove themselves from the remote peer socket.

The way that remote peer socket can be freed are:

1. If s calls connect() to a connect to a new socket other than p, it will
drop its reference on p, and thus a close() on p will free it.

2. If we call close on p(), then a subsequent sendmsg() from s, will drop
the final reference to p, allowing it to be freed.

Address this issue, by reverting unix_dgram_poll() to only register with
the wait queue associated with s and register a callback with the remote peer
socket on connect() that will wake up the wait queue associated with s. If
scenarios 1 or 2 occur above we then simply remove the callback from the
remote peer. This then presents the expected semantics to poll()/select()/
epoll().

I've implemented this for sock-type, SOCK_RAW, SOCK_DGRAM, and SOCK_SEQPACKET
but not for SOCK_STREAM, since SOCK_STREAM does not use unix_dgram_poll().

Signed-off-by: Jason Baron 
---
 include/net/af_unix.h |  1 +
 net/unix/af_unix.c| 32 +++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 4a167b3..9698aff 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE  0
 #define UNIX_GC_MAYBE_CYCLE1
struct socket_wqpeer_wq;
+   wait_queue_twait;
 };
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 03ee4d3..f789423 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -420,6 +420,9 @@ static void unix_release_sock(struct sock *sk, int embrion)
skpair = unix_peer(sk);
 
if (skpair != NULL) {
+   if (sk->sk_type != SOCK_STREAM)
+   remove_wait_queue(_sk(skpair)->peer_wait,
+ >wait);
if (sk->sk_type == SOCK_STREAM || sk->sk_type == 
SOCK_SEQPACKET) {
unix_state_lock(skpair);
/* No more writes */
@@ -636,6 +639,16 @@ static struct proto unix_proto = {
  */
 static struct lock_class_key af_unix_sk_receive_queue_lock_key;
 
+static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+   struct unix_sock *u;
+
+   u = container_of(wait, struct unix_sock, wait);
+   wake_up_interruptible_sync_poll(sk_sleep(>sk), key);
+
+   return 0;
+}
+
 static struct sock *unix_create1(struct net *net, struct socket *sock, int 
kern)
 {
struct sock *sk = NULL;
@@ -664,6 +677,7 @@ static struct sock *unix_create1(struct net *net, struct 
socket *sock, int kern)
INIT_LIST_HEAD(>link);
mutex_init(>readlock); /* single task reading lock */
init_waitqueue_head(>peer_wait);
+   init_waitqueue_func_entry(>wait, peer_wake);
unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
if (sk == NULL)
@@ -1030,7 +1044,11 @@ restart:
 */
if (unix_peer(sk)) {
struct sock *old_peer = unix_peer(sk);
+
+   remove_wait_queue(_sk(old_peer)->peer_wait,
+ _sk(sk)->wait);
unix_peer(sk) = other;
+   add_wait_queue(_sk(other)->peer_wait, _sk(sk)->wait);
unix_state_double_unlock(sk, other);
 
if (other != old_peer)
@@ -1038,8 +1056,12 @@ restart:
sock_put(old_peer);
} else {
unix_peer(sk) = other;
+   add_wait_queue(_sk(other)->peer_wait, _sk(sk)->wait);
unix_state_double_unlock(sk, other);
}
+   /* New remote may have created write space for us */
+   wake_up_interruptible_sync_poll(sk_sleep(sk),
+   POLLOUT | POLLWRNORM | POLLWRBAND);
return 0;
 
 out_unlock:
@@ -1194,6 +1216,8 @@ restart:
 
sock_hold(sk);
unix_peer(newsk)= sk;
+   if (sk->sk_type == SOCK_SEQPACKET)
+   add_wait_queue(_sk(sk)->peer_wait, _sk(newsk)->wait);
newsk->sk_state = TCP_ESTABLISHED;
newsk->sk_type  = sk->sk_type;
init_peercred(newsk);
@@ -1220,6 +1244,8 @@ restart:
 
smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */
unix_peer(sk)   = newsk;
+   if (sk->sk_type == SOCK_SEQPACKET)
+   add_wait_queue(_sk(newsk)->peer_wait, _sk(sk)->wait);
 
unix_state_unlock(sk);
 
@@ -1254,6 +1280,10 @@ static 

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-02 Thread Mathias Krause
On 2 October 2015 at 22:43, Jason Baron  wrote:
> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
> queue associated with the socket s that we are poll'ing against, but also 
> calls
> sock_poll_wait() for a remote peer socket p, if it is connected. Thus,
> if we call poll()/select()/epoll() for the socket s, there are then
> a couple of code paths in which the remote peer socket p and its associated
> peer_wait queue can be freed before poll()/select()/epoll() have a chance
> to remove themselves from the remote peer socket.
>
> The way that remote peer socket can be freed are:
>
> 1. If s calls connect() to a connect to a new socket other than p, it will
> drop its reference on p, and thus a close() on p will free it.
>
> 2. If we call close on p(), then a subsequent sendmsg() from s, will drop
> the final reference to p, allowing it to be freed.
>
> Address this issue, by reverting unix_dgram_poll() to only register with
> the wait queue associated with s and register a callback with the remote peer
> socket on connect() that will wake up the wait queue associated with s. If
> scenarios 1 or 2 occur above we then simply remove the callback from the
> remote peer. This then presents the expected semantics to poll()/select()/
> epoll().
>
> I've implemented this for sock-type, SOCK_RAW, SOCK_DGRAM, and SOCK_SEQPACKET
> but not for SOCK_STREAM, since SOCK_STREAM does not use unix_dgram_poll().
>
> Signed-off-by: Jason Baron 
> ---
>  include/net/af_unix.h |  1 +
>  net/unix/af_unix.c| 32 +++-
>  2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b3..9698aff 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -62,6 +62,7 @@ struct unix_sock {
>  #define UNIX_GC_CANDIDATE  0
>  #define UNIX_GC_MAYBE_CYCLE1
> struct socket_wqpeer_wq;
> +   wait_queue_twait;
>  };
>  #define unix_sk(__sk) ((struct unix_sock *)__sk)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..f789423 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -420,6 +420,9 @@ static void unix_release_sock(struct sock *sk, int 
> embrion)
> skpair = unix_peer(sk);
>
> if (skpair != NULL) {
> +   if (sk->sk_type != SOCK_STREAM)
> +   remove_wait_queue(_sk(skpair)->peer_wait,
> + >wait);
> if (sk->sk_type == SOCK_STREAM || sk->sk_type == 
> SOCK_SEQPACKET) {
> unix_state_lock(skpair);
> /* No more writes */
> @@ -636,6 +639,16 @@ static struct proto unix_proto = {
>   */
>  static struct lock_class_key af_unix_sk_receive_queue_lock_key;
>
> +static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +   struct unix_sock *u;
> +
> +   u = container_of(wait, struct unix_sock, wait);
> +   wake_up_interruptible_sync_poll(sk_sleep(>sk), key);
> +
> +   return 0;
> +}
> +
>  static struct sock *unix_create1(struct net *net, struct socket *sock, int 
> kern)
>  {
> struct sock *sk = NULL;
> @@ -664,6 +677,7 @@ static struct sock *unix_create1(struct net *net, struct 
> socket *sock, int kern)
> INIT_LIST_HEAD(>link);
> mutex_init(>readlock); /* single task reading lock */
> init_waitqueue_head(>peer_wait);
> +   init_waitqueue_func_entry(>wait, peer_wake);
> unix_insert_socket(unix_sockets_unbound(sk), sk);
>  out:
> if (sk == NULL)
> @@ -1030,7 +1044,11 @@ restart:
>  */
> if (unix_peer(sk)) {
> struct sock *old_peer = unix_peer(sk);
> +
> +   remove_wait_queue(_sk(old_peer)->peer_wait,
> + _sk(sk)->wait);
> unix_peer(sk) = other;
> +   add_wait_queue(_sk(other)->peer_wait, 
> _sk(sk)->wait);
> unix_state_double_unlock(sk, other);
>
> if (other != old_peer)
> @@ -1038,8 +1056,12 @@ restart:
> sock_put(old_peer);
> } else {
> unix_peer(sk) = other;
> +   add_wait_queue(_sk(other)->peer_wait, 
> _sk(sk)->wait);
> unix_state_double_unlock(sk, other);
> }
> +   /* New remote may have created write space for us */
> +   wake_up_interruptible_sync_poll(sk_sleep(sk),
> +   POLLOUT | POLLWRNORM | POLLWRBAND);
> return 0;
>
>  out_unlock:
> @@ -1194,6 +1216,8 @@ restart:
>
> sock_hold(sk);
> unix_peer(newsk)= sk;
> +   if (sk->sk_type == SOCK_SEQPACKET)
> +   add_wait_queue(_sk(sk)->peer_wait, 
> _sk(newsk)->wait);
> newsk->sk_state = TCP_ESTABLISHED;
> newsk->sk_type  = sk->sk_type;
> init_peercred(newsk);
> @@ -1220,6