RE: [PATCH net] udp: fix dst races with multicast early demux

2015-08-03 Thread Gregory Hoggarth
Hi,

I have included this patch into my code and re-run our tests overnight, out of 
644 iterations we did not see the kernel crash. Previous reproduction rate we 
would have expected 4-6 crashes in this time.

So I think this fixes the issue we are seeing.

Thanks,
Greg


From: netdev-ow...@vger.kernel.org netdev-ow...@vger.kernel.org on behalf of 
Eric Dumazet eric.duma...@gmail.com
Sent: Saturday, 1 August 2015 10:14 p.m.
To: Gregory Hoggarth
Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com; Michal Kubeček
Subject: [PATCH net] udp: fix dst races with multicast early demux

From: Eric Dumazet eduma...@google.com

Multicast dst are not cached. They carry DST_NOCACHE.

As mentioned in commit f8864972126899 (ipv4: fix dst race in
sk_dst_get()), these dst need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, ie we
must use atomic_inc_not_zero()

Also, we must use READ_ONCE() to fetch sk-sk_rx_dst, as mentioned
in commit d0c294c53a771 (tcp: prevent fetching dst twice in early demux
code)

Fixes: 421b3885bf6d (udp: ipv4: Add udp early demux)
Signed-off-by: Eric Dumazet eduma...@google.com
Reported-by: Gregory Hoggarth gregory.hogga...@alliedtelesis.co.nz
Reported-by: Alex Gartrell agartr...@fb.com
Cc: Michal Kubeček mkube...@suse.cz
---
David : I will be on vacation for following 7 days, no internet access.
Please wait for tests done by Gregory  Alex before merging this ?
Thanks !

 net/ipv4/udp.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..1b8c5ba7d5f7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1995,12 +1995,19 @@ void udp_v4_early_demux(struct sk_buff *skb)

skb-sk = sk;
skb-destructor = sock_efree;
-   dst = sk-sk_rx_dst;
+   dst = READ_ONCE(sk-sk_rx_dst);

if (dst)
dst = dst_check(dst, 0);
-   if (dst)
-   skb_dst_set_noref(skb, dst);
+   if (dst) {
+   /* DST_NOCACHE can not be used without taking a reference */
+   if (dst-flags  DST_NOCACHE) {
+   if (likely(atomic_inc_not_zero(dst-__refcnt)))
+   skb_dst_set(skb, dst);
+   } else {
+   skb_dst_set_noref(skb, dst);
+   }
+   }
 }

 int udp_rcv(struct sk_buff *skb)


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

2015-08-02 Thread Gregory Hoggarth
Hi Eric,

The previous fix, which included the change to __udp4_lib_rcv did not fix the 
issue. The reproduction rate may perhaps have been slightly lower with that fix 
in place, hard to tell as our automated tests unfortunately failed after 600 
runs due to another issue and didn't automatically resume themselves (3 kernel 
crash observed, so 1/200 instead of previous 1/100).

I will apply the new suggested patch, reverting previous patch, and test 
overnight and update tomorrow.

Thanks,
Greg


From: Eric Dumazet eric.duma...@gmail.com
Sent: Friday, 31 July 2015 11:17 p.m.
To: Gregory Hoggarth
Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com
Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

On Fri, 2015-07-31 at 12:17 +0200, Eric Dumazet wrote:
 On Thu, 2015-07-30 at 22:29 +, Gregory Hoggarth wrote:
  Hi Eric,
 
  Thanks for the patch. I have put this into our code and we'll run our test 
  over tonight and the weekend and report back on Monday.
 
  Thanks,

 Oh well, real fix will be :

 (Or at least test if DST_NOCACHE is set : those dst need special care)

Here is a more complete patch, not destroying performance of unicast
UDP.

Please guys test it, I am confident it should solve your problems.

Thanks

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..1b8c5ba7d5f7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1995,12 +1995,19 @@ void udp_v4_early_demux(struct sk_buff *skb)

skb-sk = sk;
skb-destructor = sock_efree;
-   dst = sk-sk_rx_dst;
+   dst = READ_ONCE(sk-sk_rx_dst);

if (dst)
dst = dst_check(dst, 0);
-   if (dst)
-   skb_dst_set_noref(skb, dst);
+   if (dst) {
+   /* DST_NOCACHE can not be used without taking a reference */
+   if (dst-flags  DST_NOCACHE) {
+   if (likely(atomic_inc_not_zero(dst-__refcnt)))
+   skb_dst_set(skb, dst);
+   } else {
+   skb_dst_set_noref(skb, dst);
+   }
+   }
 }

 int udp_rcv(struct sk_buff *skb)

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

2015-07-30 Thread Gregory Hoggarth
Hi Eric,

Thanks for the patch. I have put this into our code and we'll run our test over 
tonight and the weekend and report back on Monday.

Thanks,
Greg


From: Eric Dumazet eric.duma...@gmail.com
Sent: Thursday, 30 July 2015 5:45 p.m.
To: Gregory Hoggarth
Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com
Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

On Thu, 2015-07-30 at 07:42 +0200, Eric Dumazet wrote:
 On Thu, 2015-07-30 at 01:41 +, Gregory Hoggarth wrote:
  Hi,
 
  My company has also started having what appears to be the same problem, 
  since we upgraded our embedded system to
  linux kernel 3.16.
 
  I tried applying the suggested fix of READ_ONCE (and also had to add in the 
  necessary code to compiler.h as 3.16
  didn't have it) and unfortunately it did not fix the issue at all.
 
  Unfortunately we do not have an easy reproduction method, and do not know 
  precisely what is going on in the system
  when the issue occurs. We know it is a multicast UDP packet but that is 
  about it. For us, the crash happens during
  a critical stage in our system initialisation, making additional debugging 
  and instrumentation difficult. Our
  reproduction rate is approximately 1 out of 100 test runs; testing 
  overnight we will usually see 3-5 instances of
  the crash happening. All our attempts to increase the reproduction rate, or 
  reproduce the issue in a simpler/more
  controlled way have failed.
 
  Because we have customised the linux kernel, in some places radically, we 
  assumed this was just a problem only we
  were seeing, so we were trying to fix it ourselves. Now that this appears 
  to be a generic problem upstream, we've
  simply disabled UDP early demux in our system (since it's a new 
  optimisation that we have lived without up till
  now) and will wait for this issue to be fixed upstream instead.
 
 
  So I'm sharing the debug patch I've written to help gather data on what is 
  going on in the system, and some
  of the output we've gotten from the debug, in case this is useful for 
  anyone else who is seeing this problem or
  would like to try and fix it.
 
  Feel free to ask questions, I'm not sure how much help I can be but will do 
  my best. We'll be happy to assist in
  testing any proposed fixes. I also have some more examples of kernel oops 
  and debug output if that could be useful,
  although the debug is from earlier iterations of the patch so that 
  historical output is not as detailed as the
  output generated by the latest version of the patch attached here.
 
  Thanks,
  Greg Hoggarth

 CC UDP early demux author : Shawn Bohrer

 I believe this is a race condition with a dst escaping RCU protected
 region.

 I will send a patch.


Please try following fixes :

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..02baaa6d97b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1778,9 +1778,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table 
*udptable,
struct dst_entry *dst = skb_dst(skb);
int ret;

-   if (unlikely(sk-sk_rx_dst != dst))
+   if (unlikely(sk-sk_rx_dst != dst)) {
+   skb_dst_force(skb);
udp_sk_rx_dst_set(sk, dst);
-
+   }
ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value  0 means to resubmit the input, but
@@ -1995,7 +1996,7 @@ void udp_v4_early_demux(struct sk_buff *skb)

skb-sk = sk;
skb-destructor = sock_efree;
-   dst = sk-sk_rx_dst;
+   dst = READ_ONCE(sk-sk_rx_dst);

if (dst)
dst = dst_check(dst, 0);


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

2015-07-29 Thread Gregory Hoggarth
Hi,

My company has also started having what appears to be the same problem, since 
we upgraded our embedded system to 
linux kernel 3.16.

I tried applying the suggested fix of READ_ONCE (and also had to add in the 
necessary code to compiler.h as 3.16
didn't have it) and unfortunately it did not fix the issue at all.

Unfortunately we do not have an easy reproduction method, and do not know 
precisely what is going on in the system
when the issue occurs. We know it is a multicast UDP packet but that is about 
it. For us, the crash happens during
a critical stage in our system initialisation, making additional debugging and 
instrumentation difficult. Our 
reproduction rate is approximately 1 out of 100 test runs; testing overnight we 
will usually see 3-5 instances of 
the crash happening. All our attempts to increase the reproduction rate, or 
reproduce the issue in a simpler/more 
controlled way have failed.

Because we have customised the linux kernel, in some places radically, we 
assumed this was just a problem only we 
were seeing, so we were trying to fix it ourselves. Now that this appears to be 
a generic problem upstream, we've 
simply disabled UDP early demux in our system (since it's a new optimisation 
that we have lived without up till 
now) and will wait for this issue to be fixed upstream instead.


So I'm sharing the debug patch I've written to help gather data on what is 
going on in the system, and some
of the output we've gotten from the debug, in case this is useful for anyone 
else who is seeing this problem or
would like to try and fix it.

Feel free to ask questions, I'm not sure how much help I can be but will do my 
best. We'll be happy to assist in
testing any proposed fixes. I also have some more examples of kernel oops and 
debug output if that could be useful, 
although the debug is from earlier iterations of the patch so that historical 
output is not as detailed as the 
output generated by the latest version of the patch attached here.

Thanks,
Greg HoggarthFrom: Gregory Hoggarth gregory.hogga...@alliedtelesis.co.nz

dst tracking debug patch

The debugging works by adding poisoned function pointers to a dst structure
when it is cleaned up by the garbage collector. Then, at several points where
these pointers are de-referenced, check if they hold the poisoned value. If so,
it means the dst that is trying to be referenced has gone through the garbage
collector, ie a 'use-after-free' scenario. Instead of letting the system access
the garbage pointer and crash, pre-empty the de-reference and panic allowing us
to print out the debug information.

At many places in the code where dst structures are manipulated, add tracking
information about the current dst structure to an external array that can store
100 dst structures. This method of storing the data is not actually very safe,
as each new dst structure allocated will consume a slot in the array. If many
dst structures are created quickly, this will quickly wrap-around and destroy
valid data. A linked-list implementation would be better.
---

 include/net/dst.h  |  145 +---
 include/net/sock.h |2 +
 net/core/dst.c |  109 +++
 net/ipv4/udp.c |2 +
 4 files changed, 248 insertions(+), 10 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 273c15b..f746999 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -103,8 +103,106 @@ struct dst_entry {
 		struct rt6_info		*rt6_next;
 		struct dn_route __rcu	*dn_next;
 	};
+
+	// dst tracking ID
+	int track_id;
+};
+
+#include linux/fs.h
+#include asm/uaccess.h
+#include linux/buffer_head.h
+#include linux/time.h
+
+#define DST_LIST_SZ 25
+extern int dst_latest;
+static char dst_filename[50];
+
+struct dst_track {
+	unsigned long address[DST_LIST_SZ];
+	unsigned int action[DST_LIST_SZ];
+	unsigned int refCount[DST_LIST_SZ];
+	struct timespec tvs[DST_LIST_SZ];
+
+	unsigned int index;
+	unsigned int reuseMagic;
 };
 
+extern struct dst_track track_array[100];
+
+static char *dst_list_action[16] =
+{
+	Free, Hold, Clon, Aloc, AlNr,
+	ReAl, ReNr, Grbg, NUse, NoRf,
+	Chld, Zero, SkBf, Sach, Gotn,
+	NSwp
+};
+
+struct file* file_open1(const char* path, int flags, int rights);
+void file_close1(struct file* file);
+int file_write1(struct file* file, unsigned long long offset, unsigned char* data, unsigned int size);
+
+#define DST_TRACKING(TYPE)		\
+do {	\
+	struct dst_track *track = track_array[dst-track_id];		\
+	int index = track-index;	\
+	if (index  DST_LIST_SZ) 	\
+	{\
+		track-address[index] = \
+			(unsigned long) __builtin_return_address(0);	\
+		track-action[index] = (TYPE);\
+		track-refCount[index] = dst-__refcnt.counter;		\
+		getrawmonotonic (track-tvs[index]);			\
+		track-index++;		\
+	}\
+} while (0)
+
+#define DST_PANIC			\
+do {	\
+	int k, i, len = 0;		\
+	struct dst_track *track