Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-30 Thread Eliezer Tamir

On 29/05/2013 21:52, Or Gerlitz wrote:
> Eliezer Tamir wrote:
>> Or Gerlitz wrote:
>>> Unlike with TCP sockets, UDP sockets may receive packets from multiple
>>> sources and hence the receiving context may be steered to be executed
>>> on different cores through RSS or other Flow-Steering HW mechanisms
>>> which could mean different napi contexts for the same socket, is that
>>> a problem here? what's the severity?
>>
>> Nothing will break if you poll on the wrong queue.
>> Your data will come through normal NAPI processing of the right queue.
>
> Can you elaborate a little further, why you call this "wrong" and "right"?

Right == the queue the packets arrive on.
Wrong == any other queue.
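
To make the terminology concrete, here is a minimal userspace sketch. It
assumes the socket remembers the napi_id of the last packet delivered to
it; only the skb-side napi_id field is visible in the quoted patch, so the
socket-side field and all example_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct example_skb {
	unsigned int napi_id;	/* queue (NAPI context) the packet arrived on */
};

struct example_sock {
	unsigned int napi_id;	/* assumed: queue of the last packet delivered */
};

/* Remember which queue fed this socket most recently. */
static void example_mark_sock(struct example_sock *sk,
			      const struct example_skb *skb)
{
	sk->napi_id = skb->napi_id;
}

/* Busy polling spins on the remembered queue. That is the "right" queue
 * only if the next packet also arrives there. */
static bool example_polling_right_queue(const struct example_sock *sk,
					unsigned int next_packet_queue)
{
	return sk->napi_id == next_packet_queue;
}
```

When the check is false, the poller has simply spun on a queue that stays
empty; the packet is still delivered by the regular NAPI path of the queue
it actually arrived on.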

BTW, if you have an application that receives UDP data to an unbound 
socket, wouldn't it be better in any case to steer all of the incoming 
packets for this UDP socket to a single queue disregarding the source 
address? (Can't your hardware do that?)


The general approach is that userspace needs to make sure that threads, 
connections and IRQs are bound to the right CPUs.


-Eliezer

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 22:08, Eric Dumazet wrote:
> On Wed, 2013-05-29 at 21:52 +0300, Or Gerlitz wrote:
>> Eliezer Tamir wrote:
>>> Or Gerlitz wrote:
>>>> Unlike with TCP sockets, UDP sockets may receive packets from multiple
>>>> sources and hence the receiving context may be steered to be executed
>>>> on different cores through RSS or other Flow-Steering HW mechanisms
>>>> which could mean different napi contexts for the same socket, is that
>>>> a problem here? what's the severity?
>>>
>>> Nothing will break if you poll on the wrong queue.
>>> Your data will come through normal NAPI processing of the right queue.
>>
>> Can you elaborate a little further, why you call this "wrong" and "right"?
>
> This definitely needs some documentation, because before llpoll, device
> RX path was serviced by the cpu receiving the hardware interrupt.
>
> So the "wrong" queue could add false sharing, and wrong NUMA
> allocations.


Yes,
To work properly when you have more than one NUMA node, you have to have 
packet steering set up, either by your NIC or by HW accelerated RFS.


I would like to add a short writeup of the design and suggested 
configuration. Where should it go?



Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Ben Hutchings
On Wed, 2013-05-29 at 17:14 +0300, Or Gerlitz wrote:
> On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir wrote:
> > Adds a new ndo_ll_poll method and the code that supports and uses it.
> > This method can be used by low latency applications to busy poll Ethernet
> > device queues directly from the socket code. The value of sysctl_net_ll_poll
> > controls how many microseconds to poll. Set to zero to disable.
> 
> Unlike with TCP sockets, UDP sockets may receive packets from multiple
> sources and hence the receiving context may be steered to be executed
> on different cores through RSS or other Flow-Steering HW mechanisms
> which could mean different napi contexts for the same socket, is that
> a problem here? what's the severity?

Maybe ARFS could be extended so the driver can tell whether a UDP socket
it's steering for is connected or not.  Then for disconnected sockets
the driver can use a filter that only matches destination address.
(Though that's probably undesirable if the socket has SO_REUSEPORT set.)
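
A rough sketch of that selection rule, in plain C. The structure and
function names are hypothetical and do not correspond to any real ARFS or
driver interface; the sketch only shows which fields a steering filter
would compare in each case.

```c
#include <stdbool.h>

/* Hypothetical filter description; not a real driver or ARFS structure. */
struct example_filter {
	bool match_src;		/* compare source address and port */
	bool match_dst;		/* compare destination address and port */
};

/* Pick what a steering filter should match for a UDP socket:
 * a connected socket has one peer, so the full flow can be matched;
 * an unconnected socket can only be identified by its destination. */
static struct example_filter example_udp_filter(bool connected)
{
	struct example_filter f = {
		.match_src = connected,
		.match_dst = true,
	};
	return f;
}
```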

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 21:52 +0300, Or Gerlitz wrote:
> Eliezer Tamir  wrote:
> > Or Gerlitz wrote:
> 
> >> Unlike with TCP sockets, UDP sockets may receive packets from multiple
> >> sources and hence the receiving context may be steered to be executed
> >> on different cores through RSS or other Flow-Steering HW mechanisms
> >> which could mean different napi contexts for the same socket, is that
> >> a problem here? what's the severity?
> 
> > Nothing will break if you poll on the wrong queue.
> > Your data will come through normal NAPI processing of the right queue.
> 
> Can you elaborate a little further, why you call this "wrong" and "right"?
> --

This definitely needs some documentation, because before llpoll, device
RX path was serviced by the cpu receiving the hardware interrupt.

So the "wrong" queue could add false sharing, and wrong NUMA
allocations.





Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Or Gerlitz
Eliezer Tamir  wrote:
> Or Gerlitz wrote:

>> Unlike with TCP sockets, UDP sockets may receive packets from multiple
>> sources and hence the receiving context may be steered to be executed
>> on different cores through RSS or other Flow-Steering HW mechanisms
>> which could mean different napi contexts for the same socket, is that
>> a problem here? what's the severity?

> Nothing will break if you poll on the wrong queue.
> Your data will come through normal NAPI processing of the right queue.

Can you elaborate a little further, why you call this "wrong" and "right"?


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 17:20, yaniv saar wrote:
> Hi Eliezer,
>
> (If I'm too late then a future note...)
> Why make polling a system-wide configuration?
> Wouldn't it make more sense to implement a sock option?
> An even better solution might be aggregation/combination of both types of
> configurations.
>
> -- Yaniv Sa'ar

We plan on adding a socket option in the future.

-Eliezer


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 17:14, Or Gerlitz wrote:
> On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir wrote:
>> Adds a new ndo_ll_poll method and the code that supports and uses it.
>> This method can be used by low latency applications to busy poll Ethernet
>> device queues directly from the socket code. The value of sysctl_net_ll_poll
>> controls how many microseconds to poll. Set to zero to disable.
>
> Unlike with TCP sockets, UDP sockets may receive packets from multiple
> sources and hence the receiving context may be steered to be executed
> on different cores through RSS or other Flow-Steering HW mechanisms
> which could mean different napi contexts for the same socket, is that
> a problem here? what's the severity?

Nothing will break if you poll on the wrong queue.
Your data will come through normal NAPI processing of the right queue.

One of the things we plan on adding in the next version is more
fine-grained control over which sockets get to busy poll.


-Eliezer


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread yaniv saar
Hi Eliezer,

(If I'm too late then a future note...)
Why make polling a system-wide configuration?
Wouldn't it make more sense to implement a sock option?
An even better solution might be aggregation/combination of both types
of configurations.

-- Yaniv Sa'ar


On Wed, May 29, 2013 at 5:14 PM, Or Gerlitz wrote:
> On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir wrote:
>> Adds a new ndo_ll_poll method and the code that supports and uses it.
>> This method can be used by low latency applications to busy poll Ethernet
>> device queues directly from the socket code. The value of sysctl_net_ll_poll
>> controls how many microseconds to poll. Set to zero to disable.
>
> Unlike with TCP sockets, UDP sockets may receive packets from multiple
> sources and hence the receiving context may be steered to be executed
> on different cores through RSS or other Flow-Steering HW mechanisms
> which could mean different napi contexts for the same socket, is that
> a problem here? what's the severity?
>
> Or.


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Or Gerlitz
On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir wrote:
> Adds a new ndo_ll_poll method and the code that supports and uses it.
> This method can be used by low latency applications to busy poll Ethernet
> device queues directly from the socket code. The value of sysctl_net_ll_poll
> controls how many microseconds to poll. Set to zero to disable.

Unlike with TCP sockets, UDP sockets may receive packets from multiple
sources and hence the receiving context may be steered to be executed
on different cores through RSS or other Flow-Steering HW mechanisms
which could mean different napi contexts for the same socket, is that
a problem here? what's the severity?

Or.


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 16:37, Eric Dumazet wrote:
> On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:
>> +static inline unsigned long ll_end_time(void)
>> +{
>> +   return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
>> +}
>
> This can overflow.
>
> Multiply is giving 32bits, as tsc_khz is an int, and sysctl_net_ll_poll
> is an int.
>
> unsigned long sysctl_net_ll_poll ?

OK

> Also, if we want this to work on i386, the correct type to use for
> ll_end_time(void) would be cycles_t

OK
I would be really surprised if someone uses this on an i386, but I guess
you never know.
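
For illustration, a userspace sketch of what the two agreed changes could
look like together. This is not the code from a later patch version: the
example_* names are hypothetical, a 64-bit typedef stands in for cycles_t,
and the cycle counter is passed in as a parameter instead of calling
get_cycles(). The body of can_poll_ll() is not quoted in this thread, so
the comparison below is an assumption about how such a check could be
written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's cycles_t; 64 bits wide on every target here. */
typedef uint64_t example_cycles_t;

/* Compute the cycle count at which busy polling should stop. */
static example_cycles_t example_end_time(example_cycles_t now,
					 uint32_t cycles_per_usec,
					 uint32_t poll_usecs)
{
	/* Widen before multiplying so the product cannot wrap at 32 bits. */
	return now + (example_cycles_t)cycles_per_usec * poll_usecs;
}

/* True while the deadline has not passed. The signed difference keeps the
 * comparison correct even if the counter wraps between the two reads. */
static bool example_can_poll(example_cycles_t now, example_cycles_t end)
{
	return (int64_t)(now - end) <= 0;
}
```

A caller would compute the end time once before entering the wait loop and
keep retrying the poll while example_can_poll() returns true.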


Thanks!
-Eliezer


RE: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 14:42 +0100, David Laight wrote:
> > > +/* we don't mind a ~2.5% imprecision */
> > > +#define TSC_MHZ (tsc_khz >> 10)
> 
> Wouldn't (tsc_khz << 10) be better?

We want number of cycles per usec.

Your formula gives number of cycles per 1.024 second.
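
The arithmetic can be checked with a small sketch. This is plain userspace
C with the frequency passed in as a parameter (the kernel's tsc_khz is a
global); the function names are made up for illustration.

```c
#include <stdint.h>

/* Exact cycles per microsecond: kHz / 1000 == MHz. */
static uint32_t cycles_per_usec_exact(uint32_t tsc_khz)
{
	return tsc_khz / 1000;
}

/* The patch's approximation: divide by 1024 with a right shift. */
static uint32_t cycles_per_usec_shift(uint32_t tsc_khz)
{
	return tsc_khz >> 10;
}

/* The left-shift variant: kHz * 1024, i.e. cycles per 1.024 seconds. */
static uint64_t cycles_left_shift(uint32_t tsc_khz)
{
	return (uint64_t)tsc_khz << 10;
}
```

For an assumed 3 GHz TSC (tsc_khz = 3000000), the exact value is 3000
cycles per usec and the right shift gives 2929, about 2.4% low, which
matches the comment in the patch. The left shift gives 3072000000.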




RE: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread David Laight
> > +/* we don't mind a ~2.5% imprecision */
> > +#define TSC_MHZ (tsc_khz >> 10)

Wouldn't (tsc_khz << 10) be better?

David


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:

> +/* we don't mind a ~2.5% imprecision */
> +#define TSC_MHZ (tsc_khz >> 10)
> +
> +static inline unsigned long ll_end_time(void)
> +{
> + return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
> +}

This can overflow.

Multiply is giving 32bits, as tsc_khz is an int, and sysctl_net_ll_poll
is an int.

unsigned long sysctl_net_ll_poll ?

Also, if we want this to work on i386, the correct type to use for
ll_end_time(void) would be cycles_t
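
A small userspace sketch of the wrap, for illustration. The function names
are hypothetical. Unsigned 32-bit types are used so that the wrap is well
defined in C; with the signed int operands in the patch the overflow would
be undefined behaviour instead.

```c
#include <stdint.h>

/* Model of the v6 expression: both operands are 32-bit, so the product
 * is computed in 32 bits and wraps modulo 2^32. */
static uint32_t poll_cycles_32(uint32_t tsc_mhz, uint32_t poll_usecs)
{
	return tsc_mhz * poll_usecs;
}

/* Widening one operand first makes the multiply happen in 64 bits. */
static uint64_t poll_cycles_64(uint32_t tsc_mhz, uint32_t poll_usecs)
{
	return (uint64_t)tsc_mhz * poll_usecs;
}
```

With an assumed TSC_MHZ of 2929 and a (deliberately large) timeout of
2000000 usec, the 64-bit product is 5858000000 while the 32-bit product
wraps to 1563032704. An unsigned long sysctl avoids this on 64-bit builds
only, since unsigned long is 32 bits on i386; that is where a 64-bit
cycles_t comes in.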





[PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir
Adds a new ndo_ll_poll method and the code that supports and uses it.
This method can be used by low latency applications to busy poll Ethernet
device queues directly from the socket code. The value of sysctl_net_ll_poll
controls how many microseconds to poll. Set to zero to disable.

Signed-off-by: Alexander Duyck 
Signed-off-by: Jesse Brandeburg 
Tested-by: Willem de Bruijn 
Signed-off-by: Eliezer Tamir 
---

 Documentation/sysctl/net.txt |    7 ++
 fs/select.c                  |    7 ++
 include/linux/netdevice.h    |    3 +
 include/linux/skbuff.h       |    8 ++-
 include/net/ll_poll.h        |  126 ++
 include/net/sock.h           |    4 +
 include/uapi/linux/snmp.h    |    1 
 net/Kconfig                  |   12 
 net/core/datagram.c          |    4 +
 net/core/skbuff.c            |    4 +
 net/core/sock.c              |    6 ++
 net/core/sysctl_net_core.c   |   10 +++
 net/ipv4/proc.c              |    1 
 net/ipv4/udp.c               |    6 ++
 net/ipv6/udp.c               |    6 ++
 net/socket.c                 |   16 +
 16 files changed, 216 insertions(+), 5 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index c1f8640..85ab72d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -50,6 +50,13 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64
 
+low_latency_poll
+
+Low latency busy poll timeout. (needs CONFIG_NET_LL_RX_POLL)
+Approximate time in us to spin waiting for packets on the device queue.
+Recommended value is 50. May increase power usage.
+Default: 0 (off)
+
 rmem_default
 
 
diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..0ef246d 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
 #include <linux/rcupdate.h>
 #include <linux/hrtimer.h>
 #include <linux/sched/rt.h>
+#include <net/ll_poll.h>
 
 #include <asm/uaccess.h>
 
@@ -400,6 +401,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
poll_table *wait;
int retval, i, timed_out = 0;
unsigned long slack = 0;
+   unsigned long ll_time = ll_end_time();
 
rcu_read_lock();
retval = max_select_fd(n, fds);
@@ -486,6 +488,8 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
break;
}
 
+   if (can_poll_ll(ll_time))
+   continue;
/*
 * If this is the first loop and we have a timeout
 * given, then we convert to ktime_t and set the to
@@ -750,6 +754,7 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
ktime_t expire, *to = NULL;
int timed_out = 0, count = 0;
unsigned long slack = 0;
+   unsigned long ll_time = ll_end_time();
 
/* Optimise the no-wait case */
if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -795,6 +800,8 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
if (count || timed_out)
break;
 
+   if (can_poll_ll(ll_time))
+   continue;
/*
 * If this is the first loop and we have a timeout
 * given, then we convert to ktime_t and set the to
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964648e..7acea42 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -972,6 +972,9 @@ struct net_device_ops {
 gfp_t gfp);
void(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+   int (*ndo_ll_poll)(struct napi_struct *dev);
+#endif
int (*ndo_set_vf_mac)(struct net_device *dev,
  int queue, u8 *mac);
int (*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8f2b830..77f0a14 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,6 +386,7 @@ typedef unsigned char *sk_buff_data_t;
  * @no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  * @dma_cookie: a cookie to one of several possible DMA operations
  * done by skb DMA functions
+  *@napi_id: id of the NAPI struct this skb came from
  * @secmark: security marking
  * @mark: Generic packet mark
  * @dropcount: total number of sk_receive_queue overflows
@@ -500,8 +501,11 @@ struct sk_buff {
/* 7/9 bit hole (depending on ndisc_nodetype presence) */
kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-   dma_cookie_tdma_cookie;
+#if defined CONFIG_NET_DMA || defined CONFIG_NET_LL_RX_POLL
+   union {
+   unsigned intnapi_id;
+   dma_cookie_tdma_cookie;

[PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir
Adds a new ndo_ll_poll method and the code that supports and uses it.
This method can be used by low latency applications to busy poll Ethernet
device queues directly from the socket code. The value of sysctl_net_ll_poll
controls how many microseconds to poll. Set to zero to disable.

Signed-off-by: Alexander Duyck alexander.h.du...@intel.com
Signed-off-by: Jesse Brandeburg jesse.brandeb...@intel.com
Tested-by: Willem de Bruijn will...@google.com
Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com
---

 Documentation/sysctl/net.txt |7 ++
 fs/select.c  |7 ++
 include/linux/netdevice.h|3 +
 include/linux/skbuff.h   |8 ++-
 include/net/ll_poll.h|  126 ++
 include/net/sock.h   |4 +
 include/uapi/linux/snmp.h|1 
 net/Kconfig  |   12 
 net/core/datagram.c  |4 +
 net/core/skbuff.c|4 +
 net/core/sock.c  |6 ++
 net/core/sysctl_net_core.c   |   10 +++
 net/ipv4/proc.c  |1 
 net/ipv4/udp.c   |6 ++
 net/ipv6/udp.c   |6 ++
 net/socket.c |   16 +
 16 files changed, 216 insertions(+), 5 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index c1f8640..85ab72d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -50,6 +50,13 @@ The maximum number of packets that kernel can handle on a 
NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64
 
+low_latency_poll
+
+Low latency busy poll timeout. (needs CONFIG_NET_LL_RX_POLL)
+Approximate time in us to spin waiting for packets on the device queue.
+Recommended value is 50. May increase power usage.
+Default: 0 (off)
+
 rmem_default
 
 
diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..0ef246d 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
 #include linux/rcupdate.h
 #include linux/hrtimer.h
 #include linux/sched/rt.h
+#include net/ll_poll.h
 
 #include asm/uaccess.h
 
@@ -400,6 +401,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec 
*end_time)
poll_table *wait;
int retval, i, timed_out = 0;
unsigned long slack = 0;
+   unsigned long ll_time = ll_end_time();
 
rcu_read_lock();
retval = max_select_fd(n, fds);
@@ -486,6 +488,8 @@ int do_select(int n, fd_set_bits *fds, struct timespec 
*end_time)
break;
}
 
+   if (can_poll_ll(ll_time))
+   continue;
/*
 * If this is the first loop and we have a timeout
 * given, then we convert to ktime_t and set the to
@@ -750,6 +754,7 @@ static int do_poll(unsigned int nfds,  struct poll_list 
*list,
ktime_t expire, *to = NULL;
int timed_out = 0, count = 0;
unsigned long slack = 0;
+   unsigned long ll_time = ll_end_time();
 
/* Optimise the no-wait case */
if (end_time  !end_time-tv_sec  !end_time-tv_nsec) {
@@ -795,6 +800,8 @@ static int do_poll(unsigned int nfds,  struct poll_list 
*list,
if (count || timed_out)
break;
 
+   if (can_poll_ll(ll_time))
+   continue;
/*
 * If this is the first loop and we have a timeout
 * given, then we convert to ktime_t and set the to
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964648e..7acea42 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -972,6 +972,9 @@ struct net_device_ops {
 gfp_t gfp);
void(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+   int (*ndo_ll_poll)(struct napi_struct *dev);
+#endif
int (*ndo_set_vf_mac)(struct net_device *dev,
  int queue, u8 *mac);
int (*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8f2b830..77f0a14 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,6 +386,7 @@ typedef unsigned char *sk_buff_data_t;
  * @no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  * @dma_cookie: a cookie to one of several possible DMA operations
  * done by skb DMA functions
+  *@napi_id: id of the NAPI struct this skb came from
  * @secmark: security marking
  * @mark: Generic packet mark
  * @dropcount: total number of sk_receive_queue overflows
@@ -500,8 +501,11 @@ struct sk_buff {
/* 7/9 bit hole (depending on ndisc_nodetype presence) */
kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-   dma_cookie_t

Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:

 +/* we don't mind a ~2.5% imprecision */
 +#define TSC_MHZ (tsc_khz  10)
 +
 +static inline unsigned long ll_end_time(void)
 +{
 + return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
 +}static inline unsigned long ll_end_time(void)
+{
+  return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
+}

This can overflow.

Multiply is giving 32bits, as tsc_khz is an int, and sysctl_net_ll_poll
is an int.

unsigned long sysctl_net_ll_poll ?

Also, if we want this to work on i386, the correct type to use for
ll_end_time(void) would be cycles_t



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread David Laight
  +/* we don't mind a ~2.5% imprecision */
  +#define TSC_MHZ (tsc_khz  10)

Wouldn't (tsc_khz  10) be better?

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 14:42 +0100, David Laight wrote:
   +/* we don't mind a ~2.5% imprecision */
   +#define TSC_MHZ (tsc_khz  10)
 
 Wouldn't (tsc_khz  10) be better?

We want number of cycles per usec.

Your formula gives number of cycles per 1.024 second.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 16:37, Eric Dumazet wrote:

On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:



+static inline unsigned long ll_end_time(void)
+{
+   return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
+}


This can overflow.

Multiply is giving 32bits, as tsc_khz is an int, and sysctl_net_ll_poll
is an int.

unsigned long sysctl_net_ll_poll ?

OK


Also, if we want this to work on i386, the correct type to use for
ll_end_time(void) would be cycles_t


OK
I would be really surprised if someone uses this on an i386, but I guess 
you never know.


Thanks!
-Eliezer
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Or Gerlitz
On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir
eliezer.ta...@linux.intel.com wrote:
 Adds a new ndo_ll_poll method and the code that supports and uses it.
 This method can be used by low latency applications to busy poll Ethernet
 device queues directly from the socket code. The value of sysctl_net_ll_poll
 controls how many microseconds to poll. Set to zero to disable.

Unlike with TCP sockets, UDP sockets may receive packets from multiple
sources and hence the receiving context may be steered to be executed
on different cores through RSS or other Flow-Steering HW mechanisms
which could mean different napi contexts for the same socket, is that
a problem here? what's the severity?

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread yaniv saar
Hi Eliezer,

(If I'm too late then a future note...)
Why make polling a system-wide configuration?
Wouldn't it make more sense to implement a sock option?
An even better solution might be aggregation/combination of both types
of configurations.

-- Yaniv Sa'ar


On Wed, May 29, 2013 at 5:14 PM, Or Gerlitz or.gerl...@gmail.com wrote:
 On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir
 eliezer.ta...@linux.intel.com wrote:
 Adds a new ndo_ll_poll method and the code that supports and uses it.
 This method can be used by low latency applications to busy poll Ethernet
 device queues directly from the socket code. The value of sysctl_net_ll_poll
 controls how many microseconds to poll. Set to zero to disable.

 Unlike with TCP sockets, UDP sockets may receive packets from multiple
 sources and hence the receiving context may be steered to be executed
 on different cores through RSS or other Flow-Steering HW mechanisms
 which could mean different napi contexts for the same socket, is that
 a problem here? what's the severity?

 Or.
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 17:14, Or Gerlitz wrote:

On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir
eliezer.ta...@linux.intel.com wrote:

Adds a new ndo_ll_poll method and the code that supports and uses it.
This method can be used by low latency applications to busy poll Ethernet
device queues directly from the socket code. The value of sysctl_net_ll_poll
controls how many microseconds to poll. Set to zero to disable.


Unlike with TCP sockets, UDP sockets may receive packets from multiple
sources and hence the receiving context may be steered to be executed
on different cores through RSS or other Flow-Steering HW mechanisms
which could mean different napi contexts for the same socket, is that
a problem here? what's the severity?


Nothing will break if you poll on the wrong queue.
Your data will come through normal NAPI processing of the right queue.

One of the things we plan on adding in the next version is a more fine 
grained control over which sockets get to busy poll.


-Eliezer
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 17:20, yaniv saar wrote:

Hi Eliezer,

(If I'm too late then a future note...)
Why make polling a system-wide configuration?
Wouldn't it make more sense to implement a sock option?
An even better solution might be aggregation/combination of both types of
configurations.

-- Yaniv Sa'ar


We plan on adding a socket option in the future.

-Eliezer
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Or Gerlitz
Eliezer Tamir eliezer.ta...@linux.intel.com wrote:
 Or Gerlitz wrote:

 Unlike with TCP sockets, UDP sockets may receive packets from multiple
 sources and hence the receiving context may be steered to be executed
 on different cores through RSS or other Flow-Steering HW mechanisms
 which could mean different napi contexts for the same socket, is that
 a problem here? what's the severity?

 Nothing will break if you poll on the wrong queue.
 Your data will come through normal NAPI processing of the right queue.

Can you elaborate a little further, why you call this "wrong" and "right"?


Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eric Dumazet
On Wed, 2013-05-29 at 21:52 +0300, Or Gerlitz wrote:
 Eliezer Tamir eliezer.ta...@linux.intel.com wrote:
  Or Gerlitz wrote:
 
  Unlike with TCP sockets, UDP sockets may receive packets from multiple
  sources and hence the receiving context may be steered to be executed
  on different cores through RSS or other Flow-Steering HW mechanisms
  which could mean different napi contexts for the same socket, is that
  a problem here? what's the severity?
 
  Nothing will break if you poll on the wrong queue.
  Your data will come through normal NAPI processing of the right queue.
 
 Can you elaborate a little further, why you call this "wrong" and "right"?
 --

This definitely needs some documentation, because before llpoll, the device
RX path was serviced by the cpu receiving the hardware interrupt.

So the wrong queue could add false sharing, and wrong NUMA
allocations.





Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Ben Hutchings
On Wed, 2013-05-29 at 17:14 +0300, Or Gerlitz wrote:
 On Wed, May 29, 2013 at 9:39 AM, Eliezer Tamir
 eliezer.ta...@linux.intel.com wrote:
  Adds a new ndo_ll_poll method and the code that supports and uses it.
  This method can be used by low latency applications to busy poll Ethernet
  device queues directly from the socket code. The value of sysctl_net_ll_poll
  controls how many microseconds to poll. Set to zero to disable.
 
 Unlike with TCP sockets, UDP sockets may receive packets from multiple
 sources and hence the receiving context may be steered to be executed
 on different cores through RSS or other Flow-Steering HW mechanisms
 which could mean different napi contexts for the same socket, is that
 a problem here? what's the severity?

Maybe ARFS could be extended so the driver can tell whether a UDP socket
it's steering for is connected or not.  Then for disconnected sockets
the driver can use a filter that only matches destination address.
(Though that's probably undesirable if the socket has SO_REUSEPORT set.)
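
A minimal sketch of that matching rule, under stated assumptions: struct
flow_tuple, struct steer_filter, make_filter() and filter_matches() are
hypothetical names for illustration and are not part of the ARFS API.
Matching on the destination port as well as the destination address is also
an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses and ports as seen in a received packet. */
struct flow_tuple {
    uint32_t saddr;   /* remote (source) address */
    uint32_t daddr;   /* local (destination) address */
    uint16_t sport;   /* remote (source) port */
    uint16_t dport;   /* local (destination) port */
};

/* Hypothetical steering filter installed on behalf of one socket. */
struct steer_filter {
    struct flow_tuple key;
    bool match_source;      /* true only for connected sockets */
    unsigned int rx_queue;  /* queue that matching packets are steered to */
};

static struct steer_filter make_filter(const struct flow_tuple *sk,
                                       bool connected, unsigned int rx_queue)
{
    struct steer_filter f;

    assert(sk != NULL);
    f.key = *sk;
    f.match_source = connected;
    f.rx_queue = rx_queue;
    return f;
}

/*
 * Connected socket: the full 4-tuple must match.
 * Unconnected socket: only the destination side is compared, so packets
 * from every source land on the same queue.
 */
static bool filter_matches(const struct steer_filter *f,
                           const struct flow_tuple *pkt)
{
    assert(f != NULL && pkt != NULL);
    if (pkt->daddr != f->key.daddr || pkt->dport != f->key.dport)
        return false;
    if (!f->match_source)
        return true;
    return pkt->saddr == f->key.saddr && pkt->sport == f->key.sport;
}
```

With several SO_REUSEPORT sockets bound to the same destination, a
destination-only filter like this would steer all of their traffic to one
queue, which is the drawback noted above.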

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH v6 net-next 2/5] net: implement support for low latency socket polling

2013-05-29 Thread Eliezer Tamir

On 29/05/2013 22:08, Eric Dumazet wrote:

On Wed, 2013-05-29 at 21:52 +0300, Or Gerlitz wrote:

Eliezer Tamir eliezer.ta...@linux.intel.com wrote:

Or Gerlitz wrote:



Unlike with TCP sockets, UDP sockets may receive packets from multiple
sources and hence the receiving context may be steered to be executed
on different cores through RSS or other Flow-Steering HW mechanisms
which could mean different napi contexts for the same socket, is that
a problem here? what's the severity?



Nothing will break if you poll on the wrong queue.
Your data will come through normal NAPI processing of the right queue.


Can you elaborate a little further, why you call this "wrong" and "right"?
--


This definitely needs some documentation, because before llpoll, the device
RX path was serviced by the cpu receiving the hardware interrupt.

So the wrong queue could add false sharing, and wrong NUMA
allocations.


Yes.
To work properly when you have more than one NUMA node, you need to have
packet steering set up, either by your NIC or by HW-accelerated RFS.
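
Steering only helps if the polling thread actually runs on the CPU (and NUMA
node) that services the queue, so the userspace half of such a setup is
pinning the thread. A minimal sketch using the Linux sched_setaffinity()
call; the helper names pin_self_to_cpu() and first_allowed_cpu() are
hypothetical, and which CPU is the "right" one for a given queue is left to
the administrator (for example, by matching the IRQ affinity of that queue).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure with errno set by sched_setaffinity().
 */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Lowest-numbered CPU the calling thread may currently run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

A receiving thread would call pin_self_to_cpu() once, before it starts
reading from the socket, so that its busy polling and the queue's normal
NAPI processing stay on the same node.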


I would like to add a short writeup of the design and suggested 
configuration. Where should it go?
