Re: [PATCH net] macvlan: remove duplicate check

2018-12-06 Thread Matteo Croce
On Wed, Dec 5, 2018 at 8:40 PM David Miller  wrote:
>
> From: Matteo Croce 
> Date: Tue,  4 Dec 2018 18:05:42 +0100
>
> > Following commit 59f997b088d2 ("macvlan: return correct error value"),
> > there is a duplicate check for mac addresses both in macvlan_sync_address()
> > and macvlan_set_mac_address().
> > As the former calls the latter, remove the one in macvlan_set_mac_address()
> > and move the one in macvlan_sync_address() before any other check.
> >
> > Signed-off-by: Matteo Croce 
>
> Hmmm, doesn't this change behavior?
>
> For the handling of the NETDEV_CHANGEADDR event in macvlan_device_event()
> we would make it to macvlan_sync_address(), and if IFF_UP is false,
> we would elide the macvlan_addr_busy() check and just copy the MAC address
> over and return.
>
> Now, we would always perform the macvlan_addr_busy() check.
>
> Please, if this is OK, explain and document this behavioral change in
> the commit message.
>
> Thank you.

Hi David,

I looked at macvlan_device_event() again. Correct me if I'm wrong:
That function is meant to handle changes to the macvlan lower device.
In my case, it receives a NETDEV_CHANGEADDR after the lower device
MAC address is changed.
The event is actually handled only if the macvlan mode is passthru;
in all other modes NOTIFY_DONE is just returned, so
macvlan_sync_address() is called only for passthru mode.
The passthru mode mandates that the macvlan and phy address are the
same, hence macvlan_addr_busy() skips the address comparison if the
mode is passthru, and in the end nothing happens.
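
To make the passthru argument concrete, here is a small user-space model of the comparison against the lower device address. It is only a sketch: struct port_model and lower_addr_conflict() are made-up names, and the real macvlan_addr_busy() does more than this.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified stand-in for the macvlan port state. */
struct port_model {
	bool passthru;                      /* macvlan is in passthru mode */
	unsigned char lower_addr[ETH_ALEN]; /* address of the lower device */
};

/* Returns true when addr clashes with the lower device address.
 * In passthru mode the macvlan shares that address by design,
 * so the comparison is skipped and no clash is reported.
 */
static bool lower_addr_conflict(const struct port_model *port,
				const unsigned char *addr)
{
	if (port->passthru)
		return false;

	return memcmp(port->lower_addr, addr, ETH_ALEN) == 0;
}
```

With the same address on both sides, the model reports a conflict in the default modes and none in passthru, which is why hoisting the check should not change the NETDEV_CHANGEADDR path.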

Speaking of mac address change, I have a question about the generic code.
If I look at the NOTIFY_BAD definition in include/linux/notifier.h,
the comment states "Bad/Veto action", which suggests to me that a
notifier returning NOTIFY_BAD should prevent a change.
This doesn't happen because in dev_set_mac_address(), the event is
sent to notifiers after the change has already been made, and the result of
call_netdevice_notifiers() is ignored anyway.

So in theory a notifier can deny another device address change, but in
practice this doesn't happen. Does it sound right? Just asking.
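
The difference between notifying after the change and asking before it can be modelled in user space. This is a sketch under assumptions: struct dev_model, veto_all() and both setters are hypothetical, the address is reduced to an int, and only the NOTIFY_* values mirror include/linux/notifier.h.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

typedef int (*notifier_fn)(int new_addr);

/* Hypothetical device: an integer "address" and one notifier. */
struct dev_model {
	int addr;
	notifier_fn notifier;
};

/* A notifier that objects to every change. */
static int veto_all(int new_addr)
{
	(void)new_addr;
	return NOTIFY_BAD;
}

/* Change first, notify later, ignore the result: a veto has no effect. */
static int set_addr_notify_after(struct dev_model *dev, int new_addr)
{
	dev->addr = new_addr;
	if (dev->notifier)
		(void)dev->notifier(new_addr);
	return 0;
}

/* Hypothetical alternative: ask first, and honour NOTIFY_BAD. */
static int set_addr_ask_first(struct dev_model *dev, int new_addr)
{
	if (dev->notifier && dev->notifier(new_addr) == NOTIFY_BAD)
		return -1;
	dev->addr = new_addr;
	return 0;
}
```

In the first variant the address changes even though the notifier returned NOTIFY_BAD; only the second variant gives the veto any meaning.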

Regards,
-- 
Matteo Croce
per aspera ad upstream


[PATCH net] macvlan: remove duplicate check

2018-12-04 Thread Matteo Croce
Following commit 59f997b088d2 ("macvlan: return correct error value"),
there is a duplicate check for mac addresses both in macvlan_sync_address()
and macvlan_set_mac_address().
As the former calls the latter, remove the one in macvlan_set_mac_address()
and move the one in macvlan_sync_address() before any other check.

Signed-off-by: Matteo Croce 
---
 drivers/net/macvlan.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 0da3d36b283b..f3361aabdb78 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -700,14 +700,14 @@ static int macvlan_sync_address(struct net_device *dev, unsigned char *addr)
struct macvlan_port *port = vlan->port;
int err;
 
+   if (macvlan_addr_busy(vlan->port, addr))
+   return -EADDRINUSE;
+
if (!(dev->flags & IFF_UP)) {
/* Just copy in the new address */
ether_addr_copy(dev->dev_addr, addr);
} else {
/* Rehash and update the device filters */
-   if (macvlan_addr_busy(vlan->port, addr))
-   return -EADDRINUSE;
-
if (!macvlan_passthru(port)) {
err = dev_uc_add(lowerdev, addr);
if (err)
@@ -747,9 +747,6 @@ static int macvlan_set_mac_address(struct net_device *dev, void *p)
return dev_set_mac_address(vlan->lowerdev, addr);
}
 
-   if (macvlan_addr_busy(vlan->port, addr->sa_data))
-   return -EADDRINUSE;
-
return macvlan_sync_address(dev, addr->sa_data);
 }
 
-- 
2.19.2



Re: [PATCH bpf-next v2 0/2] sample: xdp1 improvements

2018-12-01 Thread Matteo Croce
On Sat, Dec 1, 2018 at 6:11 AM Alexei Starovoitov
 wrote:
>
> On Sat, Dec 01, 2018 at 01:23:04AM +0100, Matteo Croce wrote:
> > Small changes to improve the readability and ease of use
> > of the xdp1 sample.
>
> Applied to bpf-next.
>
> I think that sample code could be more useful if it's wrapped with bash
> script like selftests/test_xdp* tests do.
> At that point it can move to selftests to get 0bot coverage.
> Would you be interested in doing that?
>

It would be nice, but I think that the samples have more urgent issues
right now.
Many examples don't compile on my system (Fedora 29, GCC 8.2.1, Clang 7.0.0),
these are the errors that I encounter:

  HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/test_lru_dist
/home/matteo/src/kernel/linux/samples/bpf/test_lru_dist.c:39:8: error:
redefinition of ‘struct list_head’
 struct list_head {
^

  HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/sock_example
In file included from
/home/matteo/src/kernel/linux/samples/bpf/sock_example.c:27:
/usr/include/linux/ip.h:102:2: error: unknown type name ‘__sum16’
  __sum16 check;
  ^~~

  HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/tracex5_user.o
/home/matteo/src/kernel/linux/samples/bpf/tracex5_user.c: In function
‘install_accept_all_seccomp’:
/home/matteo/src/kernel/linux/samples/bpf/tracex5_user.c:17:21: error:
array type has incomplete element type ‘struct sock_filter’
  struct sock_filter filter[] = {
 ^~

  HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/test_cgrp2_attach2.o
/home/matteo/src/kernel/linux/samples/bpf/test_cgrp2_attach2.c: In
function ‘prog_load_cnt’:
/home/matteo/src/kernel/linux/samples/bpf/test_cgrp2_attach2.c:229:3:
error: ‘BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE’ undeclared (first use in
this function); did you mean ‘BPF_MAP_TYPE_CGROUP_STORAGE’?
   BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
   ^~

  HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/xdpsock_user.o
/home/matteo/src/kernel/linux/samples/bpf/xdpsock_user.c:59:15: error:
conflicting types for ‘u64’
 typedef __u64 u64;
   ^~~

To be able to compile the samples, I temporarily removed them from the
compilation this way:

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index be0a961450bc..33d7161f2231 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -4,8 +4,6 @@ BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
 TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools

 # List of programs to build
-hostprogs-y := test_lru_dist
-hostprogs-y += sock_example
 hostprogs-y += fds_example
 hostprogs-y += sockex1
 hostprogs-y += sockex2
@@ -14,7 +12,6 @@ hostprogs-y += tracex1
 hostprogs-y += tracex2
 hostprogs-y += tracex3
 hostprogs-y += tracex4
-hostprogs-y += tracex5
 hostprogs-y += tracex6
 hostprogs-y += tracex7
 hostprogs-y += test_probe_write_user
@@ -26,7 +23,6 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += test_cgrp2_attach
-hostprogs-y += test_cgrp2_attach2
 hostprogs-y += test_cgrp2_sock
 hostprogs-y += test_cgrp2_sock2
 hostprogs-y += xdp1
@@ -49,7 +45,6 @@ hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 hostprogs-y += cpustat
 hostprogs-y += xdp_adjust_tail
-hostprogs-y += xdpsock
 hostprogs-y += xdp_fwd
 hostprogs-y += task_fd_query
 hostprogs-y += xdp_sample_pkts

Regards,
-- 
Matteo Croce
per aspera ad upstream


[PATCH bpf-next v2 0/2] sample: xdp1 improvements

2018-11-30 Thread Matteo Croce
Small changes to improve the readability and ease of use
of the xdp1 sample.

Matteo Croce (2):
  samples: bpf: improve xdp1 example
  samples: bpf: get ifindex from ifname

 samples/bpf/xdp1_user.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
2.19.1



[PATCH bpf-next v2 1/2] samples: bpf: improve xdp1 example

2018-11-30 Thread Matteo Croce
Store only the total packet count for every protocol, instead of the
whole per-cpu array.
Use bpf_map_get_next_key() to iterate the map, instead of looking up
all the protocols.

Signed-off-by: Matteo Croce 
---
 samples/bpf/xdp1_user.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b02c531510ed..4f3d824fc044 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -34,26 +34,24 @@ static void int_exit(int sig)
 static void poll_stats(int map_fd, int interval)
 {
unsigned int nr_cpus = bpf_num_possible_cpus();
-   const unsigned int nr_keys = 256;
-   __u64 values[nr_cpus], prev[nr_keys][nr_cpus];
-   __u32 key;
+   __u64 values[nr_cpus], prev[UINT8_MAX] = { 0 };
int i;
 
-   memset(prev, 0, sizeof(prev));
-
while (1) {
+   __u32 key = UINT32_MAX;
+
sleep(interval);
 
-   for (key = 0; key < nr_keys; key++) {
+   while (bpf_map_get_next_key(map_fd, &key, &key) != -1) {
__u64 sum = 0;
 
assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
-   sum += (values[i] - prev[key][i]);
-   if (sum)
+   sum += values[i];
+   if (sum > prev[key])
printf("proto %u: %10llu pkt/s\n",
-  key, sum / interval);
-   memcpy(prev[key], values, sizeof(values));
+  key, (sum - prev[key]) / interval);
+   prev[key] = sum;
}
}
 }
-- 
2.19.1
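
The per-protocol bookkeeping described in this patch can be tried outside the BPF sample. The sketch below is an assumption-laden model: proto_rate() and NR_KEYS are made-up names, the per-CPU values are passed in as a plain array instead of being read with bpf_map_lookup_elem(), and the history array is sized at 256 here so that every 8-bit key has a slot.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_KEYS 256 /* assumption: one slot per 8-bit protocol number */

/* Sum the per-CPU counters for one protocol, return the packet rate
 * since the previous call, and remember the new total.
 * interval must be non-zero.
 */
static uint64_t proto_rate(uint64_t prev[NR_KEYS], uint8_t key,
			   const uint64_t *values, size_t nr_cpus,
			   unsigned int interval)
{
	uint64_t sum = 0, rate = 0;
	size_t i;

	for (i = 0; i < nr_cpus; i++)
		sum += values[i];

	/* Only the total is kept per protocol, not the per-CPU array. */
	if (sum > prev[key])
		rate = (sum - prev[key]) / interval;
	prev[key] = sum;

	return rate;
}
```

Keeping one total per key is what lets the history shrink from a two-dimensional per-CPU array to a single row.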



[PATCH bpf-next v2 2/2] samples: bpf: get ifindex from ifname

2018-11-30 Thread Matteo Croce
Find the ifindex with if_nametoindex() instead of requiring the
numeric ifindex.

Signed-off-by: Matteo Croce 
---
v2: use if_nametoindex() instead of ioctl()

 samples/bpf/xdp1_user.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 4f3d824fc044..0a197f86ac43 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include <net/if.h>
 
 #include "bpf_util.h"
 #include "bpf/bpf.h"
@@ -59,7 +60,7 @@ static void poll_stats(int map_fd, int interval)
 static void usage(const char *prog)
 {
fprintf(stderr,
-   "usage: %s [OPTS] IFINDEX\n\n"
+   "usage: %s [OPTS] IFACE\n\n"
"OPTS:\n"
"    -S    use skb-mode\n"
"    -N    enforce native mode\n",
@@ -102,7 +103,11 @@ int main(int argc, char **argv)
return 1;
}
 
-   ifindex = strtoul(argv[optind], NULL, 0);
+   ifindex = if_nametoindex(argv[1]);
+   if (!ifindex) {
+   perror("if_nametoindex");
+   return 1;
+   }
 
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
prog_load_attr.file = filename;
-- 
2.19.1
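
For reference, a minimal stand-alone use of if_nametoindex(), which is declared in <net/if.h> and returns 0 on failure. The wrapper name ifindex_from_name() is made up for this sketch.

```c
#include <net/if.h>
#include <stdio.h>

/* Resolve an interface name to its index.
 * Returns 0 and prints the error if the name is unknown.
 */
static unsigned int ifindex_from_name(const char *ifname)
{
	unsigned int ifindex = if_nametoindex(ifname);

	if (!ifindex)
		perror("if_nametoindex");

	return ifindex;
}
```

A caller that parses options with getopt() would normally pass argv[optind] here, which is the argument the removed strtoul() line read.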



[PATCH net] macvlan: return correct error value

2018-11-30 Thread Matteo Croce
A MAC address must be unique among all the macvlan devices with the same
lower device. The only exception is the passthru [sic] mode,
which shares the lower device address.

When duplicate addresses are detected, EBUSY is returned when bringing
the interface up:

# ip link add macvlan0 link eth0 type macvlan
# read addr 
---
 drivers/net/macvlan.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index fc8d5f1ee1ad..0da3d36b283b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -608,7 +608,7 @@ static int macvlan_open(struct net_device *dev)
goto hash_add;
}
 
-   err = -EBUSY;
+   err = -EADDRINUSE;
if (macvlan_addr_busy(vlan->port, dev->dev_addr))
goto out;
 
@@ -706,7 +706,7 @@ static int macvlan_sync_address(struct net_device *dev, unsigned char *addr)
} else {
/* Rehash and update the device filters */
if (macvlan_addr_busy(vlan->port, addr))
-   return -EBUSY;
+   return -EADDRINUSE;
 
if (!macvlan_passthru(port)) {
err = dev_uc_add(lowerdev, addr);
@@ -747,6 +747,9 @@ static int macvlan_set_mac_address(struct net_device *dev, void *p)
return dev_set_mac_address(vlan->lowerdev, addr);
}
 
+   if (macvlan_addr_busy(vlan->port, addr->sa_data))
+   return -EADDRINUSE;
+
return macvlan_sync_address(dev, addr->sa_data);
 }
 
-- 
2.19.1
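
The uniqueness rule and the error code this patch settles on can be illustrated with a user-space model. It is a sketch only: claim_addr() is a hypothetical helper, and the set of addresses in use is a flat byte buffer rather than the macvlan hash.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* in_use holds n addresses back to back, ETH_ALEN bytes each.
 * Returns 0 if addr is free, -EADDRINUSE if it is already taken.
 */
static int claim_addr(const unsigned char *in_use, size_t n,
		      const unsigned char *addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(in_use + i * ETH_ALEN, addr, ETH_ALEN) == 0)
			return -EADDRINUSE;
	}

	return 0;
}
```

Returning -EADDRINUSE rather than -EBUSY lets the caller tell a duplicate address apart from a device that is merely busy.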



Re: [PATCH bpf-next 2/2] samples: bpf: get ifindex from ifname

2018-10-19 Thread Matteo Croce
On Fri, Oct 19, 2018 at 5:35 AM Y Song  wrote:
>
> On Thu, Oct 18, 2018 at 1:48 PM Matteo Croce  wrote:
> >
> > Find the ifindex via ioctl(SIOCGIFINDEX) instead of requiring the
> > numeric ifindex.
>
> Maybe use if_nametoindex which is simpler?
>
> >
> > Signed-off-by: Matteo Croce 
> > ---
> >  samples/bpf/xdp1_user.c | 26 --
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
> > index 4f3d824fc044..a1d0c5dcee9c 100644
> > --- a/samples/bpf/xdp1_user.c
> > +++ b/samples/bpf/xdp1_user.c
> > @@ -15,6 +15,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> >
> >  #include "bpf_util.h"
> >  #include "bpf/bpf.h"
> > @@ -59,7 +62,7 @@ static void poll_stats(int map_fd, int interval)
> >  static void usage(const char *prog)
> >  {
> > fprintf(stderr,
> > -   "usage: %s [OPTS] IFINDEX\n\n"
> > +   "usage: %s [OPTS] IFACE\n\n"
> > "OPTS:\n"
> > "    -S    use skb-mode\n"
> > "    -N    enforce native mode\n",
> > @@ -74,9 +77,11 @@ int main(int argc, char **argv)
> > };
> > const char *optstr = "SN";
> > int prog_fd, map_fd, opt;
> > +   struct ifreq ifr = { 0 };
> > struct bpf_object *obj;
> > struct bpf_map *map;
> > char filename[256];
> > +   int sock;
> >
> > while ((opt = getopt(argc, argv, optstr)) != -1) {
> > switch (opt) {
> > @@ -102,7 +107,24 @@ int main(int argc, char **argv)
> > return 1;
> > }
> >
> > -   ifindex = strtoul(argv[optind], NULL, 0);
> > +   sock = socket(AF_UNIX, SOCK_DGRAM, 0);
> > +   if (sock == -1) {
> > +   perror("socket");
> > +   return 1;
> > +   }
> > +
> > +   if (strlen(argv[optind]) >= IFNAMSIZ) {
> > +   printf("invalid ifname '%s'\n", argv[optind]);
> > +   return 1;
> > +   }
> > +
> > +   strcpy(ifr.ifr_name, argv[optind]);
> > +   if (ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
> > +   perror("SIOCGIFINDEX");
> > +   return 1;
> > +   }
> > +   close(sock);
> > +   ifindex = ifr.ifr_ifindex;
> >
> > snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> > prog_load_attr.file = filename;
> > --
> > 2.19.1
> >

Right, even better. Will do a v2, thanks.

-- 
Matteo Croce
per aspera ad upstream


[PATCH bpf-next 2/2] samples: bpf: get ifindex from ifname

2018-10-18 Thread Matteo Croce
Find the ifindex via ioctl(SIOCGIFINDEX) instead of requiring the
numeric ifindex.

Signed-off-by: Matteo Croce 
---
 samples/bpf/xdp1_user.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 4f3d824fc044..a1d0c5dcee9c 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "bpf_util.h"
 #include "bpf/bpf.h"
@@ -59,7 +62,7 @@ static void poll_stats(int map_fd, int interval)
 static void usage(const char *prog)
 {
fprintf(stderr,
-   "usage: %s [OPTS] IFINDEX\n\n"
+   "usage: %s [OPTS] IFACE\n\n"
"OPTS:\n"
"    -S    use skb-mode\n"
"    -N    enforce native mode\n",
@@ -74,9 +77,11 @@ int main(int argc, char **argv)
};
const char *optstr = "SN";
int prog_fd, map_fd, opt;
+   struct ifreq ifr = { 0 };
struct bpf_object *obj;
struct bpf_map *map;
char filename[256];
+   int sock;
 
while ((opt = getopt(argc, argv, optstr)) != -1) {
switch (opt) {
@@ -102,7 +107,24 @@ int main(int argc, char **argv)
return 1;
}
 
-   ifindex = strtoul(argv[optind], NULL, 0);
+   sock = socket(AF_UNIX, SOCK_DGRAM, 0);
+   if (sock == -1) {
+   perror("socket");
+   return 1;
+   }
+
+   if (strlen(argv[optind]) >= IFNAMSIZ) {
+   printf("invalid ifname '%s'\n", argv[optind]);
+   return 1;
+   }
+
+   strcpy(ifr.ifr_name, argv[optind]);
+   if (ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
+   perror("SIOCGIFINDEX");
+   return 1;
+   }
+   close(sock);
+   ifindex = ifr.ifr_ifindex;
 
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
prog_load_attr.file = filename;
-- 
2.19.1



[PATCH bpf-next 0/2] sample: xdp1 improvements

2018-10-18 Thread Matteo Croce
Small changes to improve the readability and ease of use
of the xdp1 sample.

Matteo Croce (2):
  samples: bpf: improve xdp1 example
  samples: bpf: get ifindex from ifname

 samples/bpf/xdp1_user.c | 44 ++---
 1 file changed, 32 insertions(+), 12 deletions(-)

-- 
2.19.1



[PATCH bpf-next 1/2] samples: bpf: improve xdp1 example

2018-10-18 Thread Matteo Croce
Store only the total packet count for every protocol, instead of the
whole per-cpu array.
Use bpf_map_get_next_key() to iterate the map, instead of looking up
all the protocols.

Signed-off-by: Matteo Croce 
---
 samples/bpf/xdp1_user.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b02c531510ed..4f3d824fc044 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -34,26 +34,24 @@ static void int_exit(int sig)
 static void poll_stats(int map_fd, int interval)
 {
unsigned int nr_cpus = bpf_num_possible_cpus();
-   const unsigned int nr_keys = 256;
-   __u64 values[nr_cpus], prev[nr_keys][nr_cpus];
-   __u32 key;
+   __u64 values[nr_cpus], prev[UINT8_MAX] = { 0 };
int i;
 
-   memset(prev, 0, sizeof(prev));
-
while (1) {
+   __u32 key = UINT32_MAX;
+
sleep(interval);
 
-   for (key = 0; key < nr_keys; key++) {
+   while (bpf_map_get_next_key(map_fd, &key, &key) != -1) {
__u64 sum = 0;
 
assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
-   sum += (values[i] - prev[key][i]);
-   if (sum)
+   sum += values[i];
+   if (sum > prev[key])
printf("proto %u: %10llu pkt/s\n",
-  key, sum / interval);
-   memcpy(prev[key], values, sizeof(values));
+  key, (sum - prev[key]) / interval);
+   prev[key] = sum;
}
}
 }
-- 
2.19.1



[PATCH iproute2] ip link: don't stop batch processing

2018-08-03 Thread Matteo Croce
When 'ip link show dev DEVICE' is processed in batch mode, ip exits
and stops processing further commands.
This is because ipaddr_list_flush_or_save() calls exit() to avoid printing
the link information twice.
Replace the exit() with a classic goto out.

Signed-off-by: Matteo Croce 
---
 ip/ipaddress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 6c306ab7..b7b78f6e 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1920,7 +1920,7 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
exit(1);
}
delete_json_obj();
-   exit(0);
+   goto out;
}
 
if (filter.family != AF_PACKET) {
-- 
2.17.1
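
Why exit() inside a command handler is a problem in batch mode, shown as a small model. Everything here is hypothetical: show_one() only stands in for a handler such as ipaddr_list_flush_or_save() and run_batch() for the batch loop; neither is iproute2 code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical command handler for one batch line. */
static int show_one(const char *dev, int single_dev)
{
	int ret = 0;

	if (single_dev) {
		printf("link: %s\n", dev);
		/* Calling exit(0) here would end the whole batch;
		 * jumping to the common exit path returns to the caller.
		 */
		goto out;
	}

	printf("all links (filter: %s)\n", dev);
out:
	/* Shared cleanup would go here. */
	return ret;
}

/* Run every command in the batch; returns how many completed. */
static size_t run_batch(const char *const *devs, size_t n)
{
	size_t done = 0, i;

	for (i = 0; i < n; i++) {
		if (show_one(devs[i], 1) != 0)
			break;
		done++;
	}

	return done;
}
```

Because the handler returns instead of exiting, all lines of the batch get processed.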



Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-31 Thread Matteo Croce
On Mon, Jul 16, 2018 at 4:54 PM Matteo Croce  wrote:
>
> On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
> >
> > On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> > > From: Stefano Brivio 
> > >
> > > Open vSwitch sends to userspace all received packets that have
> > > no associated flow (thus doing an "upcall"). Then the userspace
> > > program creates a new flow and determines the actions to apply
> > > based on its configuration.
> > >
> > > When a single port generates a high rate of upcalls, it can
> > > prevent other ports from dispatching their own upcalls. vswitchd
> > > overcomes this problem by creating many netlink sockets for each
> > > port, but it quickly exceeds any reasonable maximum number of
> > > open files when dealing with huge amounts of ports.
> > >
> > > This patch queues all the upcalls into a list, ordering them in
> > > a per-port round-robin fashion, and schedules a deferred work to
> > > queue them to userspace.
> > >
> > > The algorithm to queue upcalls in a round-robin fashion,
> > > provided by Stefano, is based on these two rules:
> > >  - upcalls for a given port must be inserted after all the other
> > >occurrences of upcalls for the same port already in the queue,
> > >in order to avoid out-of-order upcalls for a given port
> > >  - insertion happens once the highest upcall count for any given
> > >port (excluding the one currently at hand) is greater than the
> > >count for the port we're queuing to -- if this condition is
> > >never true, upcall is queued at the tail. This results in a
> > >per-port round-robin order.
> > >
> > > In order to implement a fair round-robin behaviour, a variable
> > > queueing delay is introduced. This will be zero if the upcalls
> > > rate is below a given threshold, and grows linearly with the
> > > queue utilisation (i.e. upcalls rate) otherwise.
> > >
> > > This ensures fairness among ports under load and with few
> > > netlink sockets.
> > >
> > Thanks for the patch.
> > This patch adds the following overhead to upcall handling:
> > 1. kmalloc.
> > 2. global spin-lock.
> > 3. context switch to single worker thread.
> > I think this could become a bottleneck on most multi-core systems.
> > You mentioned issues with the existing fairness mechanism. Can you
> > elaborate on those? I think we could improve that before implementing
> > heavyweight fairness in upcall handling.
>
> Hi Pravin,
>
> vswitchd allocates N * P netlink sockets, where N is the number of
> online CPU cores, and P the number of ports.
> With some setups, this number can grow quite fast, also exceeding the
> system maximum file descriptor limit.
> I've seen a 48 core server failing with -EMFILE when trying to create
> more than 65535 netlink sockets needed for handling 1800+ ports.
>
> I made a previous attempt to reduce the sockets to one per CPU, but
> this was discussed and rejected on ovs-dev because it would remove
> fairness among ports[1].
> I think that the current approach of opening a huge number of sockets
> doesn't really work (it certainly doesn't scale). It still needs some
> queueing logic (either in kernel or user space) if we really want to
> be sure that low-traffic ports get their upcall quota when other
> ports are doing far more traffic.
>
> If you are concerned about the kmalloc or the spinlock, we can address them
> with a kmem_cache or two copies of the list and RCU. I'll be happy to
> discuss the implementation details, as long as we all agree that the
> current implementation doesn't scale well and has an issue.
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html
>
> --
> Matteo Croce
> per aspera ad upstream

Hi all,

Any idea on how to solve the file descriptor limit hit by the netlink sockets?
I see this issue happen very often, and raising the FD limit to 400k
does not seem the right way to solve it.
Any other suggestions on how to improve the patch, or solve the problem
in a different way?

Regards,



--
Matteo Croce
per aspera ad upstream


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-16 Thread Matteo Croce
On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
>
> On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> > From: Stefano Brivio 
> >
> > Open vSwitch sends to userspace all received packets that have
> > no associated flow (thus doing an "upcall"). Then the userspace
> > program creates a new flow and determines the actions to apply
> > based on its configuration.
> >
> > When a single port generates a high rate of upcalls, it can
> > prevent other ports from dispatching their own upcalls. vswitchd
> > overcomes this problem by creating many netlink sockets for each
> > port, but it quickly exceeds any reasonable maximum number of
> > open files when dealing with huge amounts of ports.
> >
> > This patch queues all the upcalls into a list, ordering them in
> > a per-port round-robin fashion, and schedules a deferred work to
> > queue them to userspace.
> >
> > The algorithm to queue upcalls in a round-robin fashion,
> > provided by Stefano, is based on these two rules:
> >  - upcalls for a given port must be inserted after all the other
> >occurrences of upcalls for the same port already in the queue,
> >in order to avoid out-of-order upcalls for a given port
> >  - insertion happens once the highest upcall count for any given
> >port (excluding the one currently at hand) is greater than the
> >count for the port we're queuing to -- if this condition is
> >never true, upcall is queued at the tail. This results in a
> >per-port round-robin order.
> >
> > In order to implement a fair round-robin behaviour, a variable
> > queueing delay is introduced. This will be zero if the upcalls
> > rate is below a given threshold, and grows linearly with the
> > queue utilisation (i.e. upcalls rate) otherwise.
> >
> > This ensures fairness among ports under load and with few
> > netlink sockets.
> >
> Thanks for the patch.
> This patch adds the following overhead to upcall handling:
> 1. kmalloc.
> 2. global spin-lock.
> 3. context switch to single worker thread.
> I think this could become a bottleneck on most multi-core systems.
> You mentioned issues with the existing fairness mechanism. Can you
> elaborate on those? I think we could improve that before implementing
> heavyweight fairness in upcall handling.

Hi Pravin,

vswitchd allocates N * P netlink sockets, where N is the number of
online CPU cores, and P the number of ports.
With some setups, this number can grow quite fast, also exceeding the
system maximum file descriptor limit.
I've seen a 48 core server failing with -EMFILE when trying to create
more than 65535 netlink sockets needed for handling 1800+ ports.
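
The N * P growth is easy to check with the numbers quoted above. The helpers below are made up for this sketch; only the figures (48 cores, 1800 ports, 65535 sockets) come from the message.

```c
/* Netlink sockets needed: one per (CPU, port) pair. */
static unsigned long sockets_needed(unsigned long n_cpus,
				    unsigned long n_ports)
{
	return n_cpus * n_ports;
}

/* Largest port count that still fits under a given socket limit. */
static unsigned long max_ports(unsigned long n_cpus, unsigned long limit)
{
	return limit / n_cpus;
}
```

For 48 cores and 1800 ports this gives 86400 sockets, well past 65535; with 48 cores a limit of 65535 is reached at 1365 ports.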

I made a previous attempt to reduce the sockets to one per CPU, but
this was discussed and rejected on ovs-dev because it would remove
fairness among ports[1].
I think that the current approach of opening a huge number of sockets
doesn't really work (it certainly doesn't scale). It still needs some
queueing logic (either in kernel or user space) if we really want to
be sure that low-traffic ports get their upcall quota when other
ports are doing far more traffic.

If you are concerned about the kmalloc or the spinlock, we can address them
with a kmem_cache or two copies of the list and RCU. I'll be happy to
discuss the implementation details, as long as we all agree that the
current implementation doesn't scale well and has an issue.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html

--
Matteo Croce
per aspera ad upstream


[PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-04 Thread Matteo Croce
From: Stefano Brivio 

Open vSwitch sends to userspace all received packets that have
no associated flow (thus doing an "upcall"). Then the userspace
program creates a new flow and determines the actions to apply
based on its configuration.

When a single port generates a high rate of upcalls, it can
prevent other ports from dispatching their own upcalls. vswitchd
overcomes this problem by creating many netlink sockets for each
port, but it quickly exceeds any reasonable maximum number of
open files when dealing with huge amounts of ports.

This patch queues all the upcalls into a list, ordering them in
a per-port round-robin fashion, and schedules a deferred work to
queue them to userspace.

The algorithm to queue upcalls in a round-robin fashion,
provided by Stefano, is based on these two rules:
 - upcalls for a given port must be inserted after all the other
   occurrences of upcalls for the same port already in the queue,
   in order to avoid out-of-order upcalls for a given port
 - insertion happens once the highest upcall count for any given
   port (excluding the one currently at hand) is greater than the
   count for the port we're queuing to -- if this condition is
   never true, upcall is queued at the tail. This results in a
   per-port round-robin order.
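
One possible reading of the two rules above, written as a user-space model over a small array. This is not the kernel code from the patch: struct upcall_queue, queue_insert_rr() and both limits are made up, and ports are plain integers.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 16 /* hypothetical queue capacity */
#define PORTS_MAX 8  /* hypothetical number of ports */

struct upcall_queue {
	int port[QUEUE_MAX]; /* port number of each queued upcall */
	size_t len;
};

/* Insert one upcall for port_no. Returns 0, or -1 if it cannot be queued. */
static int queue_insert_rr(struct upcall_queue *q, int port_no)
{
	unsigned int count[PORTS_MAX] = { 0 };
	unsigned int queued = 0; /* upcalls already queued for port_no */
	size_t last = 0;         /* index just past the last of them */
	size_t at, i;

	if (q->len >= QUEUE_MAX || port_no < 0 || port_no >= PORTS_MAX)
		return -1;

	for (i = 0; i < q->len; i++) {
		if (q->port[i] == port_no) {
			queued++;
			last = i + 1;
		}
	}

	/* Rule 1: never insert before an upcall of the same port.
	 * Rule 2: insert right after the first entry, past that point,
	 * whose running per-port count exceeds ours; else use the tail.
	 */
	at = q->len;
	for (i = 0; i < q->len; i++) {
		count[q->port[i]]++;
		if (i >= last && count[q->port[i]] > queued) {
			at = i + 1;
			break;
		}
	}

	memmove(&q->port[at + 1], &q->port[at],
		(q->len - at) * sizeof(q->port[0]));
	q->port[at] = port_no;
	q->len++;

	return 0;
}
```

Queueing three upcalls for port 0 and then two for port 1 leaves the model with the order 0 1 0 1 0, so the busy port no longer holds the head of the queue to itself.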

In order to implement a fair round-robin behaviour, a variable
queueing delay is introduced. This will be zero if the upcalls
rate is below a given threshold, and grows linearly with the
queue utilisation (i.e. upcalls rate) otherwise.

This ensures fairness among ports under load and with few
netlink sockets.
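
The delay calculation can be worked through in milliseconds instead of jiffies. The sketch mirrors the shape of the ovs_dp_upcall_delay() helper in the diff below, where the returned delay shrinks as the queue fills; the *_MS names and the clamp on queue_len are assumptions added here.

```c
#define QUEUE_TIMEOUT_MS   10UL
#define QUEUE_MAX_DELAY_MS 10UL
#define QUEUE_MAX_LEN      200UL

/* Delay before the deferred work runs, in milliseconds.
 * since_last_run_ms is the time elapsed since the work last ran.
 */
static unsigned long upcall_delay_ms(unsigned long queue_len,
				     unsigned long since_last_run_ms)
{
	/* The work has not run recently: flush immediately. */
	if (since_last_run_ms >= QUEUE_TIMEOUT_MS)
		return 0;

	/* Assumption: clamp so the subtraction cannot wrap. */
	if (queue_len > QUEUE_MAX_LEN)
		queue_len = QUEUE_MAX_LEN;

	return QUEUE_MAX_DELAY_MS -
	       QUEUE_MAX_DELAY_MS * queue_len / QUEUE_MAX_LEN;
}
```

With these constants an empty queue waits the full 10 ms, a half-full queue 5 ms, and a full queue is flushed at once.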

Signed-off-by: Matteo Croce 
Co-authored-by: Stefano Brivio 
---
 net/openvswitch/datapath.c | 143 ++---
 net/openvswitch/datapath.h |  27 ++-
 2 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0f5ce77460d4..2cfd504562d8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -59,6 +59,10 @@
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
+#define UPCALL_QUEUE_TIMEOUT   msecs_to_jiffies(10)
+#define UPCALL_QUEUE_MAX_DELAY msecs_to_jiffies(10)
+#define UPCALL_QUEUE_MAX_LEN   200
+
 unsigned int ovs_net_id __read_mostly;
 
 static struct genl_family dp_packet_genl_family;
@@ -225,6 +229,116 @@ void ovs_dp_detach_port(struct vport *p)
ovs_vport_del(p);
 }
 
+static void ovs_dp_upcall_dequeue(struct work_struct *work)
+{
+   struct datapath *dp = container_of(work, struct datapath,
+  upcalls.work.work);
+   struct dp_upcall_info *u, *n;
+
+   spin_lock_bh(&dp->upcalls.lock);
+   list_for_each_entry_safe(u, n, &dp->upcalls.list, list) {
+   if (unlikely(ovs_dp_upcall(dp, u->skb, &u->key, u, 0)))
+   kfree_skb(u->skb);
+   else
+   consume_skb(u->skb);
+   kfree(u);
+   }
+   dp->upcalls.len = 0;
+   INIT_LIST_HEAD(&dp->upcalls.list);
+   spin_unlock_bh(&dp->upcalls.lock);
+}
+
+/* Calculate the delay of the deferred work which sends the upcalls. If it ran
+ * more than UPCALL_QUEUE_TIMEOUT ago, schedule the work immediately. Otherwise
+ * return a time between 0 and UPCALL_QUEUE_MAX_DELAY, depending linearly on the
+ * queue utilisation.
+ */
+static unsigned long ovs_dp_upcall_delay(int queue_len, unsigned long last_run)
+{
+   if (jiffies - last_run >= UPCALL_QUEUE_TIMEOUT)
+   return 0;
+
+   return UPCALL_QUEUE_MAX_DELAY -
+  UPCALL_QUEUE_MAX_DELAY * queue_len / UPCALL_QUEUE_MAX_LEN;
+}
+
+static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp,
+ struct dp_upcall_info *upcall)
+{
+   struct list_head *head = &dp->upcalls.list;
+   struct dp_upcall_info *here = NULL, *pos;
+   bool find_next = true;
+   unsigned long delay;
+   int err = 0;
+   u8 count;
+
+   spin_lock_bh(&dp->upcalls.lock);
+   if (dp->upcalls.len > UPCALL_QUEUE_MAX_LEN) {
+   err = -ENOSPC;
+   goto out;
+   }
+
+   /* Insert upcalls in the list in a per-port round-robin fashion, look
+* for insertion point:
+* - to avoid out-of-order per-port upcalls, we can insert only after
+*   the last occurrence of upcalls for the same port
+* - insert upcall only after we reach a count of occurrences for a
+*   given port greater than the one we're inserting this upcall for
+*/
+   list_for_each_entry(pos, head, list) {
+   /* Count per-port upcalls. */
+   if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
+   err = -ENOSPC;
+   goto out_clear;
+   }
+   dp->upcalls.count[pos->port_no]++;
+
+   if (pos->port_no == upcall-

Re: [PATCH] bpfilter: fix user mode helper cross compilation

2018-06-28 Thread Matteo Croce
On Thu, Jun 28, 2018 at 6:17 AM Andrew Morton  wrote:
>
> On Wed, 20 Jun 2018 16:04:34 +0200 Matteo Croce  wrote:
>
> > Use $(OBJDUMP) instead of literal 'objdump' to avoid
> > using host toolchain when cross compiling.
> >
>
> I'm still having issues here, with ld.
>
> x86_64 machine, ARCH=i386:
>
> y:/usr/src/25> make V=1 M=net/bpfilter
> test -e include/generated/autoconf.h -a -e include/config/auto.conf || (  \
> echo >&2;   \
> echo >&2 "  ERROR: Kernel configuration is invalid.";   \
> echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\
> echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it.";  \
> echo >&2 ;  \
> /bin/false)
> mkdir -p net/bpfilter/.tmp_versions ; rm -f net/bpfilter/.tmp_versions/*
> make -f ./scripts/Makefile.build obj=net/bpfilter
> (cat /dev/null;   echo kernel/net/bpfilter/bpfilter.ko;) > net/bpfilter/modules.order
>   ld -m elf_i386   -r -o net/bpfilter/bpfilter.o net/bpfilter/bpfilter_kern.o net/bpfilter/bpfilter_umh.o ; scripts/mod/modpost net/bpfilter/bpfilter.o
> ld: i386:x86-64 architecture of input file `net/bpfilter/bpfilter_umh.o' is incompatible with i386 output
> scripts/Makefile.build:530: recipe for target 'net/bpfilter/bpfilter.o' failed
> make[1]: *** [net/bpfilter/bpfilter.o] Error 1
> Makefile:1518: recipe for target '_module_net/bpfilter' failed
> make: *** [_module_net/bpfilter] Error 2
>
> y:/usr/src/25> ld --version
> GNU ld (GNU Binutils for Ubuntu) 2.29.1
>
>

Hi Andrew,

That's because the Makefile does `HOSTCC:=$(CC)`, which replaces the
host tools compiler with the target one.
The problem is that for i386 and x86_64 the compiler is the same; it's
just called with different arguments, -m32 and -m64.
This ends up with mixed i386 and x86_64 binaries, which obviously can't
be linked together.

Personally I think that we should add infrastructure to build target
progs like we do with hostprogs-y instead of continuing to mess with
variables and flags. If you want a quick and dirty hack to build it,
I'm using this.

Regards,

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 051dc18b8ccb..5de353cfd26b 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -5,8 +5,9 @@

 hostprogs-y := bpfilter_umh
 bpfilter_umh-objs := main.o
-HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi
 HOSTCC := $(CC)
+HOSTCFLAGS := $(KBUILD_CFLAGS) -I. -Itools/include/ -Itools/include/uapi
+HOSTLOADLIBES_bpfilter_umh := $(KBUILD_CFLAGS)

 ifeq ($(CONFIG_BPFILTER_UMH), y)
 # builtin bpfilter_umh should be compiled with -static

--
Matteo Croce
per aspera ad upstream


Re: [PATCH] bpfilter: fix build error

2018-06-20 Thread Matteo Croce
On Wed, Jun 20, 2018 at 12:39 PM Stefano Brivio  wrote:
>
> On Tue, 19 Jun 2018 17:16:20 +0200
> Matteo Croce  wrote:
>
> > bpfilter Makefile assumes that the system locale is en_US, and the
> > parsing of objdump output fails.
> > Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
> > only 2 processes instead of 7.
> >
> > Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
> > Signed-off-by: Matteo Croce 
> > ---
> >  net/bpfilter/Makefile | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> > index e0bbe7583e58..dd86b022eff0 100644
> > --- a/net/bpfilter/Makefile
> > +++ b/net/bpfilter/Makefile
> > @@ -21,8 +21,10 @@ endif
> >  # which bpfilter_kern.c passes further into umh blob loader at run-time
> >  quiet_cmd_copy_umh = GEN $@
> >cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> > -  $(OBJCOPY) -I binary -O `$(OBJDUMP) -f $<|grep format|cut -d' ' -f8` \
> > -  -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> > +  $(OBJCOPY) -I binary \
> > +  `LC_ALL=C objdump -f net/bpfilter/bpfilter_umh \
>
> Why do you use objdump instead of $(OBJDUMP) now? I guess this might
> cause issues if you're cross-compiling.
>
> --
> Stefano

Right, I've sent a proper fix.

Thanks,
-- 
Matteo Croce
per aspera ad upstream


[PATCH] bpfilter: fix user mode helper cross compilation

2018-06-20 Thread Matteo Croce
Use $(OBJDUMP) instead of the literal 'objdump' to avoid
using the host toolchain when cross compiling.

Fixes: 421780fd4983 ("bpfilter: fix build error")
Signed-off-by: Matteo Croce 
---
 net/bpfilter/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index dd86b022eff0..051dc18b8ccb 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -22,7 +22,7 @@ endif
 quiet_cmd_copy_umh = GEN $@
   cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
   $(OBJCOPY) -I binary \
-  `LC_ALL=C objdump -f net/bpfilter/bpfilter_umh \
+  `LC_ALL=C $(OBJDUMP) -f net/bpfilter/bpfilter_umh \
   |awk -F' |,' '/file format/{print "-O",$$NF} \
   /^architecture:/{print "-B",$$2}'` \
   --rename-section .data=.init.rodata $< $@
-- 
2.17.1



[PATCH] bpfilter: ignore binary files

2018-06-19 Thread Matteo Croce
net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
enabled. Add it to .gitignore to avoid committing it.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce 
---
 net/bpfilter/.gitignore | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 net/bpfilter/.gitignore

diff --git a/net/bpfilter/.gitignore b/net/bpfilter/.gitignore
new file mode 100644
index 000000000000..e97084e3eea2
--- /dev/null
+++ b/net/bpfilter/.gitignore
@@ -0,0 +1 @@
+bpfilter_umh
-- 
2.17.1



[PATCH] bpfilter: fix build error

2018-06-19 Thread Matteo Croce
The bpfilter Makefile assumes that the system locale is en_US, so the
parsing of objdump output fails under other locales.
Set LC_ALL=C and, while at it, rewrite the objdump parsing so that it
spawns only 2 processes instead of 7.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce 
---
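A note on the parsing: the awk one-liner in this patch pulls the BFD
target (for -O) and the architecture (for -B) out of `objdump -f` output
in a single pass. Purely as an illustration, here is a rough user-space C
equivalent of that extraction. The function names and buffer handling are
made up for this sketch and are not part of the patch; it assumes objdump
output of the usual shape in the C locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the word starting at 'p' (ending at any byte in 'stop') into 'out'. */
static int sketch_copy_word(const char *p, const char *stop,
			    char *out, size_t out_len)
{
	size_t n = strcspn(p, stop);

	if (n == 0 || n >= out_len)
		return -1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}

/*
 * Hypothetical helper: extract the target name and the architecture from
 * text shaped like "objdump -f" output. Returns 0 on success, -1 if a
 * field is missing or does not fit in the caller's buffer.
 */
static int sketch_parse_objdump(const char *text,
				char *fmt, size_t fmt_len,
				char *arch, size_t arch_len)
{
	static const char fmt_key[] = "file format ";
	static const char arch_key[] = "architecture: ";
	const char *p;

	assert(text != NULL && fmt != NULL && arch != NULL);

	/* -O: the word after "file format", the last one on that line
	 * in the usual output. */
	p = strstr(text, fmt_key);
	if (p == NULL ||
	    sketch_copy_word(p + strlen(fmt_key), " \n", fmt, fmt_len) != 0)
		return -1;

	/* -B: on the line starting with "architecture:", the word that
	 * precedes the comma. */
	p = text;
	while (p != NULL && strncmp(p, arch_key, strlen(arch_key)) != 0) {
		p = strchr(p, '\n');
		if (p != NULL)
			p++;
	}
	if (p == NULL)
		return -1;
	return sketch_copy_word(p + strlen(arch_key), " ,\n", arch, arch_len);
}
```

Fed the header lines of an `objdump -f` report, this should yield the same
two words that awk prints after -O and -B. With a translated locale the
"file format" and "architecture" labels can differ, which is presumably
why the grep-based parsing broke and why LC_ALL=C is set.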
 net/bpfilter/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index e0bbe7583e58..dd86b022eff0 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -21,8 +21,10 @@ endif
 # which bpfilter_kern.c passes further into umh blob loader at run-time
 quiet_cmd_copy_umh = GEN $@
   cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
-  $(OBJCOPY) -I binary -O `$(OBJDUMP) -f $<|grep format|cut -d' ' -f8` \
-  -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
+  $(OBJCOPY) -I binary \
+  `LC_ALL=C objdump -f net/bpfilter/bpfilter_umh \
+  |awk -F' |,' '/file format/{print "-O",$$NF} \
+  /^architecture:/{print "-B",$$2}'` \
   --rename-section .data=.init.rodata $< $@
 
 $(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
-- 
2.17.1



Re: [PATCH] [net-next] ipvlan: fix building with modular IPV6

2018-02-26 Thread Matteo Croce
On Mon, Feb 26, 2018 at 10:41 AM, Arnd Bergmann <a...@arndb.de> wrote:
> We no longer depend on IPV6, but that now causes a link error with
> CONFIG_IPV6=m and CONFIG_IPVLAN=y:
>
> drivers/net/ipvlan/ipvlan_core.o: In function `ipvlan_queue_xmit':
> ipvlan_core.c:(.text+0x1440): undefined reference to `ip6_route_output_flags'
> drivers/net/ipvlan/ipvlan_core.o: In function `ipvlan_l3_rcv':
> ipvlan_core.c:(.text+0x1818): undefined reference to `ip6_route_input_lookup'
>
> This adds back the dependency on IPV6, with the option of building without
> IPV6, but forcing IPVLAN to be a module when IPV6 is a module.
>
> Fixes: 94333fac44d1 ("ipvlan: drop ipv6 dependency")
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> ---
>  drivers/net/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index d88b78a17440..08b85215c2be 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -149,6 +149,7 @@ config MACVTAP
>  config IPVLAN
>  tristate "IP-VLAN support"
>  depends on INET
> +depends on IPV6 || !IPV6
>  depends on NETFILTER
>  select NET_L3_MASTER_DEV
>  ---help---
> --
> 2.9.0
>

Sorry for that, the fix looks very reasonable to me.
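
For reference, `depends on IPV6 || !IPV6` relies on Kconfig's tristate
arithmetic, where n < m < y, `!x` evaluates to 2 - x and `||` takes the
maximum; the result caps the value the dependent symbol can take. A small
C sketch of that evaluation (the enum and helper names are invented for
illustration, this is not kernel code):

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig '!': n becomes y, y becomes n, m stays m. */
static enum tristate tri_not(enum tristate a)
{
	assert(a >= TRI_N && a <= TRI_Y);
	return (enum tristate)(TRI_Y - a);
}

/* Kconfig '||': the larger of the two operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Upper limit that "depends on IPV6 || !IPV6" puts on the symbol. */
static enum tristate ipvlan_limit(enum tristate ipv6)
{
	return tri_or(ipv6, tri_not(ipv6));
}
```

With IPV6=m the expression evaluates to m, so IPVLAN cannot be built in;
with IPV6=y or IPV6=n it evaluates to y and imposes no restriction.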

Thanks,
-- 
Matteo Croce
per aspera ad upstream


[PATCH net-next v4 2/2] ipvlan: selects master_l3 device instead of depending on it

2018-02-20 Thread Matteo Croce
The L3 Master device is just glue between the core networking code and
device drivers, so it should be selected automatically rather than
requiring users to enable it explicitly.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 3234c6618d75..d88b78a17440 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -150,7 +150,7 @@ config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
 depends on NETFILTER
-depends on NET_L3_MASTER_DEV
+select NET_L3_MASTER_DEV
 ---help---
   This allows one to create virtual devices off of a main interface
   and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
-- 
2.14.3



[PATCH net-next v4 0/2] Remove IPVlan module dependencies on IPv6 and L3 Master dev

2018-02-20 Thread Matteo Croce
The IPVlan module currently depends on IPv6 and L3 Master dev.
Refactor the code to allow building the IPVlan module regardless of the
value of CONFIG_IPV6, as done in other drivers like VxLAN or GENEVE.
Also change the CONFIG_NET_L3_MASTER_DEV dependency into a select,
since compiling the L3 Master device alone makes little sense.

$ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
CONFIG_IPV6=y
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
48K drivers/net/ipvlan/ipvlan.ko

$ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
# CONFIG_IPV6 is not set
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
44K drivers/net/ipvlan/ipvlan.ko

Matteo Croce (2):
  ipvlan: drop ipv6 dependency
  ipvlan: selects master_l3 device instead of depending on it

 drivers/net/Kconfig  |  3 +-
 drivers/net/ipvlan/ipvlan_core.c | 72 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 86 insertions(+), 37 deletions(-)

-- 
2.14.3



[PATCH net-next v4 1/2] ipvlan: drop ipv6 dependency

2018-02-20 Thread Matteo Croce
IPVlan has a hard dependency on IPv6. Refactor the ipvlan code to allow
compiling it with IPv6 disabled, move duplicate code into addr_equal()
and refactor a series of if-else statements into a switch.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v4: more descriptive commit message and fix checkpatch.pl warnings
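
To show the shape of the refactoring outside the kernel tree, here is a
minimal user-space sketch of the addr_equal() idea: one helper does the
per-family comparison, and the IPv6 branch is compiled out by a
preprocessor switch. SKETCH_HAVE_IPV6, struct sketch_addr and
sketch_addr_equal() are hypothetical stand-ins for
IS_ENABLED(CONFIG_IPV6), struct ipvl_addr and addr_equal(); memcmp() is
used in place of the kernel's ipv6_addr_equal().

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_IPV6); build with
 * -DSKETCH_HAVE_IPV6=0 to compile the IPv6 branch out. */
#ifndef SKETCH_HAVE_IPV6
#define SKETCH_HAVE_IPV6 1
#endif

enum sketch_atype { SKETCH_IPV4, SKETCH_IPV6 };

/* Simplified, hypothetical counterpart of struct ipvl_addr. */
struct sketch_addr {
	enum sketch_atype atype;
	union {
		struct in_addr ip4addr;
		struct in6_addr ip6addr;
	} u;
};

/* Return true when 'addr' stores the address that 'iaddr' points to. */
static bool sketch_addr_equal(bool is_v6, const struct sketch_addr *addr,
			      const void *iaddr)
{
	assert(addr != NULL && iaddr != NULL);

	if (!is_v6 && addr->atype == SKETCH_IPV4) {
		const struct in_addr *i4addr = iaddr;

		return addr->u.ip4addr.s_addr == i4addr->s_addr;
#if SKETCH_HAVE_IPV6
	} else if (is_v6 && addr->atype == SKETCH_IPV6) {
		/* Byte-wise comparison instead of ipv6_addr_equal(). */
		return memcmp(&addr->u.ip6addr, iaddr,
			      sizeof(struct in6_addr)) == 0;
#endif
	}

	return false;
}
```

Both lookup loops can then reduce to a single
`if (addr_equal(is_v6, addr, iaddr)) return addr;`, as the hunks below do.
Building the sketch with -DSKETCH_HAVE_IPV6=0 makes every IPv6 comparison
return false.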

 drivers/net/Kconfig  |  1 -
 drivers/net/ipvlan/ipvlan_core.c | 72 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 944ec3c9282c..3234c6618d75 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,6 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on IPV6
 depends on NETFILTER
 depends on NET_L3_MASTER_DEV
 ---help---
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c1f008fe4e1d..1b5dc200b573 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -35,6 +35,7 @@ void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
 }
 EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
const struct in6_addr *ip6_addr = iaddr;
@@ -42,6 +43,12 @@ static u8 ipvlan_get_v6_hash(const void *iaddr)
return __ipv6_addr_jhash(ip6_addr, ipvlan_jhash_secret) &
   IPVLAN_HASH_MASK;
 }
+#else
+static u8 ipvlan_get_v6_hash(const void *iaddr)
+{
+   return 0;
+}
+#endif
 
 static u8 ipvlan_get_v4_hash(const void *iaddr)
 {
@@ -51,6 +58,23 @@ static u8 ipvlan_get_v4_hash(const void *iaddr)
   IPVLAN_HASH_MASK;
 }
 
+static bool addr_equal(bool is_v6, struct ipvl_addr *addr, const void *iaddr)
+{
+   if (!is_v6 && addr->atype == IPVL_IPV4) {
+   struct in_addr *i4addr = (struct in_addr *)iaddr;
+
+   return addr->ip4addr.s_addr == i4addr->s_addr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (is_v6 && addr->atype == IPVL_IPV6) {
+   struct in6_addr *i6addr = (struct in6_addr *)iaddr;
+
+   return ipv6_addr_equal(&addr->ip6addr, i6addr);
+#endif
+   }
+
+   return false;
+}
+
 static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
   const void *iaddr, bool is_v6)
 {
@@ -59,15 +83,9 @@ static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct 
ipvl_port *port,
 
hash = is_v6 ? ipvlan_get_v6_hash(iaddr) :
   ipvlan_get_v4_hash(iaddr);
-   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode) {
-   if (is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr))
-   return addr;
-   else if (!is_v6 && addr->atype == IPVL_IPV4 &&
-addr->ip4addr.s_addr ==
-   ((struct in_addr *)iaddr)->s_addr)
+   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -93,13 +111,9 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev 
*ipvlan,
 {
struct ipvl_addr *addr;
 
-   list_for_each_entry(addr, &ipvlan->addrs, anode) {
-   if ((is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr)) ||
-   (!is_v6 && addr->atype == IPVL_IPV4 &&
-   addr->ip4addr.s_addr == ((struct in_addr *)iaddr)->s_addr))
+   list_for_each_entry(addr, &ipvlan->addrs, anode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -150,6 +164,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
lyr3h = ip4h;
break;
}
+#if IS_ENABLED(CONFIG_IPV6)
case htons(ETH_P_IPV6): {
struct ipv6hdr *ip6h;
 
@@ -188,6 +203,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
}
break;
}
+#endif
default:
return NULL;
}
@@ -337,14 +353,18 @@ static struct ipvl_addr *ipvlan_addr_lookup(struct 
ipvl_port *port,
 {
struct ipvl_addr *addr = NULL;
 
-   if (addr_type == IPVL_IPV6) {
+   switch (addr_type) {
+#if IS_ENABLED(CONFIG_IPV6)
+   case IPVL_IPV6: {
struct ipv6hdr *ip6h;
struct in6_addr *i6addr;
 
ip6h = (struct ipv6hdr *)lyr3h;
i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
addr = ipvlan_ht_addr_lookup(port, i6addr, true);
-   } else if (addr_type == IPV

[PATCH v3 net-next 1/2] ipvlan: drop ipv6 dependency

2018-02-19 Thread Matteo Croce
IPVlan has a hard dependency on IPv6.
Refactor the ipvlan code to allow compiling it with IPv6 disabled, move
duplicate code into addr_equal() and refactor a series of if-else
statements into a switch.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v3: more descriptive commit message and fix checkpatch.pl warnings

 drivers/net/Kconfig  |  1 -
 drivers/net/ipvlan/ipvlan_core.c | 72 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 944ec3c9282c..3234c6618d75 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,6 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on IPV6
 depends on NETFILTER
 depends on NET_L3_MASTER_DEV
 ---help---
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c1f008fe4e1d..1b5dc200b573 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -35,6 +35,7 @@ void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
 }
 EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
const struct in6_addr *ip6_addr = iaddr;
@@ -42,6 +43,12 @@ static u8 ipvlan_get_v6_hash(const void *iaddr)
return __ipv6_addr_jhash(ip6_addr, ipvlan_jhash_secret) &
   IPVLAN_HASH_MASK;
 }
+#else
+static u8 ipvlan_get_v6_hash(const void *iaddr)
+{
+   return 0;
+}
+#endif
 
 static u8 ipvlan_get_v4_hash(const void *iaddr)
 {
@@ -51,6 +58,23 @@ static u8 ipvlan_get_v4_hash(const void *iaddr)
   IPVLAN_HASH_MASK;
 }
 
+static bool addr_equal(bool is_v6, struct ipvl_addr *addr, const void *iaddr)
+{
+   if (!is_v6 && addr->atype == IPVL_IPV4) {
+   struct in_addr *i4addr = (struct in_addr *)iaddr;
+
+   return addr->ip4addr.s_addr == i4addr->s_addr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (is_v6 && addr->atype == IPVL_IPV6) {
+   struct in6_addr *i6addr = (struct in6_addr *)iaddr;
+
+   return ipv6_addr_equal(&addr->ip6addr, i6addr);
+#endif
+   }
+
+   return false;
+}
+
 static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
   const void *iaddr, bool is_v6)
 {
@@ -59,15 +83,9 @@ static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct 
ipvl_port *port,
 
hash = is_v6 ? ipvlan_get_v6_hash(iaddr) :
   ipvlan_get_v4_hash(iaddr);
-   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode) {
-   if (is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr))
-   return addr;
-   else if (!is_v6 && addr->atype == IPVL_IPV4 &&
-addr->ip4addr.s_addr ==
-   ((struct in_addr *)iaddr)->s_addr)
+   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -93,13 +111,9 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev 
*ipvlan,
 {
struct ipvl_addr *addr;
 
-   list_for_each_entry(addr, &ipvlan->addrs, anode) {
-   if ((is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr)) ||
-   (!is_v6 && addr->atype == IPVL_IPV4 &&
-   addr->ip4addr.s_addr == ((struct in_addr *)iaddr)->s_addr))
+   list_for_each_entry(addr, &ipvlan->addrs, anode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -150,6 +164,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
lyr3h = ip4h;
break;
}
+#if IS_ENABLED(CONFIG_IPV6)
case htons(ETH_P_IPV6): {
struct ipv6hdr *ip6h;
 
@@ -188,6 +203,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
}
break;
}
+#endif
default:
return NULL;
}
@@ -337,14 +353,18 @@ static struct ipvl_addr *ipvlan_addr_lookup(struct 
ipvl_port *port,
 {
struct ipvl_addr *addr = NULL;
 
-   if (addr_type == IPVL_IPV6) {
+   switch (addr_type) {
+#if IS_ENABLED(CONFIG_IPV6)
+   case IPVL_IPV6: {
struct ipv6hdr *ip6h;
struct in6_addr *i6addr;
 
ip6h = (struct ipv6hdr *)lyr3h;
i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
addr = ipvlan_ht_addr_lookup(port, i6addr, true);
-   } else if (addr_type == IPV

Re: [PATCH v2 net-next 0/3] Remove IPVlan module dependencies on IPv6 and L3 Master dev

2018-02-17 Thread Matteo Croce
On Sat, Feb 17, 2018 at 8:11 PM, Matteo Croce <mcr...@redhat.com> wrote:
> The IPVlan module currently depends on IPv6 and L3 Master dev.
> Refactor the code to allow building IPVlan module regardless of the value of
> CONFIG_IPV6 and CONFIG_NETFILTER, and change the CONFIG_NET_L3_MASTER_DEV
> dependency into a select, as compiling L3 Master device alone has no sense.
>
> $ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
> CONFIG_IPV6=y
> CONFIG_IPVLAN=m
> $ ll drivers/net/ipvlan/ipvlan.ko
> 48K drivers/net/ipvlan/ipvlan.ko
>
> $ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
> # CONFIG_IPV6 is not set
> CONFIG_IPVLAN=m
> $ ll drivers/net/ipvlan/ipvlan.ko
> 44K drivers/net/ipvlan/ipvlan.ko
>
> Matteo Croce (2):
>   ipvlan: drop ipv6 dependency
>   ipvlan: selects master_l3 device instead of depending on it
>
>  drivers/net/Kconfig  |  3 +-
>  drivers/net/ipvlan/ipvlan_core.c | 71 
> ++--
>  drivers/net/ipvlan/ipvlan_main.c | 48 +--
>  3 files changed, 85 insertions(+), 37 deletions(-)
>
> --
> 2.14.3
>

Just noticed the wrong subject: it's really 0/2.

Regards,
-- 
Matteo Croce
per aspera ad upstream


[PATCH v2 net-next 1/2] ipvlan: drop ipv6 dependency

2018-02-17 Thread Matteo Croce
IPVlan has a hard dependency on IPv6.
Refactor the ipvlan code to allow compiling it with IPv6 disabled.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig  |  1 -
 drivers/net/ipvlan/ipvlan_core.c | 71 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 84 insertions(+), 36 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 944ec3c9282c..3234c6618d75 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,6 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on IPV6
 depends on NETFILTER
 depends on NET_L3_MASTER_DEV
 ---help---
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c1f008fe4e1d..653b00738616 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -35,6 +35,7 @@ void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
 }
 EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
const struct in6_addr *ip6_addr = iaddr;
@@ -42,6 +43,12 @@ static u8 ipvlan_get_v6_hash(const void *iaddr)
return __ipv6_addr_jhash(ip6_addr, ipvlan_jhash_secret) &
   IPVLAN_HASH_MASK;
 }
+#else
+static u8 ipvlan_get_v6_hash(const void *iaddr)
+{
+   return 0;
+}
+#endif
 
 static u8 ipvlan_get_v4_hash(const void *iaddr)
 {
@@ -51,6 +58,22 @@ static u8 ipvlan_get_v4_hash(const void *iaddr)
   IPVLAN_HASH_MASK;
 }
 
+static bool addr_equal(bool is_v6, struct ipvl_addr *addr, const void *iaddr) {
+   if (!is_v6 && addr->atype == IPVL_IPV4) {
+   struct in_addr *i4addr = (struct in_addr *)iaddr;
+
+   return addr->ip4addr.s_addr == i4addr->s_addr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (is_v6 && addr->atype == IPVL_IPV6) {
+   struct in6_addr *i6addr = (struct in6_addr *)iaddr;
+
+   return ipv6_addr_equal(&addr->ip6addr, i6addr);
+#endif
+   }
+
+   return false;
+}
+
 static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
   const void *iaddr, bool is_v6)
 {
@@ -59,15 +82,9 @@ static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct 
ipvl_port *port,
 
hash = is_v6 ? ipvlan_get_v6_hash(iaddr) :
   ipvlan_get_v4_hash(iaddr);
-   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode) {
-   if (is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr))
+   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   else if (!is_v6 && addr->atype == IPVL_IPV4 &&
-addr->ip4addr.s_addr ==
-   ((struct in_addr *)iaddr)->s_addr)
-   return addr;
-   }
return NULL;
 }
 
@@ -93,13 +110,9 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev 
*ipvlan,
 {
struct ipvl_addr *addr;
 
-   list_for_each_entry(addr, &ipvlan->addrs, anode) {
-   if ((is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr)) ||
-   (!is_v6 && addr->atype == IPVL_IPV4 &&
-   addr->ip4addr.s_addr == ((struct in_addr *)iaddr)->s_addr))
+   list_for_each_entry(addr, &ipvlan->addrs, anode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -150,6 +163,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
lyr3h = ip4h;
break;
}
+#if IS_ENABLED(CONFIG_IPV6)
case htons(ETH_P_IPV6): {
struct ipv6hdr *ip6h;
 
@@ -188,6 +202,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
}
break;
}
+#endif
default:
return NULL;
}
@@ -337,14 +352,18 @@ static struct ipvl_addr *ipvlan_addr_lookup(struct 
ipvl_port *port,
 {
struct ipvl_addr *addr = NULL;
 
-   if (addr_type == IPVL_IPV6) {
+   switch (addr_type) {
+#if IS_ENABLED(CONFIG_IPV6)
+   case IPVL_IPV6: {
struct ipv6hdr *ip6h;
struct in6_addr *i6addr;
 
ip6h = (struct ipv6hdr *)lyr3h;
i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
addr = ipvlan_ht_addr_lookup(port, i6addr, true);
-   } else if (addr_type == IPVL_ICMPV6) {
+   break;
+   }
+   case IPVL_ICMPV6: {
struct nd_msg *ndmh;
struct in6_addr *i6addr;

[PATCH v2 net-next 0/3] Remove IPVlan module dependencies on IPv6 and L3 Master dev

2018-02-17 Thread Matteo Croce
The IPVlan module currently depends on IPv6 and L3 Master dev.
Refactor the code to allow building the IPVlan module regardless of the
value of CONFIG_IPV6 and CONFIG_NETFILTER, and change the
CONFIG_NET_L3_MASTER_DEV dependency into a select, as compiling the
L3 Master device alone makes no sense.

$ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
CONFIG_IPV6=y
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
48K drivers/net/ipvlan/ipvlan.ko

$ grep -wE 'CONFIG_(IPV6|IPVLAN)' .config
# CONFIG_IPV6 is not set
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
44K drivers/net/ipvlan/ipvlan.ko

Matteo Croce (2):
  ipvlan: drop ipv6 dependency
  ipvlan: selects master_l3 device instead of depending on it

 drivers/net/Kconfig  |  3 +-
 drivers/net/ipvlan/ipvlan_core.c | 71 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 85 insertions(+), 37 deletions(-)

-- 
2.14.3



[PATCH v2 net-next 2/2] ipvlan: selects master_l3 device instead of depending on it

2018-02-17 Thread Matteo Croce
The L3 Master device is just glue between the core networking code and
device drivers, so it should be selected automatically rather than
requiring users to enable it explicitly.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 3234c6618d75..d88b78a17440 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -150,7 +150,7 @@ config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
 depends on NETFILTER
-depends on NET_L3_MASTER_DEV
+select NET_L3_MASTER_DEV
 ---help---
   This allows one to create virtual devices off of a main interface
   and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
-- 
2.14.3



Re: [PATCH 0/3] Remove IPVlan module dependencies on IPv6 and Netfilter

2018-02-15 Thread Matteo Croce
On Thu, Feb 15, 2018 at 2:11 AM, David Miller <da...@davemloft.net> wrote:
> From: Matteo Croce <mcr...@redhat.com>
> Date: Wed, 14 Feb 2018 19:13:42 +0100
>
>> The IPVlan module currently depends on IPv6 and Netfilter.
>> Refactor the code to allow building IPVlan module regardless of the value of
>> CONFIG_IPV6 and CONFIG_NETFILTER.
>> Also change the dependency to CONFIG_NET_L3_MASTER_DEV into a select,
>> as compiling L3 Master device alone has no sense.
>
> As stated, the L3 master and netfilter are hard dependencies when using
> ipvlan in some modes.
>
> You can't just ifdef the driver like this, it changes fundamental
> pieces of functionality.
>
> I would say leave things as they are right now.

Hi David,

yes, I noticed that L3 master and netfilter are really needed in l3s
mode, so let's drop patch 2/3.

What about the other two, removing the IPv6 dependency and changing the
Kconfig?
Other devices like VXLAN, Geneve and VRF use the same approach to
allow conditional compilation of the IPv6 support, and
I think that IPVlan should do the same.

Regards,
-- 
Matteo Croce
per aspera ad upstream


[PATCH 1/3] ipvlan: drop ipv6 dependency

2018-02-14 Thread Matteo Croce
IPVlan has a hard dependency on IPv6.
Refactor the ipvlan code to allow compiling it with IPv6 disabled.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig  |  1 -
 drivers/net/ipvlan/ipvlan_core.c | 71 ++--
 drivers/net/ipvlan/ipvlan_main.c | 48 +--
 3 files changed, 84 insertions(+), 36 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 944ec3c9282c..3234c6618d75 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,6 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on IPV6
 depends on NETFILTER
 depends on NET_L3_MASTER_DEV
 ---help---
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c1f008fe4e1d..653b00738616 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -35,6 +35,7 @@ void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
 }
 EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
const struct in6_addr *ip6_addr = iaddr;
@@ -42,6 +43,12 @@ static u8 ipvlan_get_v6_hash(const void *iaddr)
return __ipv6_addr_jhash(ip6_addr, ipvlan_jhash_secret) &
   IPVLAN_HASH_MASK;
 }
+#else
+static u8 ipvlan_get_v6_hash(const void *iaddr)
+{
+   return 0;
+}
+#endif
 
 static u8 ipvlan_get_v4_hash(const void *iaddr)
 {
@@ -51,6 +58,22 @@ static u8 ipvlan_get_v4_hash(const void *iaddr)
   IPVLAN_HASH_MASK;
 }
 
+static bool addr_equal(bool is_v6, struct ipvl_addr *addr, const void *iaddr) {
+   if (!is_v6 && addr->atype == IPVL_IPV4) {
+   struct in_addr *i4addr = (struct in_addr *)iaddr;
+
+   return addr->ip4addr.s_addr == i4addr->s_addr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (is_v6 && addr->atype == IPVL_IPV6) {
+   struct in6_addr *i6addr = (struct in6_addr *)iaddr;
+
+   return ipv6_addr_equal(&addr->ip6addr, i6addr);
+#endif
+   }
+
+   return false;
+}
+
 static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
   const void *iaddr, bool is_v6)
 {
@@ -59,15 +82,9 @@ static struct ipvl_addr *ipvlan_ht_addr_lookup(const struct 
ipvl_port *port,
 
hash = is_v6 ? ipvlan_get_v6_hash(iaddr) :
   ipvlan_get_v4_hash(iaddr);
-   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode) {
-   if (is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr))
+   hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   else if (!is_v6 && addr->atype == IPVL_IPV4 &&
-addr->ip4addr.s_addr ==
-   ((struct in_addr *)iaddr)->s_addr)
-   return addr;
-   }
return NULL;
 }
 
@@ -93,13 +110,9 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev 
*ipvlan,
 {
struct ipvl_addr *addr;
 
-   list_for_each_entry(addr, &ipvlan->addrs, anode) {
-   if ((is_v6 && addr->atype == IPVL_IPV6 &&
-   ipv6_addr_equal(&addr->ip6addr, iaddr)) ||
-   (!is_v6 && addr->atype == IPVL_IPV4 &&
-   addr->ip4addr.s_addr == ((struct in_addr *)iaddr)->s_addr))
+   list_for_each_entry(addr, &ipvlan->addrs, anode)
+   if (addr_equal(is_v6, addr, iaddr))
return addr;
-   }
return NULL;
 }
 
@@ -150,6 +163,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
lyr3h = ip4h;
break;
}
+#if IS_ENABLED(CONFIG_IPV6)
case htons(ETH_P_IPV6): {
struct ipv6hdr *ip6h;
 
@@ -188,6 +202,7 @@ static void *ipvlan_get_L3_hdr(struct ipvl_port *port, 
struct sk_buff *skb, int
}
break;
}
+#endif
default:
return NULL;
}
@@ -337,14 +352,18 @@ static struct ipvl_addr *ipvlan_addr_lookup(struct 
ipvl_port *port,
 {
struct ipvl_addr *addr = NULL;
 
-   if (addr_type == IPVL_IPV6) {
+   switch (addr_type) {
+#if IS_ENABLED(CONFIG_IPV6)
+   case IPVL_IPV6: {
struct ipv6hdr *ip6h;
struct in6_addr *i6addr;
 
ip6h = (struct ipv6hdr *)lyr3h;
i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
addr = ipvlan_ht_addr_lookup(port, i6addr, true);
-   } else if (addr_type == IPVL_ICMPV6) {
+   break;
+   }
+   case IPVL_ICMPV6: {
struct nd_msg *ndmh;
struct in6_addr *i6addr;

[PATCH 0/3] Remove IPVlan module dependencies on IPv6 and Netfilter

2018-02-14 Thread Matteo Croce
The IPVlan module currently depends on IPv6 and Netfilter.
Refactor the code to allow building the IPVlan module regardless of the
value of CONFIG_IPV6 and CONFIG_NETFILTER.
Also change the dependency on CONFIG_NET_L3_MASTER_DEV into a select,
as compiling the L3 Master device alone makes no sense.

$ grep -wE 'CONFIG_(IPV6|NETFILTER|IPVLAN)' .config
CONFIG_IPV6=y
CONFIG_NETFILTER=y
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
48K drivers/net/ipvlan/ipvlan.ko

$ grep -wE 'CONFIG_(IPV6|NETFILTER|IPVLAN)' .config
# CONFIG_IPV6 is not set
CONFIG_NETFILTER=y
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
44K drivers/net/ipvlan/ipvlan.ko

$ grep -wE 'CONFIG_(IPV6|NETFILTER|IPVLAN)' .config
CONFIG_IPV6=m
# CONFIG_NETFILTER is not set
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
46K drivers/net/ipvlan/ipvlan.ko

$ grep -wE 'CONFIG_(IPV6|NETFILTER|IPVLAN)' .config
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set
CONFIG_IPVLAN=m
$ ll drivers/net/ipvlan/ipvlan.ko
43K drivers/net/ipvlan/ipvlan.ko

Matteo Croce (3):
  ipvlan: drop ipv6 dependency
  ipvlan: drop netfilter dependency
  ipvlan: selects master_l3 device instead of depending on it

 drivers/net/Kconfig  |  4 +-
 drivers/net/ipvlan/ipvlan.h  |  2 +
 drivers/net/ipvlan/ipvlan_core.c | 73 -
 drivers/net/ipvlan/ipvlan_main.c | 79 +++-
 4 files changed, 111 insertions(+), 47 deletions(-)

-- 
2.14.3



[PATCH 3/3] ipvlan: selects master_l3 device instead of depending on it

2018-02-14 Thread Matteo Croce
The L3 Master device is just glue between the core networking code and
device drivers, so it should be selected automatically rather than
requiring users to enable it explicitly.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 64d3017ecd01..3dd8a6a6a336 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,7 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on NET_L3_MASTER_DEV
+select NET_L3_MASTER_DEV
 ---help---
   This allows one to create virtual devices off of a main interface
   and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
-- 
2.14.3



[PATCH 2/3] ipvlan: drop netfilter dependency

2018-02-14 Thread Matteo Croce
IPVlan has a hard dependency on netfilter.
Refactor the ipvlan code to allow compiling it with netfilter disabled.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/Kconfig  |  1 -
 drivers/net/ipvlan/ipvlan.h  |  2 ++
 drivers/net/ipvlan/ipvlan_core.c |  2 ++
 drivers/net/ipvlan/ipvlan_main.c | 31 ++-
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 3234c6618d75..64d3017ecd01 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,7 +149,6 @@ config MACVTAP
 config IPVLAN
 tristate "IP-VLAN support"
 depends on INET
-depends on NETFILTER
 depends on NET_L3_MASTER_DEV
 ---help---
   This allows one to create virtual devices off of a main interface
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 5166575a164d..b7fa5a48a351 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -166,8 +166,10 @@ bool ipvlan_addr_busy(struct ipvl_port *port, void *iaddr, 
bool is_v6);
 void ipvlan_ht_addr_del(struct ipvl_addr *addr);
 struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
  u16 proto);
+#ifdef CONFIG_NETFILTER
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 const struct nf_hook_state *state);
+#endif
 void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
 unsigned int len, bool success, bool mcast);
 int ipvlan_link_new(struct net *src_net, struct net_device *dev,
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 653b00738616..5be846bc6d8c 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -821,6 +821,7 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, 
struct sk_buff *skb,
return skb;
 }
 
+#ifdef CONFIG_NETFILTER
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 const struct nf_hook_state *state)
 {
@@ -837,3 +838,4 @@ unsigned int ipvlan_nf_input(void *priv, struct sk_buff 
*skb,
 out:
return NF_ACCEPT;
 }
+#endif
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 67c91ceda979..2e311251c27c 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -15,6 +15,16 @@ struct ipvlan_netns {
unsigned int ipvl_nf_hook_refcnt;
 };
 
+static const struct l3mdev_ops ipvl_l3mdev_ops = {
+   .l3mdev_l3_rcv = ipvlan_l3_rcv,
+};
+
+static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
+{
+   ipvlan->dev->mtu = dev->mtu;
+}
+
+#ifdef CONFIG_NETFILTER
 static const struct nf_hook_ops ipvl_nfops[] = {
{
.hook = ipvlan_nf_input,
@@ -32,15 +42,6 @@ static const struct nf_hook_ops ipvl_nfops[] = {
 #endif
 };
 
-static const struct l3mdev_ops ipvl_l3mdev_ops = {
-   .l3mdev_l3_rcv = ipvlan_l3_rcv,
-};
-
-static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
-{
-   ipvlan->dev->mtu = dev->mtu;
-}
-
 static int ipvlan_register_nf_hook(struct net *net)
 {
struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
@@ -70,6 +71,16 @@ static void ipvlan_unregister_nf_hook(struct net *net)
nf_unregister_net_hooks(net, ipvl_nfops,
ARRAY_SIZE(ipvl_nfops));
 }
+#else
+static int ipvlan_register_nf_hook(struct net *net)
+{
+   return 0;
+}
+
+static void ipvlan_unregister_nf_hook(struct net *net)
+{
+}
+#endif
 
 static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
 {
@@ -1015,8 +1026,10 @@ static void ipvlan_ns_exit(struct net *net)
 
if (WARN_ON_ONCE(vnet->ipvl_nf_hook_refcnt)) {
vnet->ipvl_nf_hook_refcnt = 0;
+#ifdef CONFIG_NETFILTER
nf_unregister_net_hooks(net, ipvl_nfops,
ARRAY_SIZE(ipvl_nfops));
+#endif
}
 }
 
-- 
2.14.3



Re: [PATCH net] ipv6: set all.accept_dad to 0 by default

2017-11-13 Thread Matteo Croce
On Mon, Nov 13, 2017 at 3:21 PM, Erik Kline <e...@google.com> wrote:
> Should we consider rolling back the patch that caused this?
> "accept_dad = 1" is the proper IETF-expected default behaviour.
>
> Alternatively, if we really want to make all, default, and ifname
> useful perhaps we need to investigate a tristate option (for currently
> boolean values, at least).  -1 could mean no preference, for example.
>
> On 13 November 2017 at 13:45, Nicolas Dichtel <nicolas.dich...@6wind.com> 
> wrote:
>> The commit a2d3f3e33853 modifies the way to disable dad on an interface.
>> Before the patch, setting .accept_dad to 0 was enough to disable it.
>> Because all.accept_dad is set to 1 by default, after the patch, the user
>> needs to set both all.accept_dad and .accept_dad to 0 to disable it.
>>
>> This is not backward compatible. When a user updates its kernel, the dad
>> may be enabled by error.
>>
>> Let's set all.accept_dad to 0 by default to restore the previous behavior.
>>
>> Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for 
>> real")
>> CC: Stefano Brivio <sbri...@redhat.com>
>> CC: Matteo Croce <mcr...@redhat.com>
>> CC: Erik Kline <e...@google.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>> ---
>>  net/ipv6/addrconf.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 8a1c846d3df9..ef5b61507b9a 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -231,7 +231,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>> .proxy_ndp  = 0,
>> .accept_source_route= 0,/* we do not accept RH0 by default. 
>> */
>> .disable_ipv6   = 0,
>> -   .accept_dad = 1,
>> +   .accept_dad = 0,
>> .suppress_frag_ndisc= 1,
>> .accept_ra_mtu  = 1,
>> .stable_secret  = {
>> --
>> 2.13.2
>>

Another option would be to combine the "all" and per-interface handlers
with a logical AND.
The fact is that, before the patch, the "all" handler really was a no-op.
What do you think?

-- 
Matteo Croce
per aspera ad upstream


[PATCH] ppp: allow usage in namespaces

2017-10-27 Thread Matteo Croce
Check for CAP_NET_ADMIN with ns_capable() instead of capable()
to allow usage of ppp in user namespaces other than the init one.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/ppp/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 6566107cef84..af7f93ed1487 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -390,7 +390,7 @@ static int ppp_open(struct inode *inode, struct file *file)
/*
 * This could (should?) be enforced by the permissions on /dev/ppp.
 */
-   if (!capable(CAP_NET_ADMIN))
+   if (!ns_capable(file->f_cred->user_ns, CAP_NET_ADMIN))
return -EPERM;
return 0;
 }
-- 
2.13.6



[PATCH v2] udp: make some messages more descriptive

2017-10-19 Thread Matteo Croce
In the UDP code there are two leftover error messages with very little meaning.
Replace them with a more descriptive error message as some users
reported them as "strange network error".

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
v2: added proper signed off tag

 net/ipv4/udp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e45177ceb0ee..806b298a3bdd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1061,7 +1061,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
/* ... which is an evident application bug. --ANK */
release_sock(sk);
 
-   net_dbg_ratelimited("cork app bug 2\n");
+   net_dbg_ratelimited("socket already corked\n");
err = -EINVAL;
goto out;
}
@@ -1144,7 +1144,7 @@ int udp_sendpage(struct sock *sk, struct page *page, int 
offset,
if (unlikely(!up->pending)) {
release_sock(sk);
 
-   net_dbg_ratelimited("udp cork app bug 3\n");
+   net_dbg_ratelimited("cork failed\n");
return -EINVAL;
}
 
-- 
2.13.6



[PATCH] udp: make some messages more descriptive

2017-10-17 Thread Matteo Croce
In the UDP code there are two leftover error messages with very little meaning.
Replace them with a more descriptive error message as some users
reported them as "strange network error".
---
 net/ipv4/udp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e45177ceb0ee..806b298a3bdd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1061,7 +1061,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
/* ... which is an evident application bug. --ANK */
release_sock(sk);
 
-   net_dbg_ratelimited("cork app bug 2\n");
+   net_dbg_ratelimited("socket already corked\n");
err = -EINVAL;
goto out;
}
@@ -1144,7 +1144,7 @@ int udp_sendpage(struct sock *sk, struct page *page, int 
offset,
if (unlikely(!up->pending)) {
release_sock(sk);
 
-   net_dbg_ratelimited("udp cork app bug 3\n");
+   net_dbg_ratelimited("cork failed\n");
return -EINVAL;
}
 
-- 
2.13.6



[PATCH net-next] icmp: don't fail on fragment reassembly time exceeded

2017-10-12 Thread Matteo Croce
The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).

However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).

Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets.
These are now easy to trigger: since UFO has been removed,
sending large buffers triggers IP fragmentation.

The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:

  # start netserver in the test namespace
  ip netns add test
  ip netns exec test netserver

  # create a VETH pair
  ip link add name veth0 type veth peer name veth0 netns test
  ip link set veth0 up
  ip -n test link set veth0 up

  for i in $(seq 20 29); do
  # assign addresses to both ends
  ip addr add dev veth0 192.168.$i.1/24
  ip -n test addr add dev veth0 192.168.$i.2/24

  # start the traffic
  netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
  done

  # wait
  send_data: data send error: No route to host (errno 113)
  netperf: send_omni: send_data failed: No route to host

We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".
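
The distinction described above can be sketched in plain C as follows.
This is only a userspace model under stated assumptions, not the kernel
code: icmp_error_should_propagate() and its counter argument are
hypothetical names made up for illustration, while the type and code
values are the ones from RFC 792.

```c
#include <stdbool.h>

/* ICMP type and code values from RFC 792. */
#define ICMP_TIME_EXCEEDED	11
#define ICMP_EXC_TTL		0	/* time to live exceeded in transit */
#define ICMP_EXC_FRAGTIME	1	/* fragment reassembly time exceeded */

/*
 * Hypothetical helper (not a kernel function): returns true if a received
 * ICMP error should be handed to the upper-layer error handler, false if
 * it should be silently dropped. The counter stands in for the
 * "icmpInTimeExcds" statistic and is bumped for both time exceeded codes.
 */
static bool icmp_error_should_propagate(int type, int code,
					unsigned long *time_excds)
{
	if (type != ICMP_TIME_EXCEEDED)
		return true;

	(*time_excds)++;

	/* Only "fragment reassembly time exceeded" is dropped. */
	return code != ICMP_EXC_FRAGTIME;
}
```

With this model, a TTL exceeded message is still propagated, while a
fragment reassembly timeout is counted and then ignored.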

While at it, fix a typo in a comment, and convert the if statement
into a switch to make it more readable.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 net/ipv4/icmp.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 681e33998e03..3c1570d3e22f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -782,7 +782,7 @@ static bool icmp_tag_validation(int proto)
 }
 
 /*
- * Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEED, ICMP_QUENCH, and
+ * Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEEDED, ICMP_QUENCH, and
  * ICMP_PARAMETERPROB.
  */
 
@@ -810,7 +810,8 @@ static bool icmp_unreach(struct sk_buff *skb)
if (iph->ihl < 5) /* Mangled header, drop. */
goto out_err;
 
-   if (icmph->type == ICMP_DEST_UNREACH) {
+   switch (icmph->type) {
+   case ICMP_DEST_UNREACH:
switch (icmph->code & 15) {
case ICMP_NET_UNREACH:
case ICMP_HOST_UNREACH:
@@ -846,8 +847,16 @@ static bool icmp_unreach(struct sk_buff *skb)
}
if (icmph->code > NR_ICMP_UNREACH)
goto out;
-   } else if (icmph->type == ICMP_PARAMETERPROB)
+   break;
+   case ICMP_PARAMETERPROB:
info = ntohl(icmph->un.gateway) >> 24;
+   break;
+   case ICMP_TIME_EXCEEDED:
+   __ICMP_INC_STATS(net, ICMP_MIB_INTIMEEXCDS);
+   if (icmph->code == ICMP_EXC_FRAGTIME)
+   goto out;
+   break;
+   }
 
/*
 *  Throw it at our lower layers
-- 
2.13.6



[RFC net-next] ppp: allow usage in namespaces

2017-10-09 Thread Matteo Croce
Check for CAP_NET_ADMIN with ns_capable() instead of capable()
to allow usage of ppp in user namespaces other than the init one.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 drivers/net/ppp/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a404552555d4..be5634b4835b 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -389,7 +389,7 @@ static int ppp_open(struct inode *inode, struct file *file)
/*
 * This could (should?) be enforced by the permissions on /dev/ppp.
 */
-   if (!capable(CAP_NET_ADMIN))
+   if (!ns_capable(file->f_cred->user_ns, CAP_NET_ADMIN))
return -EPERM;
return 0;
 }
-- 
2.13.6



[PATCH] ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real

2017-10-05 Thread Matteo Croce
Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.

However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.

Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.

We have instead to skip DAD if accept_dad is 0 for the given interface
*and* for all interfaces.
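
As a rough illustration, the two variants of the accept_dad part of the
check can be modelled like this. It is a simplified sketch: both helper
names are hypothetical, and the other conditions tested in
addrconf_dad_begin() are left out on purpose.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check as modified by commit 35e015e1f577:
 * DAD is skipped as soon as either value is below 1.
 */
static bool skip_dad_before(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 || if_accept_dad < 1;
}

/*
 * Hypothetical model of the check after this patch: DAD is skipped only
 * when both the "all" and the per-interface value are below 1.
 */
static bool skip_dad_after(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 && if_accept_dad < 1;
}
```

The two only differ when exactly one of the values is 0: the first
variant skips DAD in that case, the second one still performs it.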

Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: Stefano Brivio <sbri...@redhat.com>
Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 net/ipv6/addrconf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 96861c702c06..4a96ebbf8eda 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3820,8 +3820,8 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
goto out;
 
if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
-   dev_net(dev)->ipv6.devconf_all->accept_dad < 1 ||
-   idev->cnf.accept_dad < 1 ||
+   (dev_net(dev)->ipv6.devconf_all->accept_dad < 1 &&
+idev->cnf.accept_dad < 1) ||
!(ifp->flags&IFA_F_TENTATIVE) ||
ifp->flags & IFA_F_NODAD) {
bump_id = ifp->flags & IFA_F_TENTATIVE;
-- 
2.13.6



Re: [PATCH net] ipv6: fix net.ipv6.conf.all interface DAD handlers

2017-09-29 Thread Matteo Croce
On Thu, Sep 28, 2017 at 1:22 PM, Erik Kline <e...@google.com> wrote:
> Upon further reflection, doesn't the whole premise of this change
> mean that it's no longer possible to selectively disable these
> features if they are set on "all"?  Or are we saying that this mode is
> only supported with "default" enable + "ifname" disable?

Hi Erik, thanks for the review.
Yes, the behaviour when writing all.accept_dad seems wrong with respect
to what the documentation says.

BTW, the previous behaviour was not defined. I put them in OR because
that's what other handlers do, e.g. send_redirects.
If you think it's better to put them in AND, we can change the
documentation accordingly.
What do you think?

-- 
Matteo Croce
per aspera ad upstream


[PATCH net] ipv6: fix net.ipv6.conf.all interface DAD handlers

2017-09-12 Thread Matteo Croce
Currently, writing into
net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect.
Fix handling of these flags by:

- using the maximum of global and per-interface values for the
  accept_dad flag. That is, if at least one of the two values is
  non-zero, enable DAD on the interface. If at least one value is
  set to 2, enable DAD and disable IPv6 operation on the interface if
  a MAC-based duplicate link-local address is found

- using the logical OR of global and per-interface values for the
  optimistic_dad flag. If at least one of them is set to one, optimistic
  duplicate address detection (RFC 4429) is enabled on the interface

- using the logical OR of global and per-interface values for the
  use_optimistic flag. If at least one of them is set to one,
  optimistic addresses won't be marked as deprecated during source address
  selection on the interface.
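
The three rules above could be modelled roughly as follows. This is a
sketch with hypothetical helper names, not the functions touched by the
patch; the arguments stand for the conf/all and conf/<interface> values.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: effective accept_dad mode for an interface,
 * taken as the maximum of the global and per-interface values
 * (0: DAD disabled, 1: DAD enabled, 2: DAD enabled, and IPv6 disabled
 * on a MAC-based duplicate link-local address).
 */
static int effective_accept_dad(int all, int iface)
{
	return all > iface ? all : iface;
}

/*
 * Hypothetical helper: effective value of a boolean flag such as
 * optimistic_dad or use_optimistic, taken as the logical OR of the
 * global and per-interface values.
 */
static bool effective_bool_flag(int all, int iface)
{
	return all || iface;
}
```

For example, conf/all/accept_dad=0 with a per-interface value of 2
yields mode 2, and conf/all/optimistic_dad=1 enables optimistic DAD on
an interface whose own value is 0.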

While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(),
drop inline, and let the compiler decide.

Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses 
useful candidates")
Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 Documentation/networking/ip-sysctl.txt | 18 ++
 net/ipv6/addrconf.c| 27 ---
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index b3345d0fe0a6..77f4de59dc9c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1680,6 +1680,9 @@ accept_dad - INTEGER
2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
   link-local address has been found.
 
+   DAD operation and mode on a given interface will be selected according
+   to the maximum value of conf/{all,interface}/accept_dad.
+
 force_tllao - BOOLEAN
Enable sending the target link-layer address option even when
responding to a unicast neighbor solicitation.
@@ -1727,16 +1730,23 @@ suppress_frag_ndisc - INTEGER
 
 optimistic_dad - BOOLEAN
Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
-   0: disabled (default)
-   1: enabled
+   0: disabled (default)
+   1: enabled
+
+   Optimistic Duplicate Address Detection for the interface will be enabled
+   if at least one of conf/{all,interface}/optimistic_dad is set to 1,
+   it will be disabled otherwise.
 
 use_optimistic - BOOLEAN
If enabled, do not classify optimistic addresses as deprecated during
source address selection.  Preferred addresses will still be chosen
before optimistic addresses, subject to other ranking in the source
address selection algorithm.
-   0: disabled (default)
-   1: enabled
+   0: disabled (default)
+   1: enabled
+
+   This will be enabled if at least one of
+   conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
 
 stable_secret - IPv6 address
This IPv6 address will be used as a secret to generate IPv6
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c2e2a78787ec..774d8794248a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1399,10 +1399,18 @@ static inline int ipv6_saddr_preferred(int type)
return 0;
 }
 
-static inline bool ipv6_use_optimistic_addr(struct inet6_dev *idev)
+static bool ipv6_use_optimistic_addr(struct net *net,
+struct inet6_dev *idev)
 {
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
-   return idev && idev->cnf.optimistic_dad && idev->cnf.use_optimistic;
+   if (!idev)
+   return false;
+   if (!net->ipv6.devconf_all->optimistic_dad && !idev->cnf.optimistic_dad)
+   return false;
+   if (!net->ipv6.devconf_all->use_optimistic && !idev->cnf.use_optimistic)
+   return false;
+
+   return true;
 #else
return false;
 #endif
@@ -1472,7 +1480,7 @@ static int ipv6_get_saddr_eval(struct net *net,
/* Rule 3: Avoid deprecated and optimistic addresses */
u8 avoid = IFA_F_DEPRECATED;
 
-   if (!ipv6_use_optimistic_addr(score->ifa->idev))
+   if (!ipv6_use_optimistic_addr(net, score->ifa->idev))
avoid |= IFA_F_OPTIMISTIC;
ret = ipv6_saddr_preferred(score->addr_type) ||
  !(score->ifa->flags & avoid);
@@ -2460,7 +2468,8 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct 
net_device *dev,
int max_addresses = in6_dev->cnf.max_addresses;
 
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
-   if (in6_dev->cnf.optimistic_dad &&
+   if ((net->ipv6.devconf_all->optimistic_dad ||
+in6_dev->cnf.optimistic_dad) &&

Re: hung task in mac80211

2017-09-06 Thread Matteo Croce
On Wed, Sep 6, 2017 at 2:58 PM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Wed, 2017-09-06 at 13:57 +0200, Matteo Croce wrote:
>
>> I have a hung task on a vanilla 4.13 kernel which I don't have on 4.12.
>> The problem is present both on my AP and on my notebook,
>> so it seems to affect both AP and STA mode.
>> The generated messages are:
>>
>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
>>   Not tainted 4.13.0 #57
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> kworker/u16:6   D0   120  2 0x
>> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
>> Call Trace:
>>  ? __schedule+0x174/0x5b0
>>  ? schedule+0x31/0x80
>>  ? schedule_preempt_disabled+0x9/0x10
>>  ? __mutex_lock.isra.2+0x163/0x480
>>  ? select_task_rq_fair+0xb9f/0xc60
>>  ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>>  ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>
> Yeah - obviously as Stefano found, both take &sta->ampdu_mlme.mtx.
>
> Can you try this?
>
> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
> index 2b36eff5d97e..d8d32776031e 100644
> --- a/net/mac80211/agg-rx.c
> +++ b/net/mac80211/agg-rx.c
> @@ -245,10 +245,10 @@ static void ieee80211_send_addba_resp(struct 
> ieee80211_sub_if_data *sdata, u8 *d
> ieee80211_tx_skb(sdata, skb);
>  }
>
> -void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> -u8 dialog_token, u16 timeout,
> -u16 start_seq_num, u16 ba_policy, u16 
> tid,
> -u16 buf_size, bool tx, bool auto_seq)
> +void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
> + u8 dialog_token, u16 timeout,
> + u16 start_seq_num, u16 ba_policy, u16 
> tid,
> + u16 buf_size, bool tx, bool auto_seq)
>  {
> struct ieee80211_local *local = sta->sdata->local;
> struct tid_ampdu_rx *tid_agg_rx;
> @@ -267,7 +267,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> ht_dbg(sta->sdata,
>"STA %pM requests BA session on unsupported tid %d\n",
>sta->sta.addr, tid);
> -   goto end_no_lock;
> +   goto end;
> }
>
> if (!sta->sta.ht_cap.ht_supported) {
> @@ -275,14 +275,14 @@ void __ieee80211_start_rx_ba_session(struct sta_info 
> *sta,
>"STA %pM erroneously requests BA session on tid %d w/o 
> QoS\n",
>sta->sta.addr, tid);
> /* send a response anyway, it's an error case if we get here 
> */
> -   goto end_no_lock;
> +   goto end;
> }
>
> if (test_sta_flag(sta, WLAN_STA_BLOCK_BA)) {
> ht_dbg(sta->sdata,
>"Suspend in progress - Denying ADDBA request (%pM tid 
> %d)\n",
>sta->sta.addr, tid);
> -   goto end_no_lock;
> +   goto end;
> }
>
> /* sanity check for incoming parameters:
> @@ -296,7 +296,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
> ht_dbg_ratelimited(sta->sdata,
>"AddBA Req with bad params from %pM on tid 
> %u. policy %d, buffer size %d\n",
>sta->sta.addr, tid, ba_policy, buf_size);
> -   goto end_no_lock;
> +   goto end;
> }
> /* determine default buffer size */
> if (buf_size == 0)
> @@ -311,7 +311,6 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
>buf_size, sta->sta.addr);
>
> /* examine state machine */
> -   mutex_lock(&sta->ampdu_mlme.mtx);
>
> if (test_bit(tid, sta->ampdu_mlme.agg_session_valid)) {
> if (sta->ampdu_mlme.tid_rx_token[tid] == dialog_token) {
> @@ -415,15 +414,25 @@ void __ieee80211_start_rx_ba_session(struct sta_info 
> *sta,
> __clear_bit(tid, sta->ampdu_mlme.unexpected_agg);
> sta->ampdu_mlme.tid_rx_token[tid] = dialog_token;
> }
> -   mutex_unlock(&sta->ampdu_mlme.mtx);
>
> -end_no_lock:
> if (tx)
> ieee80211_send_addba_resp(sta->sdata, sta->sta.addr, tid,
>   dialog_token, status, 1, buf_size,
>

Re: hung task in mac80211

2017-09-06 Thread Matteo Croce
On Wed, Sep 6, 2017 at 2:40 PM, Stefano Brivio <sbri...@redhat.com> wrote:
> On Wed, 6 Sep 2017 13:57:47 +0200
> Matteo Croce <mcr...@redhat.com> wrote:
>
>> Hi,
>>
>> I have a hung task on a vanilla 4.13 kernel which I don't have on 4.12.
>> The problem is present both on my AP and on my notebook,
>> so it seems to affect both AP and STA mode.
>> The generated messages are:
>>
>> INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
>>   Not tainted 4.13.0 #57
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u16:6   D0   120  2 0x
>> Workqueue: phy0 ieee80211_ba_session_work [mac80211]
>> Call Trace:
>>  ? __schedule+0x174/0x5b0
>>  ? schedule+0x31/0x80
>>  ? schedule_preempt_disabled+0x9/0x10
>>  ? __mutex_lock.isra.2+0x163/0x480
>>  ? select_task_rq_fair+0xb9f/0xc60
>>  ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>>  ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
>
> This is ugly and maybe wrong, but you could check perhaps...:
>
> diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
> index c92df492e898..bd7512a656f2 100644
> --- a/net/mac80211/ht.c
> +++ b/net/mac80211/ht.c
> @@ -320,28 +320,40 @@ void ieee80211_ba_session_work(struct work_struct *work)
>
> mutex_lock(&sta->ampdu_mlme.mtx);
> for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) {
> -   if (test_and_clear_bit(tid, 
> sta->ampdu_mlme.tid_rx_timer_expired))
> +   if (test_and_clear_bit(tid, 
> sta->ampdu_mlme.tid_rx_timer_expired)) {
> +   mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> WLAN_REASON_QSTA_TIMEOUT, true);
> +   mutex_lock(&sta->ampdu_mlme.mtx);
> +   }
>
> if (test_and_clear_bit(tid,
> -  sta->ampdu_mlme.tid_rx_stop_requested))
> +  
> sta->ampdu_mlme.tid_rx_stop_requested)) {
> +   mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
> sta, tid, WLAN_BACK_RECIPIENT,
> WLAN_REASON_UNSPECIFIED, true);
> +   mutex_lock(&sta->ampdu_mlme.mtx);
> +   }
>
> if (test_and_clear_bit(tid,
> -  sta->ampdu_mlme.tid_rx_manage_offl))
> +  sta->ampdu_mlme.tid_rx_manage_offl)) {
> +   mutex_unlock(&sta->ampdu_mlme.mtx);
> __ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
> 
> IEEE80211_MAX_AMPDU_BUF,
> false, true);
> +   mutex_lock(&sta->ampdu_mlme.mtx);
> +   }
>
> if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
> -  sta->ampdu_mlme.tid_rx_manage_offl))
> +  sta->ampdu_mlme.tid_rx_manage_offl)) {
> +   mutex_unlock(&sta->ampdu_mlme.mtx);
> ___ieee80211_stop_rx_ba_session(
>     sta, tid, WLAN_BACK_RECIPIENT,
> 0, false);
> +   mutex_lock(&sta->ampdu_mlme.mtx);
> +   }
>
> spin_lock_bh(&sta->lock);
>
> --
> Stefano
>

ACK, it has been running for 12 minutes.
The hang usually appears shortly after boot, as I set
kernel.hung_task_timeout_secs=10.

-- 
Matteo Croce
per aspera ad upstream


hung task in mac80211

2017-09-06 Thread Matteo Croce
Hi,

I have a hung task on a vanilla 4.13 kernel which I don't have on 4.12.
The problem is present both on my AP and on my notebook,
so it seems to affect both AP and STA mode.
The generated messages are:

INFO: task kworker/u16:6:120 blocked for more than 120 seconds.
  Not tainted 4.13.0 #57
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6   D0   120  2 0x
Workqueue: phy0 ieee80211_ba_session_work [mac80211]
Call Trace:
 ? __schedule+0x174/0x5b0
 ? schedule+0x31/0x80
 ? schedule_preempt_disabled+0x9/0x10
 ? __mutex_lock.isra.2+0x163/0x480
 ? select_task_rq_fair+0xb9f/0xc60
 ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
 ? __ieee80211_start_rx_ba_session+0x135/0x4d0 [mac80211]
 ? try_to_wake_up+0x1f1/0x340
 ? update_curr+0x88/0xd0
 ? ieee80211_ba_session_work+0x148/0x230 [mac80211]
 ? process_one_work+0x1a5/0x330
 ? worker_thread+0x42/0x3c0
 ? create_worker+0x170/0x170
 ? kthread+0x10d/0x130
 ? kthread_create_on_node+0x40/0x40
 ? ret_from_fork+0x22/0x30

I did a bisect and the offending commit is:

commit 699cb58c8a52ff39bf659bff7971893ebe111bf2
Author: Johannes Berg <johannes.b...@intel.com>
Date:   Tue May 30 16:34:46 2017 +0200

mac80211: manage RX BA session offload without SKB queue

Instead of using the SKB queue with the fake pkt_type for the
offloaded RX BA session management, also handle this with the
normal aggregation state machine worker. This also makes the
use of this more reliable since it gets rid of the allocation
of the fake skb.

Combined with the previous patch, this finally allows us to
get rid of the pkt_type hack entirely, so do that as well.

Signed-off-by: Johannes Berg <johannes.b...@intel.com>

Regards,
-- 
Matteo Croce
per aspera ad upstream


Re: [PATCH] drivers: net: wireless: atmel: check memory allocation failure

2017-08-22 Thread Matteo Croce
On Tue, 22/08/2017 at 13.41 +0530, Himanshu Jha wrote:
> Check memory allocation failure and return -ENOMEM if failure
> occurs.
> 
> Signed-off-by: Himanshu Jha <himanshujha199...@gmail.com>
> ---
>  drivers/net/wireless/atmel/at76c50x-usb.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/wireless/atmel/at76c50x-usb.c
> b/drivers/net/wireless/atmel/at76c50x-usb.c
> index 09defbc..73f0924 100644
> --- a/drivers/net/wireless/atmel/at76c50x-usb.c
> +++ b/drivers/net/wireless/atmel/at76c50x-usb.c
> @@ -940,7 +940,7 @@ static void at76_dump_mib_mac_addr(struct
> at76_priv *priv)
>GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_MAC_ADDR, m,
>  sizeof(struct mib_mac_addr));
> @@ -969,7 +969,7 @@ static void at76_dump_mib_mac_wep(struct
> at76_priv *priv)
>   struct mib_mac_wep *m = kmalloc(sizeof(struct mib_mac_wep),
> GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_MAC_WEP, m,
>  sizeof(struct mib_mac_wep));
> @@ -1006,7 +1006,7 @@ static void at76_dump_mib_mac_mgmt(struct
> at76_priv *priv)
>GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_MAC_MGMT, m,
>  sizeof(struct mib_mac_mgmt));
> @@ -1043,7 +1043,7 @@ static void at76_dump_mib_mac(struct at76_priv
> *priv)
>   struct mib_mac *m = kmalloc(sizeof(struct mib_mac),
> GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_MAC, m, sizeof(struct
> mib_mac));
>   if (ret < 0) {
> @@ -1080,7 +1080,7 @@ static void at76_dump_mib_phy(struct at76_priv
> *priv)
>   struct mib_phy *m = kmalloc(sizeof(struct mib_phy),
> GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_PHY, m, sizeof(struct
> mib_phy));
>   if (ret < 0) {
> @@ -1113,7 +1113,7 @@ static void at76_dump_mib_local(struct
> at76_priv *priv)
>   struct mib_local *m = kmalloc(sizeof(*m), GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_LOCAL, m, sizeof(*m));
>   if (ret < 0) {
> @@ -1138,7 +1138,7 @@ static void at76_dump_mib_mdomain(struct
> at76_priv *priv)
>   struct mib_mdomain *m = kmalloc(sizeof(struct mib_mdomain),
> GFP_KERNEL);
>  
>   if (!m)
> - return;
> + return -ENOMEM;
>  
>   ret = at76_get_mib(priv->udev, MIB_MDOMAIN, m,
>  sizeof(struct mib_mdomain));

These functions are declared void, so they can't return -ENOMEM as they
are. Perhaps they should return an int instead of being void.

Regards,

-- 
Matteo Croce
per aspera ad upstream


Re: skb allocation from interrupt handler?

2017-08-08 Thread Matteo Croce
On Tue, 08/08/2017 at 18.17 -0400, Murali Karicheri wrote:
> Is there an skb_alloc function that can be used from interrupt
> handler? Looks like netdev_alloc_skb()
> can't be used since I see following trace with kernel hack debug
> options enabled.
> 
> [  652.481713] [] (unwind_backtrace) from []
> (show_stack+0x10/0x14)
> [  652.481725] [] (show_stack) from []
> (dump_stack+0x98/0xc4)
> [  652.481736] [] (dump_stack) from []
> (___might_sleep+0x1b8/0x2a4)
> [  652.481746] [] (___might_sleep) from []
> (rt_spin_lock+0x24/0x5c)
> [  652.481755] [] (rt_spin_lock) from []
> (__netdev_alloc_skb+0xd0/0x254)
> [  652.481774] [] (__netdev_alloc_skb) from []
> (emac_rx_hardirq+0x374/0x554 [prueth])
> [  652.481793] [] (emac_rx_hardirq [prueth]) from
> [] (__handle_irq_event_percpu+0x9c/0x128)
> 
> This is running under RT kernel off 4.9.y
> 

netdev_alloc_skb() passes GFP_ATOMIC to alloc_skb() so it should work
in an interrupt handler too.

-- 
Matteo Croce
per aspera ad upstream


[RFC] net: make net.core.{r,w}mem_{default,max} namespaced

2017-07-26 Thread Matteo Croce
The following sysctls are global and can't be read or set from a netns:

net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max

Make these sysctl parameters available from within a network
namespace, allowing unique values to be set per network namespace.

My concern is about the initial value of these sysctls in a newly
created netns: I'm not sure whether it is better to copy them from the init
namespace or to set them to the default values.

Setting them to the default value has the advantage that a new namespace
behaves like a freshly booted system, while copying them from the init
netns has the advantage of keeping the current behaviour as the values
from the init netns are used.
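
To make the two options concrete, here is a small userspace sketch.
Everything in it is an assumption made for illustration: the struct,
the function and the EXAMPLE_MEM_DEFAULT placeholder are hypothetical
and are not the names or values used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value for illustration only, not the kernel's default. */
#define EXAMPLE_MEM_DEFAULT	212992u

/* Hypothetical per-namespace copy of the four sysctl values. */
struct example_netns_core {
	uint32_t wmem_max;
	uint32_t rmem_max;
	uint32_t wmem_default;
	uint32_t rmem_default;
};

/*
 * Hypothetical init hook for a new namespace: either inherit the values
 * currently set in the init namespace, or start from built-in defaults
 * like a freshly booted system would.
 */
static void example_netns_core_init(struct example_netns_core *new_ns,
				    const struct example_netns_core *init_ns,
				    bool inherit)
{
	if (inherit) {
		*new_ns = *init_ns;
		return;
	}

	new_ns->wmem_max = EXAMPLE_MEM_DEFAULT;
	new_ns->rmem_max = EXAMPLE_MEM_DEFAULT;
	new_ns->wmem_default = EXAMPLE_MEM_DEFAULT;
	new_ns->rmem_default = EXAMPLE_MEM_DEFAULT;
}
```

With inherit set, a tuned init namespace passes its values on to every
new namespace; without it, each namespace starts from the same known
baseline regardless of what was changed in the init one.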

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 include/net/netns/core.h        |  5 +++
 include/net/sock.h              |  6 
 include/net/tcp.h               |  3 +-
 net/core/net_namespace.c        | 22 +
 net/core/sock.c                 | 31 +-
 net/core/sysctl_net_core.c      | 70 ++---
 net/ipv4/ip_output.c            |  2 +-
 net/ipv4/syncookies.c           |  3 +-
 net/ipv4/tcp_minisocks.c        |  3 +-
 net/ipv4/tcp_output.c           | 12 ---
 net/ipv6/syncookies.c           |  3 +-
 net/netfilter/ipvs/ip_vs_sync.c |  4 +--
 12 files changed, 89 insertions(+), 75 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 78eb1ff75475..9b613162467d 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -9,6 +9,11 @@ struct netns_core {
struct ctl_table_header *sysctl_hdr;
 
int sysctl_somaxconn;
+   u32 sysctl_wmem_max;
+   u32 sysctl_rmem_max;
+
+   u32 sysctl_wmem_default;
+   u32 sysctl_rmem_default;
 
struct prot_inuse __percpu *inuse;
 };
diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..e62a279e420f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2363,13 +2363,7 @@ bool sk_net_capable(const struct sock *sk, int cap);
 
 void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
 
-extern __u32 sysctl_wmem_max;
-extern __u32 sysctl_rmem_max;
-
 extern int sysctl_tstamp_allow_data;
 extern int sysctl_optmem_max;
 
-extern __u32 sysctl_wmem_default;
-extern __u32 sysctl_rmem_default;
-
 #endif /* _SOCK_H */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70483296157f..460f4373d42a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1300,7 +1300,8 @@ static inline void tcp_slow_start_after_idle_check(struct 
sock *sk)
 /* Determine a window scaling and initial window to offer. */
 void tcp_select_initial_window(int __space, __u32 mss, __u32 *rcv_wnd,
   __u32 *window_clamp, int wscale_ok,
-  __u8 *rcv_wscale, __u32 init_rcv_wnd);
+  __u8 *rcv_wscale, __u32 init_rcv_wnd,
+  __u32 rmem_max);
 
 static inline int tcp_win_from_space(int space)
 {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 8726d051f31d..2d72b2bd6eab 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -23,6 +23,16 @@
 #include 
 #include 
 
+/* Take into consideration the size of the struct sk_buff overhead in the
+ * determination of these values, since that is non-constant across
+ * platforms.  This makes socket queueing behavior and performance
+ * not depend upon such differences.
+ */
+#define _SK_MEM_PACKETS    256
+#define _SK_MEM_OVERHEAD   SKB_TRUESIZE(256)
+#define SK_WMEM_MAX        (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+#define SK_RMEM_MAX        (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+
 /*
  * Our network namespace constructor/destructor lists
  */
@@ -318,6 +328,18 @@ static __net_init int setup_net(struct net *net, struct 
user_namespace *user_ns)
 static int __net_init net_defaults_init_net(struct net *net)
 {
net->core.sysctl_somaxconn = SOMAXCONN;
+   if (net_eq(net, &init_net)) {
+   init_net.core.sysctl_wmem_max = SK_WMEM_MAX;
+   init_net.core.sysctl_rmem_max = SK_RMEM_MAX;
+   init_net.core.sysctl_wmem_default = SK_WMEM_MAX;
+   init_net.core.sysctl_rmem_default = SK_RMEM_MAX;
+   } else {
+   net->core.sysctl_wmem_max = init_net.core.sysctl_wmem_max;
+   net->core.sysctl_rmem_max = init_net.core.sysctl_rmem_max;
+   net->core.sysctl_wmem_default = 
init_net.core.sysctl_wmem_default;
+   net->core.sysctl_rmem_default = 
init_net.core.sysctl_rmem_default;
+   }
+
return 0;
 }
 
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..8086a660d75f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -307,24 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
 static struct lock_class_key af_elock_keys[AF_MAX];
 static struct lock_class_key 

Re: [PATCH] netns: more input validation

2017-07-25 Thread Matteo Croce
On Tue, 25/07/2017 at 13:47 +, David Laight wrote:
> Think I'd check:
>   !name[0] || !memchr(name, 0, NAME_MAX) || strchr(name, '/') ||
>   (name[0] == '.' && (!name[1] || (name[1] == '.' &&
> !name[2])))
> 
>   David

Nice optimization, but as strchr() and strcmp() are builtin functions,
at least in GCC, I don't know if there is any real advantage.


[PATCH] netns: more input validation

2017-07-25 Thread Matteo Croce
ip netns accepts invalid input as a namespace name, such as an empty string or a
string longer than the maximum file name length.
Check that the netns name is not empty and is no longer than NAME_MAX.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 ip/ipnetns.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 42549944..198e9de8 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -768,7 +768,8 @@ static int netns_monitor(int argc, char **argv)
 
 static int invalid_name(const char *name)
 {
-   return strchr(name, '/') || !strcmp(name, ".") || !strcmp(name, "..");
+   return !*name || strlen(name) > NAME_MAX ||
+   strchr(name, '/') || !strcmp(name, ".") || !strcmp(name, "..");
 }
 
 int do_netns(int argc, char **argv)
-- 
2.11.0
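
For readers who want to try the check outside of iproute2, here is a minimal
standalone sketch of the same predicate. It assumes a POSIX system where
NAME_MAX comes from <limits.h>; the fallback value of 255 is an assumption
for illustration only, and the function is not declared static here so it
can be linked from a separate test file.

```c
#include <limits.h>	/* NAME_MAX on POSIX systems */
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* assumed fallback, not taken from the patch */
#endif

/* Return nonzero when name must not be used as a netns name:
 * empty, longer than NAME_MAX, containing '/', or exactly "." or "..".
 */
int invalid_name(const char *name)
{
	return !*name || strlen(name) > NAME_MAX ||
		strchr(name, '/') || !strcmp(name, ".") || !strcmp(name, "..");
}
```

With this sketch, "", ".", ".." and "a/b" are rejected, while "blue" and
"..." are accepted; a name of exactly NAME_MAX characters is still accepted.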



[PATCH v2] netns: avoid directory traversal (was: ip netns: Make sure netns name is sane)

2017-07-19 Thread Matteo Croce
v2: reword commit message

ip netns keeps track of created namespaces with bind mounts under
/var/run/netns/. No input sanitization is done, which allows creation and
deletion of files relative to /var/run/netns or, if the path is nonexistent or
invalid, creation of "untracked" namespaces (invisible to the tool).

This commit denies creation or deletion of namespaces with names containing
"/" or matching exactly "." or "..".

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 ip/ipnetns.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 0b0378ab..42549944 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -766,6 +766,11 @@ static int netns_monitor(int argc, char **argv)
return 0;
 }
 
+static int invalid_name(const char *name)
+{
+   return strchr(name, '/') || !strcmp(name, ".") || !strcmp(name, "..");
+}
+
 int do_netns(int argc, char **argv)
 {
netns_nsid_socket_init();
@@ -775,6 +780,11 @@ int do_netns(int argc, char **argv)
return netns_list(0, NULL);
}
 
+   if (argc > 1 && invalid_name(argv[1])) {
+   fprintf(stderr, "Invalid netns name \"%s\"\n", argv[1]);
+   exit(-1);
+   }
+
if ((matches(*argv, "list") == 0) || (matches(*argv, "show") == 0) ||
(matches(*argv, "lst") == 0)) {
netns_map_init();
-- 
2.13.3



[PATCH] netns: avoid directory traversal (was: ip netns: Make sure netns name is sane)

2017-07-10 Thread Matteo Croce
Hi Phil,

I noticed that your patch still leaves one scenario uncovered: the one where the
namespace name is "." or "..".
Calling 'ip netns del ..' will remove /var/run, which is a symlink to /run on
most systems, causing some daemons, e.g. dbus, to fail.

ip netns doesn't validate input, allowing creation and deletion of files
relative to /var/run/netns.
This patch denies creation or deletion of namespaces with names containing
"/" or matching exactly "." or "..".
---
 ip/ipnetns.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 0b0378a..4254994 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -766,6 +766,11 @@ static int netns_monitor(int argc, char **argv)
return 0;
 }
 
+static int invalid_name(const char *name)
+{
+   return strchr(name, '/') || !strcmp(name, ".") || !strcmp(name, "..");
+}
+
 int do_netns(int argc, char **argv)
 {
netns_nsid_socket_init();
@@ -775,6 +780,11 @@ int do_netns(int argc, char **argv)
return netns_list(0, NULL);
}
 
+   if (argc > 1 && invalid_name(argv[1])) {
+   fprintf(stderr, "Invalid netns name \"%s\"\n", argv[1]);
+   exit(-1);
+   }
+
if ((matches(*argv, "list") == 0) || (matches(*argv, "show") == 0) ||
(matches(*argv, "lst") == 0)) {
netns_map_init();
-- 
2.9.4
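
To illustrate why an unvalidated name is a problem, the sketch below joins a
run directory and a user-supplied name the naive way. The helper name
netns_path() and the snprintf()-based join are assumptions made for this
example, not code quoted from ipnetns.c; the directory is the one named in
the message above.

```c
#include <stdio.h>

#define NETNS_RUN_DIR "/var/run/netns"	/* path quoted in the message above */

/* Hypothetical helper: build "<NETNS_RUN_DIR>/<name>" into buf.
 * Returns 0 on success, -1 if the result did not fit in len bytes.
 */
int netns_path(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", NETNS_RUN_DIR, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "..", this produces "/var/run/netns/..", which the kernel
resolves to /var/run, so an unmount or unlink on that path acts outside the
netns directory. Rejecting "/", "." and ".." before building the path keeps
the result inside /var/run/netns.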



[PATCH iproute2] tc: fix typo in manpage

2017-07-07 Thread Matteo Croce
Fix a typo in the 'tc' manpage and reword some sentences.

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 man/man8/tc-csum.8 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-csum.8 b/man/man8/tc-csum.8
index 718301d..409ab71 100644
--- a/man/man8/tc-csum.8
+++ b/man/man8/tc-csum.8
@@ -29,9 +29,9 @@ csum - checksum update action
 The
 .B csum
 action triggers checksum recalculation of specified packet headers. It is
-commonly used after packet editing using the
+commonly used to fix incorrect checksums after the
 .B pedit
-action to fix for then incorrect checksums.
+action has modified the packet content.
 .SH OPTIONS
 .TP
 .I TARGET
-- 
2.9.4



[PATCH v2] Documentation: fix wrong example command

2017-06-30 Thread Matteo Croce
In the IPVLAN documentation there is an example command line where the
master and slave interface names are inverted.
Fix the command line and also add the optional `name' keyword to better
describe what the command is doing.

v2: added commit message

Signed-off-by: Matteo Croce <mcr...@redhat.com>
---
 Documentation/networking/ipvlan.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ipvlan.txt 
b/Documentation/networking/ipvlan.txt
index 24196ce..1fe42a8 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.txt
@@ -22,9 +22,9 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or 
as a module
There are no module parameters for this driver and it can be configured
 using IProute2/ip utility.
 
-   ip link add link   type ipvlan mode { l2 | l3 | 
l3s }
+   ip link add link  name  type ipvlan mode { l2 | 
l3 | l3s }
 
-   e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
+   e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2
 
 
 4. Operating modes:
-- 
2.9.4



Re: [PATCH v4] add stealth mode

2015-09-23 Thread Matteo Croce
2015-09-16 13:06 GMT+02:00 Florian Westphal <f...@strlen.de>:
>
> Matteo Croce <mat...@openwrt.org> wrote:
> > Add option to disable any reply not related to a listening socket,
> > like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> > Also disables ICMP replies to echo request and timestamp.
> > The stealth mode can be enabled selectively for a single interface.
>
> I think it would make more sense to extend the socket match
> in xtables if it can't be used to achieve this already.
>
> seems like
> *filter
> :INPUT ACCEPT [0:0]
> -A INPUT -p tcp -m socket --nowildcard -j ACCEPT
> -A INPUT -p tcp -j DROP
> COMMIT
>
> Already does what you want for tcp, udp should work too.
> I'd much rather see xtables and/or nftables to be extended
> with whatever feature(s) are needed to configure such a policy
> rather than pushing this into the core network stack.

The point is to do the filtering without *tables at all,
like /proc/sys/net/ipv4/icmp_echo_ignore_all does for pings
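
For context, a knob of this kind is just an integer exposed under /proc/sys.
The following is a minimal userspace sketch of reading one; the helper name
read_sysctl_int() is hypothetical, and the only path assumed to exist is the
icmp_echo_ignore_all file mentioned above.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs-style file.
 * Returns 0 and stores the number in *value on success, -1 on error.
 */
int read_sysctl_int(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_sysctl_int("/proc/sys/net/ipv4/icmp_echo_ignore_all", &v)
would report whether echo replies are currently suppressed; a per-interface
stealth flag as proposed here would presumably be read the same way from its
own file.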

-- 
Matteo Croce
OpenWrt Developer
  ___ __
 |   |.-.-.-.|  |  |  |..|  |_
 |   -   ||  _  |  -__| ||  |  |  ||   _||   _|
 |___||   __|_|__|__||||__|  ||
  |__| W I R E L E S S   F R E E D O M
 -
 CHAOS CALMER
 -
  * 1 1/2 oz GinShake with a glassful
  * 1/4 oz Triple Sec   of broken ice and pour
  * 3/4 oz Lime Juice   unstrained into a goblet.
  * 1 1/2 oz Orange Juice
  * 1 tsp. Grenadine Syrup
 -
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] add stealth mode

2015-09-23 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce <mat...@openwrt.org>
---
rebased against v4.3-rc2
use __in_dev_get_rcu() to access skb->dev->ip_ptr

 Documentation/networking/ip-sysctl.txt | 14 ++
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/ip_input.c|  5 +++--
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/ip6_input.c   |  5 +++--
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 14 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ebe94f2..1d46adc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1206,6 +1206,13 @@ igmp_link_local_mcast_reports - BOOLEAN
224.0.0.X range.
Default TRUE
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMP replies to echo requests and timestamp
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru
 
@@ -1635,6 +1642,13 @@ stable_secret - IPv6 address
 
By default the stable secret is unset.
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMPv6 replies to echo requests
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device 
*in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)        IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)
 
 struct in_ifaddr {
struct hlist_node   hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index f1f32af..a9d0172 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -55,6 +55,7 @@ struct ipv6_devconf {
__s32   ndisc_notify;
__s32   suppress_frag_ndisc;
__s32   accept_ra_mtu;
+   __s32   stealth;
struct ipv6_stable_secret {
bool initialized;
struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+   IPV4_DEVCONF_STEALTH,
__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2d9cb17..6d9c080 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2190,6 +2190,7 @@ static struct devinet_sysctl_table {
  "promote_secondaries"),
DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
  "route_localnet"),
+   DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
},
 };
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 79fe05b..4cd35b2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -889,6 +889,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
struct net *net;
 
+   if (IN_DEV_STEALTH(__in_dev_get_rcu(skb->dev)))
+   return true;
+
net = dev_net(skb_dst(skb)->dev);
if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
struct icmp_bxm icmp_param;
@@ -922,6 +925,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
if (skb->len < 4)
goto out_err;
 
+   if (IN_DEV_STEALTH(__in_dev_get_rcu(skb->dev)))
+   return true;
+
/*
 *  Fill in the current time as ms since midnight UT:
 */

[PATCH v4] add stealth mode

2015-09-16 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce <mat...@openwrt.org>
---
rebased on 4.3-rc1

 Documentation/networking/ip-sysctl.txt | 14 ++
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/ip_input.c|  5 +++--
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/ip6_input.c   |  5 +++--
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 14 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ebe94f2..1d46adc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1206,6 +1206,13 @@ igmp_link_local_mcast_reports - BOOLEAN
224.0.0.X range.
Default TRUE
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMP replies to echo requests and timestamp
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru
 
@@ -1635,6 +1642,13 @@ stable_secret - IPv6 address
 
By default the stable secret is unset.
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMPv6 replies to echo requests
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device 
*in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)        IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)
 
 struct in_ifaddr {
struct hlist_node   hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index f1f32af..a9d0172 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -55,6 +55,7 @@ struct ipv6_devconf {
__s32   ndisc_notify;
__s32   suppress_frag_ndisc;
__s32   accept_ra_mtu;
+   __s32   stealth;
struct ipv6_stable_secret {
bool initialized;
struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+   IPV4_DEVCONF_STEALTH,
__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2d9cb17..6d9c080 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2190,6 +2190,7 @@ static struct devinet_sysctl_table {
  "promote_secondaries"),
DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
  "route_localnet"),
+   DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
},
 };
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 79fe05b..4cd35b2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -889,6 +889,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
struct net *net;
 
+   if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+   return true;
+
net = dev_net(skb_dst(skb)->dev);
if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
struct icmp_bxm icmp_param;
@@ -922,6 +925,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
if (skb->len < 4)
goto out_err;
 
+   if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+   return true;
+
/*
 *  Fill in the current time as ms since midnight UT:
 */
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index f4fc8a7..e75f25

[PATCH] [PATCH v3] add stealth mode

2015-09-02 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce <mat...@openwrt.org>
---
 Documentation/networking/ip-sysctl.txt | 14 ++
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/ip_input.c|  5 +++--
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/ip6_input.c   |  5 +++--
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 14 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 5fae770..50fe7df 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,13 @@ tag - INTEGER
Allows you to write a number, which can be used as required.
Default value is 0.
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMP replies to echo requests and timestamp
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru
 
@@ -1584,6 +1591,13 @@ stable_secret - IPv6 address
 
By default the stable secret is unset.
 
+stealth - BOOLEAN
+   Disable any reply not related to a listening socket,
+   like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+   Also disables ICMPv6 replies to echo requests
+   and ICMP errors for unknown protocols.
+   Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device 
*in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)        IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)  IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)
 
 struct in_ifaddr {
struct hlist_node   hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
__s32   ndisc_notify;
__s32   suppress_frag_ndisc;
__s32   accept_ra_mtu;
+   __s32   stealth;
struct ipv6_stable_secret {
bool initialized;
struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+   IPV4_DEVCONF_STEALTH,
__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2d9cb17..6d9c080 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2190,6 +2190,7 @@ static struct devinet_sysctl_table {
  "promote_secondaries"),
DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
  "route_localnet"),
+   DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
},
 };
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..e8e71fb 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
struct net *net;
 
+   if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+   return true;
+
net = dev_net(skb_dst(skb)->dev);
if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
if (skb->len < 4)
goto out_err;
 
+   if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+   return true;
+
/*
 *  Fill in the current time as ms since midnight UT:
 */
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 2db4

Re: [PATCH v2] add stealth mode

2015-07-14 Thread Matteo Croce
2015-07-13 15:03 GMT+02:00 Austin S Hemmelgarn <ahferro...@gmail.com>:
> How about FIN/ACK and FIN/PSH/URG?

Silent:

root@debian64:~# hping3 192.168.0.2 -p 32 -FA
HPING 192.168.0.2 (eth0 192.168.0.2): AF set, 40 headers + 0 data bytes
^C
--- 192.168.0.2 hping statistic ---
3 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
root@debian64:~# hping3 192.168.0.2 -p 32 -FPU
HPING 192.168.0.2 (eth0 192.168.0.2): FPU set, 40 headers + 0 data bytes
^C
--- 192.168.0.2 hping statistic ---
3 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms


Matteo Croce
OpenWrt Developer


Re: [PATCH v2] add stealth mode

2015-07-12 Thread Matteo Croce
2015-07-08 15:32 GMT+02:00 Austin S Hemmelgarn <ahferro...@gmail.com>:
> On 2015-07-06 15:44, Matteo Croce wrote:
> Just to name a few that I know of off the top of my head:
> 1. IP packets with any protocol number not supported by your current kernel
> (these return a special ICMP message).

Right, I'll handle them.

> 2. SCTP INIT and COOKIE_ECHO chunks when you have SCTP enabled in the
> kernel.

Well, I've never played with SCTP before.

> 3. Theoretically, some IGMP messages.
> 4. NDP messages.
> 5. ARP queries looking for the machine's IP addresses.

Yes, I know, but it's unlikely to receive these packets from the WAN, right?
My flag is intended to be used mostly on WAN interfaces;
machines in the LAN should be easily discoverable IMHO.

> 6. Certain odd flag combinations on single TCP packets (check the
> documentation for Nmap for more info regarding these), which I believe
> (although I may be reading the code wrong) you aren't accounting for.

I've tried many TCP flag combinations with hping3: NUL, SYN/ACK, ACK,
SYN/FIN, etc.
They don't get any response when the flag is set.

> 7. DAD queries.

I never looked at these packets; are they a subset of NDP?

> 8. ICMP address mask queries (which you also don't appear to account for).

It's deprecated, and it already doesn't get any response.

> This is by no means an exhaustive list, but all of them really should be
> addressed if you want to do this properly.



Thank you,
-- 
Matteo Croce
OpenWrt Developer


Re: [PATCH v2] add stealth mode

2015-07-07 Thread Matteo Croce
2015-07-07 9:01 GMT+02:00 Clemens Ladisch <clem...@ladisch.de>:
> valdis.kletni...@vt.edu wrote:
> > On Thu, 02 Jul 2015 10:56:01 +0200, Matteo Croce said:
> > > Add option to disable any reply not related to a listening socket
> >
> > 2) You *do* realize that this isn't anywhere near sufficient in order
> > to actually make your machine invisible, right?  (Hint: What *other*
> > packets can be sent to a machine to provoke a response?)
>
> Even worse: if you want to pretend that the entire machine is not there,
> you must make the router in front of you reply with an ICMP destination
> unreachable message.

Sometimes you can't, e.g. on DSL lines where the router in front of
you is an ISP-owned DSLAM.

-- 
Matteo Croce
OpenWrt Developer


Re: [PATCH v2] add stealth mode

2015-07-07 Thread Matteo Croce
2015-07-07 10:07 GMT+02:00 Hannes Frederic Sowa <han...@stressinduktion.org>:
>
> On Mon, Jul 6, 2015, at 21:44, Matteo Croce wrote:
> > 2015-07-06 12:49 GMT+02:00  valdis.kletni...@vt.edu:
> > > On Thu, 02 Jul 2015 10:56:01 +0200, Matteo Croce said:
> > > > Add option to disable any reply not related to a listening socket,
> > > > like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> > > > Also disables ICMP replies to echo request and timestamp.
> > > > The stealth mode can be enabled selectively for a single interface.
> > >
> > > A few notes.
> > >
> > > 1) Do you have an actual use case where an iptables '-j DROP' isn't usable?
> >
> > If you mean using a default DROP policy and allowing only the traffic
> > you want, then the use case is where the port can change at runtime and
> > you may not want to update the firewall every time
>
> Can't you use socket match in netfilter to accomplish exactly that?

You mean the owner --uid match?
Yes, sort of, but my goal was different: I just want to disable any
kind of reply from a specific interface (usually WAN) unless there is
a listening socket, to mitigate port scanning and flood attacks
without having a firewall.

Obviously you can do it with a firewall,
but then why do we have /proc/sys/net/ipv4/icmp_echo_ignore_all when we can
drop ICMP echoes?

-- 
Matteo Croce
OpenWrt Developer


Re: [PATCH v2] add stealth mode

2015-07-06 Thread Matteo Croce
2015-07-06 12:49 GMT+02:00  valdis.kletni...@vt.edu:
> On Thu, 02 Jul 2015 10:56:01 +0200, Matteo Croce said:
> > Add option to disable any reply not related to a listening socket,
> > like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> > Also disables ICMP replies to echo request and timestamp.
> > The stealth mode can be enabled selectively for a single interface.
>
> A few notes.
>
> 1) Do you have an actual use case where an iptables '-j DROP' isn't usable?

If you mean using a default DROP policy and allowing only the traffic
you want, then the use case is where the port can change at runtime and
you may not want to update the firewall every time.

> 2) You *do* realize that this isn't anywhere near sufficient in order
> to actually make your machine invisible, right?  (Hint: What *other*
> packets can be sent to a machine to provoke a response?)

Other than ICMP, UDP and TCP excluding open TCP/UDP ports?

> 3) At least my copy had massive whitespace damage, where all the tab
> characters appear to have evaporated

Sorry, I was using git send-email at first, but I got a security error from
gmail, so I copied and pasted the patch into gmail, which corrupted it.

-- 
Matteo Croce
OpenWrt Developer


Re: [PATCH] add stealth mode

2015-07-02 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce mat...@openwrt.org
---
 Documentation/networking/ip-sysctl.txt | 12 
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 12 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt
b/Documentation/networking/ip-sysctl.txt
index 5fae770..9eed021 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,12 @@ tag - INTEGER
  Allows you to write a number, which can be used as required.
  Default value is 0.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMP replies to echo requests and timestamp.
+ Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru

@@ -1584,6 +1590,12 @@ stable_secret - IPv6 address

  By default the stable secret is unset.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMPv6 replies to echo requests.
+ Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
  Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct
in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, stealth),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..e8e71fb 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  net = dev_net(skb_dst(skb)->dev);
  if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
  if (skb->len < 4)
  goto out_err;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..6f3e6e9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if (!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..780069d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include

Re: [PATCH v2] add stealth mode

2015-07-02 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce mat...@openwrt.org
---
check the patch with checkpatch.pl and add documentation in ip-sysctl.txt

 Documentation/networking/ip-sysctl.txt | 12 
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 12 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt
b/Documentation/networking/ip-sysctl.txt
index 5fae770..9eed021 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,12 @@ tag - INTEGER
  Allows you to write a number, which can be used as required.
  Default value is 0.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMP replies to echo requests and timestamp.
+ Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru

@@ -1584,6 +1590,12 @@ stable_secret - IPv6 address

  By default the stable secret is unset.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMPv6 replies to echo requests.
+ Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
  Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct
in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, stealth),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..e8e71fb 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  net = dev_net(skb_dst(skb)->dev);
  if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
  if (skb->len < 4)
  goto out_err;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..6f3e6e9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if (!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..780069d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include linux

[PATCH] add stealth mode

2015-07-01 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Dest-Unreach for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.
---
 include/linux/inetdevice.h | 1 +
 include/linux/ipv6.h   | 1 +
 include/uapi/linux/ip.h| 1 +
 net/ipv4/devinet.c | 1 +
 net/ipv4/icmp.c| 6 ++
 net/ipv4/tcp_ipv4.c| 3 ++-
 net/ipv4/udp.c | 4 +++-
 net/ipv6/addrconf.c| 7 +++
 net/ipv6/icmp.c| 3 ++-
 net/ipv6/tcp_ipv6.c| 2 +-
 net/ipv6/udp.c | 3 ++-
 11 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct
in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, stealth),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..2f1b31f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  net = dev_net(skb_dst(skb)->dev);
  if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
  if (skb->len < 4)
  goto out_err;

+ if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..c887d6e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if(!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..b3b0dee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include <linux/slab.h>
 #include <net/tcp_states.h>
@@ -1823,7 +1824,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct
udp_table *udptable,
  goto csum_error;

  UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+ if(!IN_DEV_STEALTH(skb->dev->ip_ptr))
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

  /*
  * Hmm.  We got an UDP packet to a port to which we
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..b9e44e2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5585,6 +5585,13 @@ static struct addrconf_sysctl_table
  .proc_handler = addrconf_sysctl_stable_secret,
  },
  {
+ .procname = "stealth",
+ .data = &ipv6_devconf.stealth,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
  /* sentinel */
  }
  },
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 713d743..94b08ac 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -723,7 +723,8 @@ static int icmpv6_rcv(struct sk_buff *skb)

  switch (type) {
  case ICMPV6_ECHO_REQUEST:
- icmpv6_echo_reply(skb);
+ if(!idev->cnf.stealth)
+ icmpv6_echo_reply(skb);
  break;

  case ICMPV6_ECHO_REPLY:
diff --git a/net/ipv6/tcp_ipv6.c 

Re: [PATCH][MIPS] AR7 ethernet

2007-10-23 Thread Matteo Croce
On Monday 15 October 2007 20:24:21 Jeff Garzik wrote:
> applied

Small update to the driver, please apply

Signed-off-by: Matteo Croce [EMAIL PROTECTED]
Signed-off-by: Eugene Konev [EMAIL PROTECTED]
Signed-off-by: Felix Fietkau [EMAIL PROTECTED]

diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
index ae41973..57541d2 100644
--- a/drivers/net/cpmac.c
+++ b/drivers/net/cpmac.c
@@ -460,18 +460,11 @@ static int cpmac_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct cpmac_desc *desc;
struct cpmac_priv *priv = netdev_priv(dev);
 
-   if (unlikely(skb_padto(skb, ETH_ZLEN))) {
-   if (netif_msg_tx_err(priv) && net_ratelimit())
-   printk(KERN_WARNING
-  "%s: tx: padding failed, dropping\n", dev->name);
-   spin_lock(&priv->lock);
-   dev->stats.tx_dropped++;
-   spin_unlock(&priv->lock);
-   return -ENOMEM;
-   }
+   if (unlikely(skb_padto(skb, ETH_ZLEN)))
+   return NETDEV_TX_OK;
 
len = max(skb->len, ETH_ZLEN);
-   queue = skb_get_queue_mapping(skb);
+   queue = skb->queue_mapping;
 #ifdef CONFIG_NETDEVICES_MULTIQUEUE
netif_stop_subqueue(dev, queue);
 #else
@@ -481,13 +474,9 @@ static int cpmac_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
desc = &priv->desc_ring[queue];
if (unlikely(desc->dataflags & CPMAC_OWN)) {
if (netif_msg_tx_err(priv) && net_ratelimit())
-   printk(KERN_WARNING "%s: tx dma ring full, dropping\n",
+   printk(KERN_WARNING "%s: tx dma ring full\n",
   dev->name);
-   spin_lock(&priv->lock);
-   dev->stats.tx_dropped++;
-   spin_unlock(&priv->lock);
-   dev_kfree_skb_any(skb);
-   return -ENOMEM;
+   return NETDEV_TX_BUSY;
}
 
spin_lock(&priv->lock);
@@ -509,7 +498,7 @@ static int cpmac_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
cpmac_dump_skb(dev, skb);
cpmac_write(priv->regs, CPMAC_TX_PTR(queue), (u32)desc->mapping);
 
-   return 0;
+   return NETDEV_TX_OK;
 }
 
 static void cpmac_end_xmit(struct net_device *dev, int queue)
@@ -646,12 +635,14 @@ static void cpmac_clear_tx(struct net_device *dev)
int i;
if (unlikely(!priv->desc_ring))
return;
-   for (i = 0; i < CPMAC_QUEUES; i++)
+   for (i = 0; i < CPMAC_QUEUES; i++) {
+   priv->desc_ring[i].dataflags = 0;
if (priv->desc_ring[i].skb) {
dev_kfree_skb_any(priv->desc_ring[i].skb);
if (netif_subqueue_stopped(dev, i))
netif_wake_subqueue(dev, i);
}
+   }
 }
 
 static void cpmac_hw_error(struct work_struct *work)
@@ -727,11 +718,13 @@ static void cpmac_tx_timeout(struct net_device *dev)
 #ifdef CONFIG_NETDEVICES_MULTIQUEUE
for (i = 0; i < CPMAC_QUEUES; i++)
if (priv->desc_ring[i].skb) {
+   priv->desc_ring[i].dataflags = 0;
dev_kfree_skb_any(priv->desc_ring[i].skb);
netif_wake_subqueue(dev, i);
break;
}
 #else
+   priv->desc_ring[0].dataflags = 0;
if (priv->desc_ring[0].skb)
dev_kfree_skb_any(priv->desc_ring[0].skb);
netif_wake_queue(dev);
@@ -794,7 +787,7 @@ static int cpmac_set_ringparam(struct net_device *dev, 
struct ethtool_ringparam*
 {
struct cpmac_priv *priv = netdev_priv(dev);
 
-   if (dev->flags & IFF_UP)
+   if (netif_running(dev))
return -EBUSY;
priv->ring_size = ring->rx_pending;
return 0;


[PATCH][MIPS] AR7 ethernet

2007-10-14 Thread Matteo Croce
New version which uses less locking and drops old API

Signed-off-by: Matteo Croce [EMAIL PROTECTED]
Signed-off-by: Eugene Konev [EMAIL PROTECTED]
Signed-off-by: Felix Fietkau [EMAIL PROTECTED]

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index e412582..5ab5d5b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1780,6 +1780,15 @@ config SC92031
  To compile this driver as a module, choose M here: the module
  will be called sc92031.  This is recommended.
 
+config CPMAC
+   tristate "TI AR7 CPMAC Ethernet support (EXPERIMENTAL)"
+   depends on NET_ETHERNET && EXPERIMENTAL && AR7
+   select PHYLIB
+   select FIXED_PHY
+   select FIXED_MII_100_FDX
+   help
+ TI AR7 CPMAC Ethernet support
+
 config NET_POCKET
bool "Pocket and portable adapters"
depends on PARPORT
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e14bf49..411fcd8 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -161,6 +161,7 @@ obj-$(CONFIG_8139CP) += 8139cp.o
 obj-$(CONFIG_8139TOO) += 8139too.o
 obj-$(CONFIG_ZNET) += znet.o
 obj-$(CONFIG_LAN_SAA9730) += saa9730.o
+obj-$(CONFIG_CPMAC) += cpmac.o
 obj-$(CONFIG_DEPCA) += depca.o
 obj-$(CONFIG_EWRK3) += ewrk3.o
 obj-$(CONFIG_ATP) += atp.o
diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
new file mode 100644
index 000..ed53aaa
--- /dev/null
+++ b/drivers/net/cpmac.c
@@ -0,0 +1,1174 @@
+/*
+ * Copyright (C) 2006, 2007 Eugene Konev
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/delay.h>
+#include <linux/version.h>
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/skbuff.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <asm/gpio.h>
+
+MODULE_AUTHOR("Eugene Konev [EMAIL PROTECTED]");
+MODULE_DESCRIPTION("TI AR7 ethernet driver (CPMAC)");
+MODULE_LICENSE("GPL");
+
+static int debug_level = 8;
+static int dumb_switch;
+
+/* Next 2 are only used in cpmac_probe, so it's pointless to change them */
+module_param(debug_level, int, 0444);
+module_param(dumb_switch, int, 0444);
+
+MODULE_PARM_DESC(debug_level, "Number of NETIF_MSG bits to enable");
+MODULE_PARM_DESC(dumb_switch, "Assume switch is not connected to MDIO bus");
+
+#define CPMAC_VERSION "0.5.0"
+/* stolen from net/ieee80211.h */
+#ifndef MAC_FMT
+#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
+#define MAC_ARG(x) ((u8*)(x))[0], ((u8*)(x))[1], ((u8*)(x))[2], \
+  ((u8*)(x))[3], ((u8*)(x))[4], ((u8*)(x))[5]
+#endif
+/* frame size + 802.1q tag */
+#define CPMAC_SKB_SIZE (ETH_FRAME_LEN + 4)
+#define CPMAC_QUEUES   8
+
+/* Ethernet registers */
+#define CPMAC_TX_CONTROL   0x0004
+#define CPMAC_TX_TEARDOWN  0x0008
+#define CPMAC_RX_CONTROL   0x0014
+#define CPMAC_RX_TEARDOWN  0x0018
+#define CPMAC_MBP  0x0100
+# define MBP_RXPASSCRC 0x4000
+# define MBP_RXQOS 0x2000
+# define MBP_RXNOCHAIN 0x1000
+# define MBP_RXCMF 0x0100
+# define MBP_RXSHORT   0x0080
+# define MBP_RXCEF 0x0040
+# define MBP_RXPROMISC 0x0020
+# define MBP_PROMISCCHAN(channel)  (((channel) & 0x7) << 16)
+# define MBP_RXBCAST   0x2000
+# define MBP_BCASTCHAN(channel)(((channel) & 0x7) << 8)
+# define MBP_RXMCAST   0x0020
+# define MBP_MCASTCHAN(channel)((channel) & 0x7)
+#define CPMAC_UNICAST_ENABLE   0x0104
+#define CPMAC_UNICAST_CLEAR0x0108
+#define CPMAC_MAX_LENGTH   0x010c
+#define CPMAC_BUFFER_OFFSET0x0110
+#define CPMAC_MAC_CONTROL  0x0160
+# define MAC_TXPTYPE   0x0200
+# define MAC_TXPACE0x0040
+# define MAC_MII   0x0020
+# define MAC_TXFLOW0x0010
+# define MAC_RXFLOW0x0008
+# define MAC_MTEST

Re: [PATCH][MIPS][6/6] AR7: leds driver

2007-10-10 Thread Matteo Croce
The new LED driver, which now uses leds-gpio

Signed-off-by: Matteo Croce [EMAIL PROTECTED]
Signed-off-by: Nicolas Thill [EMAIL PROTECTED]

diff --git a/drivers/leds/leds-ar7.c b/drivers/leds/leds-ar7.c
new file mode 100644
index 000..72b958a
--- /dev/null
+++ b/drivers/leds/leds-ar7.c
@@ -0,0 +1,130 @@
+/*
+ * Copyright (C) 2007 Nicolas Thill [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/leds.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <gpio.h>
+
+#define DRVNAME "ar7-leds"
+#define LONGNAME "TI AR7 LEDs driver"
+#define AR7_GPIO_BIT_STATUS_LED 8
+
+MODULE_AUTHOR("Nicolas Thill [EMAIL PROTECTED]");
+MODULE_DESCRIPTION(LONGNAME);
+MODULE_LICENSE("GPL");
+
+static void ar7_status_led_set(struct led_classdev *pled,
+   enum led_brightness value)
+{
+   gpio_set_value(AR7_GPIO_BIT_STATUS_LED, value ? 0 : 1);
+}
+
+static struct led_classdev ar7_status_led = {
+   .name   = "ar7:status",
+   .brightness_set = ar7_status_led_set,
+};
+
+#ifdef CONFIG_PM
+static int ar7_leds_suspend(struct platform_device *dev,
+   pm_message_t state)
+{
+   led_classdev_suspend(&ar7_status_led);
+   return 0;
+}
+
+static int ar7_leds_resume(struct platform_device *dev)
+{
+   led_classdev_resume(&ar7_status_led);
+   return 0;
+}
+#else /* CONFIG_PM */
+#define ar7_leds_suspend NULL
+#define ar7_leds_resume NULL
+#endif /* CONFIG_PM */
+
+static int ar7_leds_probe(struct platform_device *pdev)
+{
+   int rc;
+
+   rc = led_classdev_register(&pdev->dev, &ar7_status_led);
+   if (rc < 0)
+   goto out;
+
+   ar7_gpio_enable(AR7_GPIO_BIT_STATUS_LED);
+   gpio_direction_output(AR7_GPIO_BIT_STATUS_LED, 0);
+
+out:
+   return rc;
+}
+
+static int ar7_leds_remove(struct platform_device *pdev)
+{
+   led_classdev_unregister(&ar7_status_led);
+
+   return 0;
+}
+
+static struct platform_device *ar7_leds_device;
+
+static struct platform_driver ar7_leds_driver = {
+   .probe  = ar7_leds_probe,
+   .remove = ar7_leds_remove,
+   .suspend= ar7_leds_suspend,
+   .resume = ar7_leds_resume,
+   .driver = {
+   .name   = DRVNAME,
+   },
+};
+
+static int __init ar7_leds_init(void)
+{
+   int rc;
+
+   ar7_leds_device = platform_device_alloc(DRVNAME, -1);
+   if (!ar7_leds_device)
+   return -ENOMEM;
+
+   rc = platform_device_add(ar7_leds_device);
+   if (rc < 0)
+   goto out_put;
+
+   rc = platform_driver_register(&ar7_leds_driver);
+   if (rc < 0)
+   goto out_put;
+
+   goto out;
+
+out_put:
+   platform_device_put(ar7_leds_device);
+out:
+   return rc;
+}
+
+static void __exit ar7_leds_exit(void)
+{
+   platform_driver_unregister(&ar7_leds_driver);
+   platform_device_unregister(ar7_leds_device);
+}
+
+module_init(ar7_leds_init);
+module_exit(ar7_leds_exit);


[PATCH][MIPS][7/7] AR7: ethernet

2007-09-20 Thread Matteo Croce
Driver for the cpmac 100M ethernet driver.
Jeff, here is the meat ;)

Signed-off-by: Matteo Croce [EMAIL PROTECTED]
Signed-off-by: Eugene Konev [EMAIL PROTECTED]

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 6a0863e..28ba0dc 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1822,6 +1822,15 @@ config SC92031
  To compile this driver as a module, choose M here: the module
  will be called sc92031.  This is recommended.
 
+config CPMAC
+   tristate "TI AR7 CPMAC Ethernet support (EXPERIMENTAL)"
+   depends on NET_ETHERNET && EXPERIMENTAL && AR7
+   select PHYLIB
+   select FIXED_PHY
+   select FIXED_MII_100_FDX
+   help
+ TI AR7 CPMAC Ethernet support
+
 config NET_POCKET
bool "Pocket and portable adapters"
depends on PARPORT
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 9501d64..b536934 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -157,6 +157,7 @@ obj-$(CONFIG_8139CP) += 8139cp.o
 obj-$(CONFIG_8139TOO) += 8139too.o
 obj-$(CONFIG_ZNET) += znet.o
 obj-$(CONFIG_LAN_SAA9730) += saa9730.o
+obj-$(CONFIG_CPMAC) += cpmac.o
 obj-$(CONFIG_DEPCA) += depca.o
 obj-$(CONFIG_EWRK3) += ewrk3.o
 obj-$(CONFIG_ATP) += atp.o
diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
new file mode 100644
index 000..50aad94
--- /dev/null
+++ b/drivers/net/cpmac.c
@@ -0,0 +1,1166 @@
+/*
+ * Copyright (C) 2006, 2007 Eugene Konev
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/delay.h>
+#include <linux/version.h>
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/skbuff.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <asm/gpio.h>
+
+MODULE_AUTHOR("Eugene Konev");
+MODULE_DESCRIPTION("TI AR7 ethernet driver (CPMAC)");
+MODULE_LICENSE("GPL");
+
+static int rx_ring_size = 64;
+static int disable_napi;
+static int debug_level = 8;
+static int dumb_switch;
+
+module_param(rx_ring_size, int, 0644);
+module_param(disable_napi, int, 0644);
+/* Next 2 are only used in cpmac_probe, so it's pointless to change them */
+module_param(debug_level, int, 0444);
+module_param(dumb_switch, int, 0444);
+
+MODULE_PARM_DESC(rx_ring_size, "Size of rx ring (in skbs)");
+MODULE_PARM_DESC(disable_napi, "Disable NAPI polling");
+MODULE_PARM_DESC(debug_level, "Number of NETIF_MSG bits to enable");
+MODULE_PARM_DESC(dumb_switch, "Assume switch is not connected to MDIO bus");
+
+/* frame size + 802.1q tag */
+#define CPMAC_SKB_SIZE (ETH_FRAME_LEN + 4)
+#define CPMAC_TX_RING_SIZE 8
+
+/* Ethernet registers */
+#define CPMAC_TX_CONTROL   0x0004
+#define CPMAC_TX_TEARDOWN  0x0008
+#define CPMAC_RX_CONTROL   0x0014
+#define CPMAC_RX_TEARDOWN  0x0018
+#define CPMAC_MBP  0x0100
+# define MBP_RXPASSCRC 0x4000
+# define MBP_RXQOS 0x2000
+# define MBP_RXNOCHAIN 0x1000
+# define MBP_RXCMF 0x0100
+# define MBP_RXSHORT   0x0080
+# define MBP_RXCEF 0x0040
+# define MBP_RXPROMISC 0x0020
+# define MBP_PROMISCCHAN(channel)  (((channel) & 0x7) << 16)
+# define MBP_RXBCAST   0x2000
+# define MBP_BCASTCHAN(channel)(((channel) & 0x7) << 8)
+# define MBP_RXMCAST   0x0020
+# define MBP_MCASTCHAN(channel)((channel) & 0x7)
+#define CPMAC_UNICAST_ENABLE   0x0104
+#define CPMAC_UNICAST_CLEAR0x0108
+#define CPMAC_MAX_LENGTH   0x010c
+#define CPMAC_BUFFER_OFFSET0x0110
+#define CPMAC_MAC_CONTROL  0x0160
+# define MAC_TXPTYPE   0x0200
+# define MAC_TXPACE0x0040
+# define MAC_MII   0x0020
+# define MAC_TXFLOW0x0010
+# define MAC_RXFLOW0x0008
+# define MAC_MTEST 0x0004
+# define MAC_LOOPBACK

Re: [PATCH][MIPS][7/7] AR7: ethernet

2007-09-06 Thread Matteo Croce
On Friday 07 September 2007 00:30:25 Andrew Morton wrote:
> > On Thu, 6 Sep 2007 17:34:10 +0200 Matteo Croce [EMAIL PROTECTED] wrote:
> > Driver for the cpmac 100M ethernet driver.
> > It works fine disabling napi support, enabling it gives a kernel panic
> > when the first IPv6 packet has to be forwarded.
> > Other than that works fine.
>
> I'm not too sure why I got cc'ed on this (and not on patches 1-6?) but
> whatever.

I mailed every maintainer listed in the matching section of the MAINTAINERS
file, and you were in the NETWORK DEVICE DRIVERS section.

> This patch introduces quite a number of basic coding-style mistakes.
> Please run it through scripts/checkpatch.pl and review the output.

Already done. I'm collecting other suggestions before committing.

> The patch introduces vast number of volatile structure fields.  Please see
> Documentation/volatile-considered-harmful.txt.

If I remove them, the kernel hangs at module load.
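
For context, Documentation/volatile-considered-harmful.txt suggests reaching
device registers through accessor functions (readl()/writel() and friends in
the kernel) instead of marking structure fields volatile. The user-space
sketch below only illustrates that pattern. demo_read(), demo_write() and the
DEMO_REG_* offsets are hypothetical, a plain array stands in for the
memory-mapped registers, and this is not the cpmac code. On real hardware the
kernel accessors may also matter for ordering, which the sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
#define DEMO_REG_CTRL   0x00
#define DEMO_REG_STATUS 0x04

/* Each access goes through a volatile-qualified pointer, so the compiler
 * may not cache or drop it; the data structure itself stays plain. */
uint32_t demo_read(const void *base, size_t off)
{
	assert(off % sizeof(uint32_t) == 0);
	return *(const volatile uint32_t *)((const unsigned char *)base + off);
}

void demo_write(void *base, size_t off, uint32_t val)
{
	assert(off % sizeof(uint32_t) == 0);
	*(volatile uint32_t *)((unsigned char *)base + off) = val;
}
```

With accessors like these, the register block can be declared without any
volatile fields, and only the two helpers carry the qualifier.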

> The patch introduces a modest number of unneeded (and undesirable) casts of
> void*, such as
>
> + struct cpmac_mdio_regs *regs = (struct cpmac_mdio_regs *)bus->priv;
>
> please check for those and fix them up.

Done.

> The driver implements a driver-private skb pool.  I don't know if this is
> something which we like net drivers doing?  If it is approved then surely
> there should be a common implementation for it somewhere?

Are you referring to cpmac_poll?

> The driver has some LINUX_VERSION_CODE ifdefs.  We usually prefer that such
> code not be present in a merged-up driver.

I will remove them in the final release. For now I need them for testing,
because my running kernel is older than current git.

 
> > +		priv->regs->mac_hash_low = 0x;
> > +		priv->regs->mac_hash_high = 0x;
> > +	} else {
> > +		for (i = 0, iter = dev->mc_list; i < dev->mc_count;
> > +		     i++, iter = iter->next) {
> > +			hash = 0;
> > +			tmp = iter->dmi_addr[0];
> > +			hash  ^= (tmp >> 2) ^ (tmp << 4);
> > +			tmp = iter->dmi_addr[1];
> > +			hash  ^= (tmp >> 4) ^ (tmp << 2);
> > +			tmp = iter->dmi_addr[2];
> > +			hash  ^= (tmp >> 6) ^ tmp;
> > +			tmp = iter->dmi_addr[4];
> > +			hash  ^= (tmp >> 2) ^ (tmp << 4);
> > +			tmp = iter->dmi_addr[5];
> > +			hash  ^= (tmp >> 4) ^ (tmp << 2);
> > +			tmp = iter->dmi_addr[6];
> > +			hash  ^= (tmp >> 6) ^ tmp;
> > +			hash &= 0x3f;
> > +			if (hash < 32) {
> > +				hashlo |= 1<<hash;
> > +			} else {
> > +				hashhi |= 1<<(hash - 32);
> > +			}
> > +		}
> > +
> > +		priv->regs->mac_hash_low = hashlo;
> > +		priv->regs->mac_hash_high = hashhi;
> > +	}
 
> Do we not have a library function anywhere which will perform this little
> multicasting hash?

Can you tell me which function you mean, so I can use it?
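
As a standalone illustration of what the quoted loop appears to compute, here
is a sketch that folds a 6-byte address into a 6-bit bucket and sets the
matching bit in a low/high pair of 32-bit words. The function names
mc_hash_bucket() and mc_hash_add() are hypothetical, and the shift directions
and the use of address bytes 0 to 5 are assumptions, not something confirmed
by the patch.

```c
#include <stdint.h>

/* Fold a 6-byte MAC address into a bucket index in the range 0..63.
 * Shift directions and byte order are assumed for illustration. */
static unsigned int mc_hash_bucket(const uint8_t addr[6])
{
	unsigned int hash = 0;
	unsigned int tmp;

	tmp = addr[0];
	hash ^= (tmp >> 2) ^ (tmp << 4);
	tmp = addr[1];
	hash ^= (tmp >> 4) ^ (tmp << 2);
	tmp = addr[2];
	hash ^= (tmp >> 6) ^ tmp;
	tmp = addr[3];
	hash ^= (tmp >> 2) ^ (tmp << 4);
	tmp = addr[4];
	hash ^= (tmp >> 4) ^ (tmp << 2);
	tmp = addr[5];
	hash ^= (tmp >> 6) ^ tmp;

	return hash & 0x3f;	/* keep the low 6 bits */
}

/* Set the bucket's bit in the 64-bit filter, split into two 32-bit words. */
static void mc_hash_add(const uint8_t addr[6], uint32_t *lo, uint32_t *hi)
{
	unsigned int bit = mc_hash_bucket(addr);

	if (bit < 32)
		*lo |= (uint32_t)1 << bit;
	else
		*hi |= (uint32_t)1 << (bit - 32);
}
```

Pulling the fold into its own function keeps the per-address work separate
from the register writes, so the loop in the driver would reduce to one call
per multicast list entry.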

> > +static inline struct sk_buff *cpmac_rx_one(struct net_device *dev,
> > +					   struct cpmac_priv *priv,
> > +					   struct cpmac_desc *desc)
> > +{
> > +	unsigned long flags;
> > +	char *data;
> > +	struct sk_buff *skb, *result = NULL;
> > +
> > +	priv->regs->rx_ack[0] = virt_to_phys(desc);
> > +	if (unlikely(!desc->datalen)) {
> > +		if (printk_ratelimit())
> > +			printk(KERN_WARNING "%s: rx: spurious interrupt\n",
> > +			       dev->name);
> > +		priv->stats.rx_errors++;
> > +		return NULL;
> > +	}
> > +
> > +	spin_lock_irqsave(&priv->lock, flags);
> > +	skb = cpmac_get_skb(dev);
> > +	if (likely(skb)) {
> > +		data = (char *)phys_to_virt(desc->hw_data);
> > +		dma_cache_inv((u32)data, desc->datalen);
> > +		skb_put(desc->skb, desc->datalen);
> > +		desc->skb->protocol = eth_type_trans(desc->skb, dev);
> > +		desc->skb->ip_summed = CHECKSUM_NONE;
> > +		priv->stats.rx_packets++;
> > +		priv->stats.rx_bytes += desc->datalen;
> > +		result = desc->skb;
> > +		desc->skb = skb;
> > +	} else {
> > +#ifdef CPMAC_DEBUG
> > +		if (printk_ratelimit())
> > +			printk("%s: low on skbs, dropping packet\n",
> > +			       dev->name);
> > +#endif
> > +		priv->stats.rx_dropped++;
> > +	}
> > +	spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > +	desc->hw_data = virt_to_phys(desc->skb->data);
> > +	desc->buflen = CPMAC_SKB_SIZE;
> > +	desc->dataflags = CPMAC_OWN;
> > +	dma_cache_wback((u32)desc, 16);
> > +
> > +	return result;
> > +}
 
> This function is far too large to be inlined.
>
> > +static irqreturn_t cpmac_irq(int irq, void *dev_id)
> > +{
> > +	struct net_device *dev = (struct net_device *)dev_id;
>
> unneeded cast

fixed