Re: [net-next PATCH v2] drivers: net: cpsw: Add support to make gpio drive which slave connected to phy

2015-09-03 Thread Tony Lindgren
* Mugunthan V N  [150902 23:05]:
> On the DRA72x EVM, slave 1 is connected to the onboard phy by
> default, but the slave 2 pins are also muxed with the video input
> module, which is controlled by a pcf857x gpio. Currently gpio
> hogging is used to select slave 0 to connect to the phy, but with
> omap2plus_defconfig the pcf857x gpio driver is built as a module.
> So when using NFS on the DRA72x EVM, the board doesn't boot: gpio
> hogging does not set the proper gpio state to connect slave 0 to
> the phy because the driver is built as a module. No error is
> reported for the unset gpio; the only symptom is a message that no
> dhcp reply was received.
> 
> To solve this issue, introducing "mode-gpio" in DT when gpio
> based muxing is required. This will throw a warning when gpio
> get fails and returns probe defer. When gpio-pcf857x module is
> installed, cpsw probes again and ethernet becomes functional.
> Verified this on DRA72x with pcf as module and ramdisk.

Hmm you might be able to make it even a little bit more generic.
The gpios can be an array.. So typically they're named "-gpios":

[linux] $ git grep "\-gpio " arch/arm/boot/dts/*.dts* | wc -l
219
[linux] $ git grep "\-gpios " arch/arm/boot/dts/*.dts* | wc -l
704

So I'd use mode-gpios even though there's just one gpio in
this case. Up to you though, and should be retested after
the change naturally. At some point gpio code was not parsing
"gpio" or "gpios" properly.. But that's probably been fixed
a long time ago.
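
To make the naming point concrete, here is a small userspace sketch of a lookup that accepts either spelling. It is not the gpiolib code: find_gpio_property() and the property lists are hypothetical, and the only idea borrowed from the kernel is trying the "-gpios" suffix before "-gpio".

```c
#include <stdio.h>
#include <string.h>

/* Suffixes tried in order; the plural form is preferred. */
static const char *const gpio_suffixes[] = { "gpios", "gpio" };

/*
 * Hypothetical helper: return the index of the first property in props[]
 * named "<con_id>-gpios" or "<con_id>-gpio", or -1 if neither exists.
 */
static int find_gpio_property(const char *con_id,
                              const char *const props[], int nprops)
{
    char name[64];
    size_t s;
    int i;

    for (s = 0; s < sizeof(gpio_suffixes) / sizeof(gpio_suffixes[0]); s++) {
        snprintf(name, sizeof(name), "%s-%s", con_id, gpio_suffixes[s]);
        for (i = 0; i < nprops; i++)
            if (strcmp(props[i], name) == 0)
                return i;
    }
    return -1;
}
```

With con_id "mode", a node carrying "mode-gpios" and one carrying "mode-gpio" would both be found, so using the plural name in the binding should cost nothing on the driver side.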

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/2] couple of sctp fixes for 0ca50d12fe46

2015-09-03 Thread Neil Horman
On Wed, Sep 02, 2015 at 04:20:20PM -0300, Marcelo Ricardo Leitner wrote:
> These are two fixes for sctp after my patch on 0ca50d12fe46 ("sctp: fix
> src address selection if using secondary addresses")
> 
> The first fixes a dst leak on those it decided to skip.
> 
> The second adds the fallback on src selection that Vlad had asked
> about. Unfortunately a lot of ipvs setups rely on the old behavior
> and I don't see a better fix for it.
> 
> Please consider both to -stable tree.
> 
> Thanks!
> 
> Marcelo Ricardo Leitner (2):
>   sctp: fix dst leak
>   sctp: add routing output fallback
> 
>  net/sctp/protocol.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> -- 
> 2.4.3
> 
> 

for the series
Acked-by: Neil Horman 



[PATCH v6 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-09-03 Thread Hajime Tazaki
These files work as stubs in order to transparently run other
parts of the kernel (e.g., net/) in the libos environment.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/include/asm/Kbuild   | 57 
 arch/lib/include/asm/atomic.h | 62 +++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 +
 arch/lib/include/asm/current.h|  7 
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 
 arch/lib/include/asm/pgtable.h| 30 +
 arch/lib/include/asm/processor.h  | 19 +++
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 
 arch/lib/include/asm/thread_info.h| 36 
 arch/lib/include/asm/uaccess.h| 14 
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 334 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index ..c647b1ca8cca
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index ..f72c3a8ca48c
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,62 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v) (*(volatile long *)&(v)->counter)
+static inline void atomic64_add(long i, atomic64_t *v)
+{
+   v->counter += i;
+}
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)
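
For readers who want to try the stub semantics outside the kernel tree, the following is a minimal userspace sketch, not the patch's code. It assumes a single-threaded caller, so plain read-modify-write is enough, and it uses long long for every argument (the patch uses long), which is a deliberate deviation to keep the full 64-bit range on 32-bit hosts.

```c
#include <stdbool.h>

/* Userspace stand-in for the stub type; no real atomicity is provided. */
typedef struct {
    volatile long long counter;
} atomic64_t;

static inline long long atomic64_read(const atomic64_t *v)
{
    return v->counter;
}

static inline void atomic64_add(long long i, atomic64_t *v)
{
    v->counter += i;
}

static inline void atomic64_sub(long long i, atomic64_t *v)
{
    v->counter -= i;
}

/* Subtract i and report whether the counter reached zero. */
static inline bool atomic64_sub_and_test(long long i, atomic64_t *v)
{
    v->counter -= i;
    return v->counter == 0;
}
```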

[PATCH v6 10/10] lib: tools used for test scripts

2015-09-03 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging of net/ code
with libos. A simple test is run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki 
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index ..57a74a05482c
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index ..a27eb84e7712
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build $(BAKEBUILD_PARAMS)
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index ..51ac5a52336e
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index ..9377ac3214c1
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index ..e81e2d84c156
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index ..198e7e4c66ac
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${IPADDR}
+   netmask 255.255.255.0
+   macaddr 00:01:01:01:01:02
+   viftype RAW
+
+route

[PATCH v6 06/10] lib: sysctl handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide the sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index ..5f08f9f97103
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int nr_pdflush_threads = 0;
+unsigned long 
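
The four slots of console_printk defined above are normally reached through named accessors. The sketch below mirrors that mapping in userspace; the accessor macros are written from memory of the kernel's printk header and would_print() is a hypothetical helper, so treat both as assumptions.

```c
#define DEFAULT_MESSAGE_LOGLEVEL 4
#define MINIMUM_CONSOLE_LOGLEVEL 1
#define DEFAULT_CONSOLE_LOGLEVEL 7

int console_printk[4] = {
    DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
    DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
    MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
    DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
};

/* Named views onto the array slots. */
#define console_loglevel          (console_printk[0])
#define default_message_loglevel  (console_printk[1])
#define minimum_console_loglevel  (console_printk[2])
#define default_console_loglevel  (console_printk[3])

/* A message reaches the console only if its level is numerically lower. */
static int would_print(int msg_level)
{
    return msg_level < console_loglevel;
}
```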

[PATCH v6 09/10] lib: libos build scripts and documentation

2015-09-03 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 235 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 655 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1265 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index ..fbf7946f42ef
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+New hardware independent architecture 'arch/lib', configured by
+CONFIG_LIB gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE will give you a personalized network stack for each application
+  without replacing host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE will give us a network simulation environment with Linux network stack
+  to investigate the detailed behavior of protocol implementations with a
+  flexible network configuration. This is also useful as a testing environment.
+
+(- more abstracted implementation of underlying platform will be a future
+   direction (e.g., rump hypercall))
+
+In both features, Linux kernel network stack is running on top of
+userspace application with a linked or dynamically loaded library.
+
+They have their own network stack, isolated from the host operating system,
+so they are configured with different IP addresses, as other virtualization
+methods do.
+
+
+* How is it different from others?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies with namespaces bring process-level isolation
+to host multiple network entities but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it ?
+
+configuration of arch/lib follows a standard configuration of kernel.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+you may first need to configure a configuration file, named
+'nuse.conf' so that the library version of network stack can know what
+kind of IP configuration should be used. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is the determinism of any test executions. A
+test script always gives same results in every execution if there is
+no modification on test target code.
+
+For the first step, you need to obtain network simulator
+environment. 'make testbin' does all the stuff for the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+an application can use its own network stack, distinct from host network stack
+in order to personalize any network feature to the application specific one.
+The 'nuse' wrapper script, based on LD_PRELOAD technique, carefully replaces
+socket API and redirects system calls to the network stack library, provided by
+this framework.
+
+the network stack can be used with any kind of raw-socket like
+technologies such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of

[PATCH v6 07/10] lib: other kernel glue layer code

2015-09-03 Thread Hajime Tazaki
These files provide the same function calls so that the other
network stack code remains untouched.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Christoph Paasch 
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 +
 arch/lib/glue.c   | 284 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 ++
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  36 +++
 arch/lib/random.c |  54 ++
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 731 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index ..3a1f30129fb7
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index ..ce424ffae8c2
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index ..33efe5f1da32
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is monothreaded , so we do not care of lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 65535;
+}
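
kern_mount_data() in fs.c above reports failure with ERR_PTR() and checks the result of type->mount() with IS_ERR(). For readers unfamiliar with that convention, here is a userspace sketch of the encoding, written from memory of the kernel's err.h rather than copied from it, so the details should be treated as approximate.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

The design lets one return value carry either a valid pointer or an error code, which is why the function can return ERR_PTR(-ENODEV) from a pointer-typed signature.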
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index ..bdbed913ee9e
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,284 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 

Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-03 Thread Jay Vosburgh
Uwe Koziolek  wrote:

>On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:
>> On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:
>>> On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:
 Uwe Koziolek  wrote:

> On 2015-08-17 07:12 PM, Jarod Wilson wrote:
>> On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
>>> On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
 From: Uwe Koziolek 

 With some very finicky switch hardware, active backup bonding can get
 into
 a situation where we play ping-pong between interfaces, trying to get
 one
 to come up as the active slave. There seems to be an issue with the
 switch's arp replies either taking too long, or simply getting lost,
 so we
 wind up unable to get any interface up and active. Sometimes, the issue
 sorts itself out after a while, sometimes it doesn't.

 Testing with num_grat_arp has proven fruitless, but sending an
 additional
 arp on curr_arp_slave if we're still in the arp_interval timeslice in
 bond_ab_arp_probe(), has shown to produce 100% reliability in testing
 with
 this hardware combination.
>>> Sorry, I don't understand the logic of why it works, and what exactly
>>> are
>>> we fixing here.
>>>
>>> It also breaks completely the logic for link state management in case
>>> of no
>>> current active slave for 2*arp_interval.
>>>
>>> Could you please elaborate what exactly is fixed here, and how it
>>> works? :)
>> I can either duplicate some information from the bug, or Uwe can, to
>> illustrate the exact nature of the problem.
>>
>>> p.s. num_grat_arp maybe could help?
>> That was my thought as well, but as I understand it, that route was
>> explored, and it didn't help any. I don't actually have a reproducer
>> setup of my own, unfortunately, so I'm kind of caught in the middle
>> here...
>>
>> Uwe, can you perhaps further enlighten us as to what num_grat_arp
>> settings were tried that didn't help? I'm still of the mind that if
>> num_grat_arp *didn't* help, we probably need to do something keyed off
>> num_grat_arp.
> The bonding slaves are connected to high available switches, each of the
> slaves is connected to a different switch. If the bond is starting, only
> the selected slave sends one arp-request. If a matching arp_response was
> received, this slave and the bond is going into state up, sending the
> gratuitous arps...
> But if you got no arp reply the next slave was selected.
> With most of the newer switches, not overloaded, or with other software
> bugs, or with a single switch configuration, you would get a arp response
> on the first arp request.
> But in case of high availability configuration with non perfect switches
> like HP ProCurve 54xx, also with some Cisco models, you may not get a
> response on the first arp request.
>
> I have seen network snoops, there the switches are not responding to the
> first arp request on slave 1, the second arp request was sent on slave 2
> but the response was received on slave one,  and all following arp
> requests are answered on the wrong slave for a longer time.
Could you elaborate on the exact "high availability
 configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
 updating the forwarding tables and the bond flip flopping between the
 slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
 the forwarding table, but (for whatever reason) there is no reply.  ARP
 on slave 2 is sent (using the same source MAC, unless you set
 fail_over_mac), but forwarding tables still send that MAC to slave 1, so
 reply is sent there.
>>> High availability:
>>> 2 managed switches with routing capabilities have an interconnect.
>>> One slave of a bonding interface is connected to the first switch, the
>>> second slave is connected to the other switch.
>>> The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as i
>>> remember also HP E 3500 and  E 3800 are also
>>> affected, for the affected Cisco models I can't answer today.
>>> Affected single switch configurations was not seen.
>>>
>>> Yes, race conditions with delayed upgrades of the forwarding tables is a
>>> well matching explanation for the problem.
>>>
> The proposed change sends up to 3 arp requests on a down bond using the
> same slave, delayed by arp_interval.
> Using problematic switches I have seen the arp response on the right
> slave at the latest on the second arp request. So the bond is going into state
> up.
>
> How does it work:
> 
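
The retry behaviour described in the quoted text (up to 3 arp requests on the same slave, one per arp_interval, before moving on) can be sketched as a small state machine. This is a userspace illustration only, not the change to bond_ab_arp_probe(); struct probe_state and next_probe_slave() are hypothetical names.

```c
#define MAX_PROBES_PER_SLAVE 3  /* "up to 3 arp requests" from the description */

struct probe_state {
    int nslaves;      /* number of slaves in the bond */
    int curr_slave;   /* slave currently being probed */
    int probes_sent;  /* probes already sent on curr_slave */
};

/*
 * Called once per arp_interval while the bond has no active slave.
 * Returns the index of the slave to probe in this interval.
 */
static int next_probe_slave(struct probe_state *st)
{
    if (st->probes_sent >= MAX_PROBES_PER_SLAVE) {
        /* Retry budget exhausted: rotate to the next slave. */
        st->curr_slave = (st->curr_slave + 1) % st->nslaves;
        st->probes_sent = 0;
    }
    st->probes_sent++;
    return st->curr_slave;
}
```

Staying on one slave for several intervals gives the switches time to settle their forwarding tables before the source MAC shows up on a different port, which is the race discussed earlier in the thread.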

RE: [PATCH net-next 1/1] net: fec: clear receive interrupts before processing a packet

2015-09-03 Thread Duan Andy
From: Philippe De Muyter  Sent: Thursday, September 03, 2015 4:00 PM
> To: Duan Fugang-B38611
> Cc: da...@davemloft.net; netdev@vger.kernel.org; li...@arm.linux.org.uk
> Subject: Re: [PATCH net-next 1/1] net: fec: clear receive interrupts
> before processing a packet
> 
> Hi Andy,
> 
> can you resubmit it, adding also my
> 
> Reported-by: Philippe De Muyter 
> 
> and explaining that it also prevents a complete rx blockage failure ?
> 
> Philippe
> 
Sorry, since the patch was already submitted, reviewed, and applied, I just
resubmitted it, keeping the original author, commit log, and sign-off information.

> On Wed, Sep 02, 2015 at 11:40:15AM +0200, Philippe De Muyter wrote:
> > On Wed, Sep 02, 2015 at 05:24:14PM +0800, Fugang Duan wrote:
> > > From: Russell King 
> > >
> > > The patch just to re-submit the patch "db3421c114cfa6326" because
> > > the patch "4d494cdc92b3b9a0" remove the change.
> >
> > I think you should mention also the titles of the commits.
> >
> > And maybe send it also to stable.
> > >
> > > Clear any pending receive interrupt before we process a pending packet.
> > > This helps to avoid any spurious interrupts being raised after we
> > > have fully cleaned the receive ring, while still allowing an
> > > interrupt to be raised if we receive another packet.
> > >
> > > The position of this is critical: we must do this prior to reading
> > > the next packet status to avoid potentially dropping an interrupt
> > > when a packet is still pending.
> > >
> > > Acked-by: Fugang Duan 
> > > Signed-off-by: Russell King 
> > > ---
> > >  drivers/net/ethernet/freescale/fec_main.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/net/ethernet/freescale/fec_main.c
> > > b/drivers/net/ethernet/freescale/fec_main.c
> > > index 1f89c59..6bed0ff 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -1400,6 +1400,7 @@ fec_enet_rx_queue(struct net_device *ndev, int
> budget, u16 queue_id)
> > >   if ((status & BD_ENET_RX_LAST) == 0)
> > >   netdev_err(ndev, "rcv is not +last\n");
> > >
> > Could a comment be added here to avoid another future removal ?
> >
> > > + writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT);
> > >
> > >   /* Check for errors. */
> > >   if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
> > > --
> > > 1.9.1
> >
> > Philippe
> 
> --
> Philippe De Muyter +32 2 6101532 Macq SA rue de l'Aeronef 2 B-1140
> Bruxelles


[PATCH v6 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-09-03 Thread Hajime Tazaki
Kernel context primitives such as soft interrupts, scheduling, and
tasklets are implemented for libos. These functions eventually call the
functions registered through the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 238 ++
 4 files changed, 828 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index ..98a568a16903
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred);
+   lib_free((void *)task->kernel_task.cred->user);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_tail(&wait->task_list, &q->task_list);
+   set_current_state(state);
+}
+void 

[PATCH v6 03/10] lib: public headers and API implementations for userspace programs

2015-09-03 Thread Hajime Tazaki
Userspace programs that use libos access it through a public API, lib_init(),
passing struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 +++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +
 arch/lib/lib-socket.c | 370 ++
 arch/lib/lib.c| 296 +
 arch/lib/lib.h|  21 +++
 9 files changed, 1148 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index ..974122c3a0f1
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {  \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index ..e871a594b82c
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice 

[PATCH v6 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-09-03 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and co.
This lets libos use the user's own allocator, e.g. malloc(3).

Signed-off-by: Hajime Tazaki 
---
 include/linux/slab.h |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile  |   1 +
 mm/slab.h|   4 +
 mm/slib.c| 209 +++
 5 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a99f0e5243e1..104c1aeec560 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index ..d9fe7d59bd4e
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..7d8314f95ce3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 8da63e4e470f..2cf4f0f67a19 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include 
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#endif
+
 #include 
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index ..974c8aed0275
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,209 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+#include 
+#include 
+#include 
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+   return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
+   kfree(cache);
+}
+int kmem_cache_shrink(struct kmem_cache *cache)
+{

[PATCH] net: tipc: fix stall during bclink wakeup procedure

2015-09-03 Thread Kolmakov Dmitriy
From: Dmitry S Kolmakov 
 
If an attempt to wake up users of the broadcast link is made when there is
not enough space in the send queue, it may hang inside the
tipc_sk_rcv() function, since the loop breaks only after the wakeup
queue becomes empty. This can lead to a complete CPU stall with the
following message generated by RCU:
 
INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpchR  running task0 39949  39948 0x000a
 818536c0 88181fa037a0 8106a4be 
 818536c0 88181fa037c0 8106d8a8 88181fa03800
 0001 88181fa037f0 81094a50 88181fa15680
Call Trace:
   [] sched_show_task+0xae/0x120
 [] dump_cpu_task+0x38/0x40
 [] rcu_dump_cpu_stacks+0x90/0xd0
 [] rcu_check_callbacks+0x3eb/0x6e0
 [] ? account_system_time+0x7f/0x170
 [] update_process_times+0x34/0x60
 [] tick_sched_handle.isra.18+0x31/0x40
 [] tick_sched_timer+0x3c/0x70
 [] __run_hrtimer.isra.34+0x3d/0xc0
 [] hrtimer_interrupt+0xc5/0x1e0
 [] ? native_smp_send_reschedule+0x42/0x60
 [] local_apic_timer_interrupt+0x34/0x60
 [] smp_apic_timer_interrupt+0x3c/0x60
 [] apic_timer_interrupt+0x6b/0x70
 [] ? _raw_spin_unlock_irqrestore+0x9/0x10
 [] __wake_up_sync_key+0x4f/0x60
 [] tipc_write_space+0x31/0x40 [tipc]
 [] filter_rcv+0x31f/0x520 [tipc]
 [] ? tipc_sk_lookup+0xc9/0x110 [tipc]
 [] ? _raw_spin_lock_bh+0x19/0x30
 [] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
 [] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
 [] tipc_node_unlock+0x186/0x190 [tipc]
 [] ? kfree_skb+0x2c/0x40
 [] tipc_rcv+0x2ac/0x8c0 [tipc]
 [] tipc_l2_rcv_msg+0x38/0x50 [tipc]
 [] __netif_receive_skb_core+0x5a3/0x950
 [] __netif_receive_skb+0x13/0x60
 [] netif_receive_skb_internal+0x1e/0x90
 [] napi_gro_receive+0x78/0xa0
 [] tg3_poll_work+0xc54/0xf40 [tg3]
 [] ? consume_skb+0x2c/0x40
 [] tg3_poll_msix+0x41/0x160 [tg3]
 [] net_rx_action+0xe2/0x290
 [] __do_softirq+0xda/0x1f0
 [] irq_exit+0x76/0xa0
 [] do_IRQ+0x55/0xf0
 [] common_interrupt+0x6b/0x6b
 

The issue occurs only when tipc_sk_rcv() is used to wake up postponed senders:

tipc_bclink_wakeup_users()
    // wakeupq is a queue which consists of special
    // messages with SOCK_WAKEUP type.
    tipc_sk_rcv(wakeupq)
        ...
        while (skb_queue_len(inputq)) {
            filter_rcv(skb)
                // Here the type of message is checked
                // and if it is SOCK_WAKEUP then
                // it tries to wake up a sender.
                tipc_write_space(sk)
                    wake_up_interruptible_sync_poll()
        }

After the sender thread is woken up, it can gain control and attempt to send
a message. But if there is not enough space in the send queue, it calls
link_schedule_user(), which puts a message of type SOCK_WAKEUP on the
wakeup queue and puts the sender to sleep. Thus the size of the queue does
not actually change and the while() loop never exits.

The approach I propose is to wake up only senders for which there is enough
space in the send queue, so the described issue can't occur. Moreover, the
same approach is already used to wake up senders on unicast links.
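
A toy model of the two wakeup strategies may help illustrate the difference.
It uses plain counters instead of the real sk_buff queues, and every name in
it (toy_link, wake_all_naive, wake_bounded) is made up for illustration and
is not taken from the TIPC code. The iteration cap exists only so the naive
variant terminates in the model; the real loop has no such cap.

```c
/* Toy model only: counters stand in for the send queue and wakeup queue. */
struct toy_link {
	int free_slots;	/* space left in the send queue */
	int waiters;	/* senders parked on the wakeup queue */
};

/*
 * Naive wakeup: loop until the wakeup queue is empty. A woken sender that
 * finds no space re-queues itself, so the queue length never shrinks.
 * Returns the number of iterations, or -1 if max_iters was exceeded.
 */
static int wake_all_naive(struct toy_link *l, int max_iters)
{
	int iters = 0;

	while (l->waiters > 0) {
		if (iters++ >= max_iters)
			return -1;	/* the uncapped loop would spin forever */
		l->waiters--;		/* dequeue one wakeup message */
		if (l->free_slots > 0)
			l->free_slots--;	/* sender transmits */
		else
			l->waiters++;	/* sender sleeps again, re-queued */
	}
	return iters;
}

/*
 * Bounded wakeup: wake only as many senders as the send queue has room
 * for. Returns the number of senders woken.
 */
static int wake_bounded(struct toy_link *l)
{
	int woken = 0;

	while (l->waiters > 0 && l->free_slots > 0) {
		l->waiters--;
		l->free_slots--;
		woken++;
	}
	return woken;
}
```

With one free slot and three waiters, wake_all_naive() hits its cap while
wake_bounded() wakes a single sender and leaves the other two queued.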

I ran into the issue in our product code, but to reproduce it I changed a
benchmark test application (from tipcutils/demos/benchmark) to perform the
following scenario:
	1. Run 64 instances of the test application (nodes). This can be done
	   on a single physical machine.
	2. Each application connects to all the others using TIPC sockets in
	   RDM mode.
	3. When setup is done, all nodes simultaneously start sending
	   broadcast messages.
	4. Everything hangs.

The issue is reproducible only when congestion occurs on the broadcast link.
For example, with only 8 nodes it works fine since congestion doesn't occur.
The send queue limit is 40 in my case (I use the critical importance level),
and when 64 nodes send a message at the same moment, congestion occurs every
time.

Signed-off-by: Dmitry S Kolmakov 
---
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index c5cbdcb..997dd60 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -169,6 +169,30 @@ static void bclink_retransmit_pkt(struct tipc_net *tn, u32 after, u32 to)
 }
 
 /**
+ * bclink_prepare_wakeup - prepare users for wakeup after congestion
+ * @bcl: broadcast link
+ * @resultq: queue for users which can be woken up
+ * Move a number of waiting users, as permitted by available space in
+ * the send queue, from link wait queue to specified queue for wakeup
+ */
+static void bclink_prepare_wakeup(struct tipc_link *bcl, struct sk_buff_head *resultq)
+{

Re: [PATCH net 0/2] couple of sctp fixes for 0ca50d12fe46

2015-09-03 Thread Vlad Yasevich
On 09/02/2015 03:20 PM, Marcelo Ricardo Leitner wrote:
> These are two fixes for sctp after my patch on 0ca50d12fe46 ("sctp: fix
> src address selection if using secondary addresses")
> 
> The first fixes a dst leak on those it decided to skip.
> 
> The second adds the fallback on src selection that Vlad had asked
> about. Unfortunately, a lot of ipvs setups rely on the old behavior
> and I don't see a better fix for it.
> 
> Please consider both to -stable tree.
> 
> Thanks!
> 
> Marcelo Ricardo Leitner (2):
>   sctp: fix dst leak
>   sctp: add routing output fallback
> 
>  net/sctp/protocol.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 

For the series

Acked-by: Vlad Yasevich 

-vlad


Re: [PATCH v3] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Alexander Duyck

On 09/03/2015 03:05 AM, Nikola Forró wrote:

The ip-route(8) man page says the following about route types:

   unreachable - these destinations are unreachable.  Packets are
   discarded and the ICMP message host unreachable is generated.  The
   local senders get an EHOSTUNREACH error.

   blackhole - these destinations are unreachable.  Packets are
   discarded silently.  The local senders get an EINVAL error.

   prohibit - these destinations are unreachable.  Packets are discarded
   and the ICMP message communication administratively prohibited is
   generated.  The local senders get an EACCES error.

In the inet6 address family, this was correct, except the local senders
got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
In the inet address family, all three route types generated ICMP message
net unreachable, and the local senders got ENETUNREACH error.

In both address families all three route types now behave consistently
with documentation.

Signed-off-by: Nikola Forró 
---
  include/net/ip_fib.h | 22 +-
  net/ipv4/route.c |  6 --
  net/ipv6/route.c |  4 +++-
  3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 5fa643b..8e7b3e1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -233,8 +233,11 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
rcu_read_lock();
  
  	tb = fib_get_table(net, RT_TABLE_MAIN);

-   if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
-   err = 0;
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
+   if (err == -EAGAIN)
+   err = -ENETUNREACH;
+   }
  
  	rcu_read_unlock();
  


The likelihood of tb being NULL is next to 0%.  You would probably be 
better off pulling out the EAGAIN check from the if statement and just 
placing it before the rcu_read_unlock with the correct indentation.
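
As a rough sketch of that shape (not the actual kernel code: struct
stub_table and stub_table_lookup() below are hypothetical stand-ins so the
fragment compiles on its own, and the RCU calls are reduced to comments):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib_table; holds a canned result. */
struct stub_table {
	int lookup_result;
};

/* Hypothetical stub; the real fib_table_lookup() takes more arguments. */
static int stub_table_lookup(const struct stub_table *tb)
{
	return tb->lookup_result;
}

static int fib_lookup_sketch(const struct stub_table *tb)
{
	int err = -ENETUNREACH;

	/* rcu_read_lock() would go here */

	if (tb)
		err = stub_table_lookup(tb);

	/* -EAGAIN check pulled out of the if, just before the unlock */
	if (err == -EAGAIN)
		err = -ENETUNREACH;

	/* rcu_read_unlock() would go here */

	return err;
}
```

A NULL table and an -EAGAIN result both end up as -ENETUNREACH, while other
errors such as -EHOSTUNREACH pass through unchanged.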



@@ -267,11 +270,20 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
  
  	for (err = 0; !err; err = -ENETUNREACH) {

tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
-   break;
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
  
  		tb = rcu_dereference_rtnl(net->ipv4.fib_default);

-   if (tb && !fib_table_lookup(tb, flp, res, flags))
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
+
+   if (err && err != -EAGAIN)
break;
}
  


The loop stuff can just be dropped.  This code is now getting a bit too 
tangled up to justify it.  Probably just use some goto labels instead.  
I would probably just initialize err to -ENETUNREACH and drop the check 
for err at the end and just handle the -EAGAIN case.


I would also probably just pull the -EAGAIN check and place it at the 
end before the unlock and after whatever label it is you add. No point 
in optimizing for unlikely cases.
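
One possible reading of this, sketched with hypothetical stand-in types
(struct stub_fib_table and stub_fib_table_lookup() are not kernel names, and
the locking is left out), with the two tables passed in directly instead of
being dereferenced from struct net:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib_table; holds a canned result. */
struct stub_fib_table {
	int lookup_result;
};

/* Hypothetical stub; the real fib_table_lookup() takes more arguments. */
static int stub_fib_table_lookup(const struct stub_fib_table *tb)
{
	return tb->lookup_result;
}

static int fib_lookup_multi_sketch(const struct stub_fib_table *main_tb,
				   const struct stub_fib_table *default_tb)
{
	int err = -ENETUNREACH;	/* default when no table gives an answer */

	if (main_tb)
		err = stub_fib_table_lookup(main_tb);

	if (!err)
		goto out;	/* main table matched, skip the default table */

	if (default_tb)
		err = stub_fib_table_lookup(default_tb);

out:
	/* single -EAGAIN check at the end, before the unlock in real code */
	if (err == -EAGAIN)
		err = -ENETUNREACH;

	return err;
}
```

This keeps the behaviour of trying the default table whenever the main table
does not produce a match; whether a non-EAGAIN error from the main table
should stop the lookup early is a separate decision this sketch does not
settle.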



diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e681b85..4ce3f87 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2020,6 +2020,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
struct fib_result res;
struct rtable *rth;
int orig_oif;
+   int err = ENETUNREACH;
  
  	res.tclassid	= 0;

res.fi  = NULL;
@@ -2123,7 +2124,8 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
goto make_route;
}
  
-	if (fib_lookup(net, fl4, &res, 0)) {

+   err = fib_lookup(net, fl4, &res, 0);
+   if (err) {
res.fi = NULL;
res.table = NULL;
if (fl4->flowi4_oif) {
@@ -2151,7 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
res.type = RTN_UNICAST;
goto make_route;
}
-   rth = ERR_PTR(-ENETUNREACH);
+   rth = ERR_PTR(err);
goto out;
}
  
diff --git a/net/ipv6/route.c b/net/ipv6/route.c

index d155864..d33a6a5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1847,9 +1847,11 @@ int ip6_route_add(struct fib6_config *cfg)
rt->dst.input = ip6_pkt_prohibit;
break;
case RTN_THROW:
+   case RTN_UNREACHABLE:
default:
rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
-   : -ENETUNREACH;
+   

[PATCH v6 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-09-03 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through function calls from
userspace by enumerating the sysctl tree from sysctl_table_root. This requires
the symbol and its related functions to be publicly accessible.

Signed-off-by: Hajime Tazaki 
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fdda62e6115e..e1003cf51d22 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -57,7 +57,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -99,8 +99,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -335,7 +336,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -356,7 +357,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -374,7 +375,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -707,7 +708,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -865,7 +867,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -961,13 +963,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -988,13 +990,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1106,7 +1108,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;
if (S_ISDIR(link->mode) && S_ISDIR(entry->mode))
@@ 

[PATCH v6 04/10] lib: time handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. They
eventually call the functions registered through the lib_init()
API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/hrtimer.c | 117 ++
 arch/lib/tasklet-hrtimer.c |  57 +
 arch/lib/time.c| 116 ++
 arch/lib/timer.c   | 299 +
 4 files changed, 589 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index ..6a99bad6c5b7
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,117 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently excuting the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 1;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 0;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index ..fef4902d4938
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim.h"
+#include "sim-assert.h"
+
+static enum hrtimer_restart 

[PATCH v6 00/10] an introduction of Linux library operating system (LibOS)

2015-09-03 Thread Hajime Tazaki
Although I've been quiet for a while, I have prepared this 6th version
of the Linux LibOS patchset.

In the meantime, I've watched our branch grow and have two pieces of
good news regarding the maintenance overhead and the benefit of the
libos patches.

* news 1

Following the suggestion below from Richard Weinberger, I've tracked
the maintenance burden (manual modifications, following stub function
signatures, etc.) that we were worried about.

> I'd suggest the following:
> Maintain LibOS in your git tree and follow Linus' tree.
> Make sure that all kernel releases build and work.
> 
> This way you can experiment with automation and other
> stuff. If it works well you can ask for mainline inclusion
> after a few kernel releases.
> 
> Your git history will show how much maintenance burden
> LibOS has and how much with every merge window breaks and
> needs manual fixup.


Here is the list of commits in which I updated stub functions since
around the 4.0 release. As I mentioned, stub function updates mostly
happen during the merge window, and there are 2 to 5 of them per
release. We may see similar numbers in the coming 4.3 release.

4.1-4.2
- 2000c14 lib: fix the signature of vfs_caches_init() (4.2-rc6)
- 576b8d7 lib: fix new proc behavior for emulated proc files (4.2-rc1)
- b72596c lib: fix workqueue stub for new header (4.2-rc1)
- 06289a2 merge conflicts (MAINTAINERS) (4.2-rc1)
- 12c0b79 lib: fix new signature of struct timer_list (4.2-rc0)
4.0-4.1
- acbf6ed lib: fix __ktime_divns() to latest update (4.1-rc7)
- 0fe9ba4 lib: merge fix from net-next around 4.0.0-next (4.1-rc0)
- 070691d lib: fix new sock_sendmsg() API (4.1-rc0)
3.19-4.0
- 609d8c7 lib: adapt linux-4.0.0-rc7
- 796347a lib: adapt linux-4.0.0-rc5

The full commit history, which also includes other libos enhancement
commits, can be found below.

https://github.com/libos-nuse/net-next-nuse/commits/master?author=thehajime

Considering that Linus' tree has gained around 29K commits from v4.0 to
now (4.2+), I think the number of stub updates (in libos) is not a big
matter.

I know that arguing from the number of commits makes almost zero sense,
but it should help give a rough feel for the burden.

I would like to hear your honest opinions.


* news 2
On the other hand, I also have good news: libos has detected a
couple of regressions in the net-next tree.

- [net-next,v2] ipv6: Do not iterate over all interfaces when finding source 
address on specific interface. (v4.2-rc0)
 patchwork:
 http://patchwork.ozlabs.org/patch/493675/
 detected by:
 http://ns-3-dce.cloud.wide.ad.jp/jenkins/job/daily-net-next-sim/958/

- [v3] ipv6: Fix protocol resubmission (v4.1-rc7)
 patchwork:
 http://patchwork.ozlabs.org/patch/482645/
 detected by:
 http://ns-3-dce.cloud.wide.ad.jp/jenkins/job/umip-net-next/716/

- [net-next] ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flags
 patchwork:
 http://patchwork.ozlabs.org/patch/467447/
 detected by:
 http://ns-3-dce.cloud.wide.ad.jp/jenkins/job/daily-net-next-sim/878/

- [net-next] xfrm6: Fix a offset value for network header in _decode_session6 
(v3.19-rc7?)
 patchwork:
 http://patchwork.ozlabs.org/patch/436351/

Bugs that were also detected by other tests, such as the kbuild test
robot, are not included: the bugs above require a real and (sometimes)
complex setup, which DCE (libos) eases with its virtualized environment.



changes from v5:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) introduce symbol namespace for the symbols in Linux libos to avoid conflicts
- Patch 04/10 ("lib: time handling (kernel glue code)")
2) lib: un-stub timekeeping code and reuse timekeeping.c and co.
- Overall
3) rebased to Linux 4.2+ (revision 1e1a4e8f439113b7820bc7150569f685e1cc2b43)

changes from v4:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) lib: fix dependency detection of kernel/time/timeconst.h
   (commented by Richard Weinberger)
- Overall
2) rebased to Linux 4.1-rc3 (4cfceaf0c087f47033f5e61a801f4136d6fb68c6)

changes from v3:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) Remove RFC (now it's a proposal)
2) build environment cleanup (commented by Paul Bolle)
- Overall
3) change based tree from arnd/asm-generic to torvalds/linux.git
   (commented by Richard Weinberger)
4) rebased to Linux 4.1-rc1 (b787f68c36d49bb1d9236f403813641efa74a031)
5) change the title of cover letter a bit

changes from v2:
- Patch 02/11 ("slab: add private memory allocator header for arch/lib")
1) add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
- Overall
2) rewrite commit log messages

changes from v1:
- Patch 01/11 ("sysctl: make some functions unstatic to access by arch/lib"):
1) add prefix ctl_table_ to newly public functions (commented by Joe Perches)
- Patch 08/11 ("lib: other kernel glue layer code"):
2) significantly reduce glue code (stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild 

Re: [patch net] switchdev: fix return value of switchdev_port_fdb_dump in case of error

2015-09-03 Thread Samudrala, Sridhar

On 9/3/2015 5:04 AM, Jiri Pirko wrote:

From: Jiri Pirko 

switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is
idx, so we cannot return errval.

Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via 
switchdev_port_obj ops.")
Signed-off-by: Jiri Pirko 

Acked-by: Sridhar Samudrala 


---
  net/switchdev/switchdev.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 16c1c43..fda38f8 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -853,12 +853,8 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb,
.cb = cb,
.idx = idx,
};
-   int err;
-
-   err = switchdev_port_obj_dump(dev, );
-   if (err)
-   return err;
  
+	switchdev_port_obj_dump(dev, );

return dump.idx;
  }
  EXPORT_SYMBOL_GPL(switchdev_port_fdb_dump);
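
For illustration, the constraint stated in the commit message (the dump
callback reports progress through its return value, so an error code
cannot be returned in its place) can be sketched with stand-in types.
The names dump_ctx, walk_entries and port_dump are hypothetical and only
model the idea; they are not the switchdev API.

```c
#include <assert.h>

/* Hypothetical dump context: idx counts the entries visited so far. */
struct dump_ctx {
	int idx;
};

/* Hypothetical walker: visits n entries and may fail part-way through. */
static int walk_entries(struct dump_ctx *ctx, int n, int fail_at)
{
	int i;

	assert(ctx);
	for (i = 0; i < n; i++) {
		if (i == fail_at)
			return -1;	/* error after some entries were seen */
		ctx->idx++;
	}
	return 0;
}

/*
 * The caller interprets the return value as the index to resume from,
 * so the walker's error code is deliberately not returned here.
 */
int port_dump(int start_idx, int n, int fail_at)
{
	struct dump_ctx ctx = { .idx = start_idx };

	walk_entries(&ctx, n, fail_at);
	return ctx.idx;
}
```

With this shape, a failure after one entry still yields a usable resume
index (start_idx + 1) rather than a negative value that the caller would
misread as an index.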




Re: [PATCH v2] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread David Miller
From: Nikola Forró 
Date: Thu, 03 Sep 2015 11:08:51 +0200

> @@ -233,8 +233,10 @@ static inline int fib_lookup(struct net *net, const 
> struct flowi4 *flp,
>   rcu_read_lock();
>  
>   tb = fib_get_table(net, RT_TABLE_MAIN);
> - if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
> - err = 0;
> + if (tb)
> + err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
> + if (err == -EAGAIN)
> + err = -ENETUNREACH;

You didn't test this.


Re: [patch net] switchdev: fix return value of switchdev_port_fdb_dump in case of error

2015-09-03 Thread Scott Feldman
On Thu, Sep 3, 2015 at 5:04 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is
> idx, so we cannot return errval.
>
> Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via 
> switchdev_port_obj ops.")
> Signed-off-by: Jiri Pirko 
Acked-by: Scott Feldman


Re: [GIT] Networking

2015-09-03 Thread Linus Torvalds
On Wed, Sep 2, 2015 at 10:35 PM, David Miller  wrote:
>
> Another merge window, another set of networking changes.  I've heard
> rumblings that the lightweight tunnels infrastructure has been voted
> networking change of the year.

.. and others say that the most notable feature is the idiotic bugs
that it introduces, and the compiler even complains about.

Christ, people. Learn C, instead of just stringing random characters
together until it compiles (with warnings).

This:

  static bool rate_control_cap_mask(struct ieee80211_sub_if_data *sdata,
   struct ieee80211_supported_band *sband,
   struct ieee80211_sta *sta, u32 *mask,
   u8 mcs_mask[IEEE80211_HT_MCS_MASK_LEN])

is horribly broken to begin with, because array arguments in C don't
actually exist. Sadly, compilers accept it for various bad historical
reasons, and silently turn it into just a pointer argument. There are
arguments for them, but they are from weak minds.

But happily gcc has a really really valid warning (kudos - I often end
up ragging on the bad warnings gcc has, but this one is a keeper),
because a few lines down the mistake then turns into pure and utter
garbage.

It's garbage that was basically encouraged by the first mistake
(thinking that C allows array arguments), namely:

  for (i = 0; i < sizeof(mcs_mask); i++)

the "sizeof(mcs_mask)" is _shit_. Since array arguments don't actually
exist in C, it is the size of the pointer, not the array. The first
mistake makes the bug look like reasonable code. Although I'd argue
that the code would actually be bad regardless, since "sizeof" is the
size in bytes, and the code actually wants the number of entries (and
we do have ARRAY_SIZE() for that).

Sure, in this case the entries are just one byte each, so it would
have *worked* had it not been for the array argument issue, but it's
misleading and the code is just fundamentally buggy and nonsensical in
two entirely different ways that fed on each other.

That line should read

  for (i = 0; i < IEEE80211_HT_MCS_MASK_LEN; i++)

and the argument should just have been declared as the pointer it actually is.
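
For illustration, a minimal user-space sketch of the pitfall described
above. MCS_MASK_LEN is a stand-in for IEEE80211_HT_MCS_MASK_LEN and all
function names are hypothetical, not taken from mac80211. The parameter
is spelled as the pointer it really is, so the sketch should build
without tripping -Wsizeof-array-argument.

```c
#include <assert.h>
#include <stddef.h>

#define MCS_MASK_LEN 10	/* hypothetical stand-in for IEEE80211_HT_MCS_MASK_LEN */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * "u8 mcs_mask[MCS_MASK_LEN]" in a parameter list is adjusted to
 * "u8 *mcs_mask", so sizeof() in the callee yields the pointer size.
 */
size_t size_seen_by_callee(const unsigned char *mcs_mask)
{
	return sizeof(mcs_mask);
}

/* Correct traversal: bound the loop by the constant, not by sizeof(). */
unsigned int count_nonzero(const unsigned char *mcs_mask)
{
	unsigned int i, n = 0;

	for (i = 0; i < MCS_MASK_LEN; i++)
		if (mcs_mask[i])
			n++;
	return n;
}

/* In the caller the object is a real array, so ARRAY_SIZE() works. */
size_t entries_seen_by_caller(void)
{
	unsigned char mcs_mask[MCS_MASK_LEN] = { 0 };

	assert(size_seen_by_callee(mcs_mask) == sizeof(const unsigned char *));
	return ARRAY_SIZE(mcs_mask);
}

/* Only the last entry is set; a loop bounded by pointer size would miss it. */
unsigned int last_entry_is_visited(void)
{
	unsigned char mcs_mask[MCS_MASK_LEN] = { 0 };

	mcs_mask[MCS_MASK_LEN - 1] = 1;
	return count_nonzero(mcs_mask);
}
```

The last helper shows why the bound matters: a loop limited to four or
eight iterations would never reach entry nine.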

A later patch then added onto the pile of manure by adding *another*
broken array argument, but at least that one then used the proper loop
for traversal of that array.

The fact that I notice this bug from a very basic "let's just compile
each pull request and make sure it isn't complete crap" is sad.

Now, it *looks* like the code was just moved, and the "sizeof()" was
initially correct (because it was a size of an actual array). Well, it
was "correct" in the sense that it generated the right code, even if
the whole confusion between "number of entries" and "size in bytes"
was still there. Then it got moved and turned from "confused but
happens to generate correct code" into "buggy pile of bovine manure".
See commit 90c66bd2232a ("mac80211: remove ieee80211_tx_rate
dependency in rate mask code").

So I can see how this bug happened, and I am only slightly upset with
Lorenzo who is the author of that commit.

What I can't see is why the code has existed in at least two
maintainer trees (Johannes' and David's) for a couple of weeks, and
nobody cared about the new compiler warnings? And it was sent to me
despite that new warning?

I really want people to take a really hard look at functions that use
arrays as arguments. It really is very misleading, even if it can look
"prettier", and some people will argue that it's "documentation" about
how the pointer is a particular size. But it's neither. It's basically
just lying about what is going on, and the only thing it documents is
"I don't know how to C". Misleading documentation isn't documentation,
it's a mistake.

I see it in that file for at least the functions rate_idx_match_mask()
and rate_control_cap_mask(). I tried - and failed - to come up with a
reasonable grep pattern to try to see how common it is, and I'm too
lazy to add some sparse check for it.

Please people. When I see these kinds of obviously bogus code
problems, that just makes me very upset. Because it makes me worry
about all the non-obvious stuff that I miss.  Sadly, this time I had
pushed out the merge early (because I wanted to test the wireless
changes on my laptop), so now the bug is out there.

I'm not sure what the practical *impact* of the bug is. Yes, it only
traverses four or eight rate entries (depending on 32-bit or
64-bitness of the kernel) out of the ten that it should. But maybe in
practice one of the first entries is always a good enough match. So
maybe _testing_ doesn't actually show this bug, but I sure wish people
just took compiler warnings more seriously (and were a lot more
careful about moving things to functions, and never ever used the
"function argument is an array" model).

   Linus

Re: [PATCH v3] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread David Miller
From: Nikola Forró 
Date: Thu, 03 Sep 2015 12:05:22 +0200

> @@ -233,8 +233,11 @@ static inline int fib_lookup(struct net *net, const 
> struct flowi4 *flp,
>   rcu_read_lock();
>  
>   tb = fib_get_table(net, RT_TABLE_MAIN);
> - if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
> - err = 0;
> + if (tb) {
> + err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
> + if (err == -EAGAIN)
> + err = -ENETUNREACH;
> + }
>  

You resubmitted this so quickly, there is no way you did any testing
of this code path even after fixing up the curly brace issue.
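
For illustration, the control flow of the braced v3 hunk can be sketched
outside the kernel. Everything below is a stand-in: struct table and
lookup_stub() are hypothetical, and the -ENETUNREACH default for the
"no table" case is an assumption, since the initial value of err is not
visible in the quoted hunk.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a routing table; it only carries a canned result. */
struct table {
	int lookup_result;	/* value the stub lookup will return */
};

/* Hypothetical stub used in place of fib_table_lookup(). */
static int lookup_stub(const struct table *tb)
{
	return tb->lookup_result;
}

/*
 * Mirrors the braced hunk: -EAGAIN is translated only when a table
 * exists, and every other result is passed through unchanged.
 */
int lookup_main_table(const struct table *tb)
{
	int err = -ENETUNREACH;	/* assumed default when there is no table */

	if (tb) {
		err = lookup_stub(tb);
		if (err == -EAGAIN)
			err = -ENETUNREACH;
	}
	return err;
}

/* Convenience wrapper: run the lookup against a table with a canned result. */
int demo_lookup(int stub_result)
{
	struct table tb = { .lookup_result = stub_result };

	assert(stub_result <= 0);
	return lookup_main_table(&tb);
}
```

Under these assumptions a lookup error such as -EACCES reaches the
caller as-is, while "no match" is still reported as -ENETUNREACH.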


Re: [GIT] Networking

2015-09-03 Thread Julia Lawall


On Thu, 3 Sep 2015, Joe Perches wrote:

> On Thu, 2015-09-03 at 11:22 -0700, Linus Torvalds wrote:
> > On Thu, Sep 3, 2015 at 10:40 AM, David Miller  wrote:
> > >
> > > Linus, what GCC version are you using and what does the warning look
> > > like?
> > 
> > I'm on whatever is in F22. gcc -v says
> > 
> >gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)
> > 
> > and the warning looks like so:
> > 
> >   net/mac80211/rate.c: In function ‘rate_control_cap_mask’:
> >   net/mac80211/rate.c:719:25: warning: ‘sizeof’ on array function
> > parameter ‘mcs_mask’ will return size of ‘u8 * {aka unsigned char *}’
> > [-Wsizeof-array-argument]
> >  for (i = 0; i < sizeof(mcs_mask); i++)
> >^
> > 
> > (note the lack of warning about the use of an array in the function
> > definition parameter list - I tried to find if there's any way to
> > enable such a warning, but couldn't find anything. Maybe my google-fu
> > is weak, but more probably that just doesn't exist).

I find 518 occurrences of a function parameter declaration that contains 
an explicit size.  But the sizeof(mcs_mask) is the only place where a 
sizeof is applied to such a parameter.  I also checked for ARRAY_SIZE on 
such parameters, and didn't find any occurrences of that either.

julia


> Coccinelle might be a better tool for this but
> a possible checkpatch patch is below:
> 
> It produces output like:
> 
> $ ./scripts/checkpatch.pl -f net/iucv/iucv.c --types=sized_array_argument
> WARNING: Avoid sized array arguments
> #716: FILE: net/iucv/iucv.c:716:
> +static int iucv_sever_pathid(u16 pathid, u8 userdata[16])
> +{
> 
> WARNING: Avoid sized array arguments
> #878: FILE: net/iucv/iucv.c:878:
> +int iucv_path_accept(struct iucv_path *path, struct iucv_handler *handler,
> +  u8 userdata[16], void *private)
> +{
> 
> WARNING: Avoid sized array arguments
> #925: FILE: net/iucv/iucv.c:925:
> +int iucv_path_connect(struct iucv_path *path, struct iucv_handler *handler,
> +   u8 userid[8], u8 system[8], u8 userdata[16],
> +   void *private)
> +{
> 
> WARNING: Avoid sized array arguments
> #988: FILE: net/iucv/iucv.c:988:
> +int iucv_path_quiesce(struct iucv_path *path, u8 userdata[16])
> +{
> 
> WARNING: Avoid sized array arguments
> #1020: FILE: net/iucv/iucv.c:1020:
> +int iucv_path_resume(struct iucv_path *path, u8 userdata[16])
> +{
> 
> WARNING: Avoid sized array arguments
> #1050: FILE: net/iucv/iucv.c:1050:
> +int iucv_path_sever(struct iucv_path *path, u8 userdata[16])
> +{
> 
> total: 0 errors, 6 warnings, 0 checks, 2119 lines checked
> ---
>  scripts/checkpatch.pl | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index e14dcdb..747b164 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -5422,6 +5422,24 @@ sub process {
>"externs should be avoided in .c files\n" .  
> $herecurr);
>   }
>  
> +# check for function arguments using arg[SIZE]
> + if ($^V && $^V ge 5.10.0 &&
> + defined $stat &&
> + $stat =~ 
> /^.\s*(?:$Declare|$DeclareMisordered)\s*$Ident\s*($balanced_parens)\s*\{/s) {
> + my $func_args = $1;
> + if ($func_args =~ 
> /(.*)\[\s*(?:$Constant|[A-Z0-9_]+)\s*\]/ && (!defined($1) || $1 !~ 
> /\[\s*\]\s*$/)) {
> + my $ctx = '';
> + my $herectx = $here . "\n";
> + my $cnt = statement_rawlines($stat);
> + for (my $n = 0; $n < $cnt; $n++) {
> + $herectx .= raw_line($linenr, $n) . 
> "\n";
> + $n = $cnt if ($herectx =~ /{/);
> + }
> + WARN("SIZED_ARRAY_ARGUMENT",
> +  "Avoid sized array arguments\n" . 
> $herectx);
> + }
> + }
> +
>  # checks for new __setup's
>   if ($rawline =~ /\b__setup\("([^"]*)"/) {
>   my $name = $1;
> 
> 
> 

Re: [GIT] Networking

2015-09-03 Thread Linus Torvalds
On Thu, Sep 3, 2015 at 11:22 AM, Linus Torvalds
 wrote:
> [-Wsizeof-array-argument]

Ahh. Google shows that it's an old clang warning that gcc has recently
picked up.

But even clang doesn't seem to have any way for a project to say
"please warn about arrays in function argument declaration". It *is*
very traditional idiomatic C, it's just that I personally think it's
one of those bad traditional C things exactly because it's so
misleading about what actually goes on. But I guess that in practice,
the only thing that it actually *affects* is "sizeof" (and assignment
to the variable name - something that would be invalid for a real
array, but works on argument arrays because they are really just
pointers).

The "array as function argument" syntax is occasionally useful
(particularly for the multi-dimensional array case), so I very much
understand why it exists, I just think that in the kernel we'd be
better off with the rule that it's against our coding practices.
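
For illustration, a small user-space sketch of the two points above: the
multi-dimensional case where the array syntax is useful, and assignment
to an "array" parameter. COLS and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 3	/* hypothetical inner dimension */

/*
 * "const int m[][COLS]" is adjusted to "const int (*m)[COLS]": only the
 * outermost dimension becomes a pointer, so m[r][c] still indexes
 * correctly.
 */
int sum_matrix(size_t rows, const int m[][COLS])
{
	size_t r, c;
	int sum = 0;

	assert(m);
	for (r = 0; r < rows; r++)
		for (c = 0; c < COLS; c++)
			sum += m[r][c];
	return sum;
}

/*
 * Assigning to an "array" parameter compiles because it is a pointer;
 * the same statement on a real array object would be rejected.
 */
int first_of_second_half(int v[4])
{
	v = v + 2;	/* legal: v has type int * */
	return v[0];
}

int demo_sum(void)
{
	const int m[2][COLS] = { { 1, 2, 3 }, { 4, 5, 6 } };

	return sum_matrix(2, m);
}

int demo_assign(void)
{
	int v[4] = { 10, 20, 30, 40 };

	return first_of_second_half(v);
}
```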

  Linus


kernel panic in pppoe_release

2015-09-03 Thread Murat Sezgin
Hi all,

I have already sent this email to linux-ppp group, but I have not received
any response yet. So, I want to send it to this group as well, because it is
most likely a netdev issue in pppoe kernel driver.

I see the bug in the below email discussion on the kernel that we are
currently using (3.4.103) with our openwrt distribution.

https://www.mail-archive.com/netdev@vger.kernel.org/msg70367.html

I did some debugging on this and I see that the ref count of
po->pppoe_dev doesn’t go to zero before it is released with dev_put()
and its value is set to NULL.

I also found the below patches from openwrt patch site for 3.18 and 4.0
kernels which can be applicable to our kernel. 

https://dev.openwrt.org/changeset/45653

But as described in the netdev mail-archive link above, it doesn’t solve
this issue completely and we still see the crash. I just wonder if the
patch proposed by “Denys Fedoryshchenko”, which is below, fixes this
issue completely.

    pppox_unbind_sock(sk);
    +/* Signal the death of the socket. */
    +sk->sk_state = PPPOX_DEAD;


Do you have a conclusion on this bug? Is it safe to get this patch along
with the other workqueue patches?

Regards,
Murat




Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Sergei Shtylyov

Hello.

On 05/16/2014 06:14 PM, Thomas Petazzoni wrote:


The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:

  * The address of the PHY on the fake bus needs to be passed, while a
dynamic allocation is desired.

  * Since the phy_device instantiation is post-poned until the next
mdiobus scan, there is no way to associate the fixed PHY with its
OF node, which later prevents of_phy_connect() from finding this
fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.

Signed-off-by: Thomas Petazzoni 
Acked-by: Florian Fainelli 
Acked-by: Grant Likely 
---
  drivers/net/phy/fixed.c   | 61 +++
  include/linux/phy_fixed.h | 11 +
  2 files changed, 72 insertions(+)

diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index e41546d..d60d875 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c

[...]

@@ -203,6 +204,66 @@ err_regs:

[...]

+int fixed_phy_register(unsigned int irq,
+  struct fixed_phy_status *status,
+  struct device_node *np)
+{
+   struct fixed_mdio_bus *fmb = _fmb;
+   struct phy_device *phy;
+   int phy_addr;
+   int ret;
+
+   /* Get the next available PHY address, up to PHY_MAX_ADDR */
+   spin_lock(_fixed_addr_lock);
+   if (phy_fixed_addr == PHY_MAX_ADDR) {
+   spin_unlock(_fixed_addr_lock);
+   return -ENOSPC;
+   }
+   phy_addr = phy_fixed_addr++;
+   spin_unlock(_fixed_addr_lock);
+
+   ret = fixed_phy_add(PHY_POLL, phy_addr, status);


   Was rummaging in the fixed_phy driver and a bug sprang right at me: 'phy' 
should have been passed here, not PHY_POLL. Luckily, all callers pass PHY_POLL 
anyway...


[...]

MBR, Sergei



Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Sergei Shtylyov

On 09/03/2015 10:20 PM, Sergei Shtylyov wrote:


The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:

  * The address of the PHY on the fake bus needs to be passed, while a
dynamic allocation is desired.

  * Since the phy_device instantiation is post-poned until the next
mdiobus scan, there is no way to associate the fixed PHY with its
OF node, which later prevents of_phy_connect() from finding this
fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.

Signed-off-by: Thomas Petazzoni 
Acked-by: Florian Fainelli 
Acked-by: Grant Likely 
---
  drivers/net/phy/fixed.c   | 61
+++
  include/linux/phy_fixed.h | 11 +
  2 files changed, 72 insertions(+)

diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index e41546d..d60d875 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c

[...]

@@ -203,6 +204,66 @@ err_regs:

[...]

+int fixed_phy_register(unsigned int irq,
+   struct fixed_phy_status *status,
+   struct device_node *np)
+{
+struct fixed_mdio_bus *fmb = _fmb;
+struct phy_device *phy;
+int phy_addr;
+int ret;
+
+/* Get the next available PHY address, up to PHY_MAX_ADDR */
+spin_lock(_fixed_addr_lock);
+if (phy_fixed_addr == PHY_MAX_ADDR) {
+spin_unlock(_fixed_addr_lock);
+return -ENOSPC;
+}
+phy_addr = phy_fixed_addr++;
+spin_unlock(_fixed_addr_lock);
+
+ret = fixed_phy_add(PHY_POLL, phy_addr, status);


Was rummaging in the fixed_phy driver and a bug sprang right at me: 'phy'


   Sorry, s/phy/irq/ of course. Just noticed. :-/


should have been passed here, not PHY_POLL. Luckily, all callers pass PHY_POLL
anyway...



[...]


MBR, Sergei



Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Florian Fainelli
On 03/09/15 12:37, Sergei Shtylyov wrote:
> On 09/03/2015 10:20 PM, Sergei Shtylyov wrote:
> 
>>> The existing fixed_phy_add() function has several drawbacks that
>>> prevents it from being used as is for OF-based declaration of fixed
>>> PHYs:
>>>
>>>   * The address of the PHY on the fake bus needs to be passed, while a
>>> dynamic allocation is desired.
>>>
>>>   * Since the phy_device instantiation is post-poned until the next
>>> mdiobus scan, there is no way to associate the fixed PHY with its
>>> OF node, which later prevents of_phy_connect() from finding this
>>> fixed PHY from a given OF node.
>>>
>>> To solve this, this commit introduces fixed_phy_register(), which will
>>> allocate an available PHY address, add the PHY using fixed_phy_add()
>>> and instantiate the phy_device structure associated with the provided
>>> OF node.
>>>
>>> Signed-off-by: Thomas Petazzoni 
>>> Acked-by: Florian Fainelli 
>>> Acked-by: Grant Likely 
>>> ---
>>>   drivers/net/phy/fixed.c   | 61
>>> +++
>>>   include/linux/phy_fixed.h | 11 +
>>>   2 files changed, 72 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
>>> index e41546d..d60d875 100644
>>> --- a/drivers/net/phy/fixed.c
>>> +++ b/drivers/net/phy/fixed.c
>> [...]
>>> @@ -203,6 +204,66 @@ err_regs:
>> [...]
>>> +int fixed_phy_register(unsigned int irq,
>>> +   struct fixed_phy_status *status,
>>> +   struct device_node *np)
>>> +{
>>> +struct fixed_mdio_bus *fmb = _fmb;
>>> +struct phy_device *phy;
>>> +int phy_addr;
>>> +int ret;
>>> +
>>> +/* Get the next available PHY address, up to PHY_MAX_ADDR */
>>> +spin_lock(_fixed_addr_lock);
>>> +if (phy_fixed_addr == PHY_MAX_ADDR) {
>>> +spin_unlock(_fixed_addr_lock);
>>> +return -ENOSPC;
>>> +}
>>> +phy_addr = phy_fixed_addr++;
>>> +spin_unlock(_fixed_addr_lock);
>>> +
>>> +ret = fixed_phy_add(PHY_POLL, phy_addr, status);
>>
>> Was rummaging in the fixed_phy driver and a bug sprang right at
>> me: 'phy'
> 
>Sorry, s/phy/irq/ of course. Just noticed. :-/

Ok, that makes sense then, and yes, this "irq" argument should have been
passed down to fixed_phy_add(). Might be worth adding a WARN_ON(irq !=
PHY_POLL) just to catch callers that expect something else.
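
For illustration, the suggestion above can be sketched in user space.
fixed_phy_add_stub() and register_fixed_phy_sketch() are hypothetical
stand-ins, the value of PHY_POLL is assumed, irq is a plain int here for
simplicity, and a message on stderr takes the place of WARN_ON().

```c
#include <assert.h>
#include <stdio.h>

#define PHY_POLL (-1)	/* assumed value; the kernel header defines the real one */

static int last_irq_seen;	/* records what the stub was called with */

/* Hypothetical stub standing in for fixed_phy_add(). */
static int fixed_phy_add_stub(int irq, int phy_addr)
{
	(void)phy_addr;
	last_irq_seen = irq;
	return 0;
}

/*
 * Sketch of the suggested fix: forward the caller's irq instead of a
 * hard-coded PHY_POLL, and complain when a caller passes anything else.
 * Returns the irq that reached the stub so the effect can be checked.
 */
int register_fixed_phy_sketch(int irq, int phy_addr)
{
	assert(phy_addr >= 0);

	if (irq != PHY_POLL)	/* stand-in for WARN_ON(irq != PHY_POLL) */
		fprintf(stderr, "warning: irq %d is not PHY_POLL\n", irq);

	if (fixed_phy_add_stub(irq, phy_addr))
		return PHY_POLL;
	return last_irq_seen;
}
```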

Thanks!
--
Florian


Re: [GIT] Networking

2015-09-03 Thread Linus Torvalds
On Thu, Sep 3, 2015 at 10:40 AM, David Miller  wrote:
>
> Linus, what GCC version are you using and what does the warning look
> like?

I'm on whatever is in F22. gcc -v says

   gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)

and the warning looks like so:

  net/mac80211/rate.c: In function ‘rate_control_cap_mask’:
  net/mac80211/rate.c:719:25: warning: ‘sizeof’ on array function
parameter ‘mcs_mask’ will return size of ‘u8 * {aka unsigned char *}’
[-Wsizeof-array-argument]
 for (i = 0; i < sizeof(mcs_mask); i++)
   ^

(note the lack of warning about the use of an array in the function
definition parameter list - I tried to find if there's any way to
enable such a warning, but couldn't find anything. Maybe my google-fu
is weak, but more probably that just doesn't exist).

  Linus


Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Sergei Shtylyov

Hello.

On 09/03/2015 10:23 PM, Florian Fainelli wrote:


>>> The existing fixed_phy_add() function has several drawbacks that
>>> prevents it from being used as is for OF-based declaration of fixed
>>> PHYs:
>>>
>>>   * The address of the PHY on the fake bus needs to be passed, while a
>>> dynamic allocation is desired.
>>>
>>>   * Since the phy_device instantiation is post-poned until the next
>>> mdiobus scan, there is no way to associate the fixed PHY with its
>>> OF node, which later prevents of_phy_connect() from finding this
>>> fixed PHY from a given OF node.
>>>
>>> To solve this, this commit introduces fixed_phy_register(), which will
>>> allocate an available PHY address, add the PHY using fixed_phy_add()
>>> and instantiate the phy_device structure associated with the provided
>>> OF node.
>>>
>>> Signed-off-by: Thomas Petazzoni
>>> Acked-by: Florian Fainelli
>>> Acked-by: Grant Likely
>>> ---
>>>   drivers/net/phy/fixed.c   | 61
>>> +++
>>>   include/linux/phy_fixed.h | 11 +
>>>   2 files changed, 72 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
>>> index e41546d..d60d875 100644
>>> --- a/drivers/net/phy/fixed.c
>>> +++ b/drivers/net/phy/fixed.c
>> [...]
>>> @@ -203,6 +204,66 @@ err_regs:
>> [...]
>>> +int fixed_phy_register(unsigned int irq,
>>> +   struct fixed_phy_status *status,
>>> +   struct device_node *np)
>>> +{
>>> +struct fixed_mdio_bus *fmb = _fmb;
>>> +struct phy_device *phy;
>>> +int phy_addr;
>>> +int ret;
>>> +
>>> +/* Get the next available PHY address, up to PHY_MAX_ADDR */
>>> +spin_lock(_fixed_addr_lock);
>>> +if (phy_fixed_addr == PHY_MAX_ADDR) {
>>> +spin_unlock(_fixed_addr_lock);
>>> +return -ENOSPC;
>>> +}
>>> +phy_addr = phy_fixed_addr++;
>>> +spin_unlock(_fixed_addr_lock);
>>> +
>>> +ret = fixed_phy_add(PHY_POLL, phy_addr, status);
>>
>> Was rummaging in the fixed_phy driver and a bug sprang right at me:
>> 'phy' should have been passed here, not PHY_POLL. Luckily, all callers
>> pass PHY_POLL anyway...
>
> Are we looking at the same header file for the prototype of fixed_phy_add()?

   Probably not. I was looking at Linus' tree, yours is probably net-next. :-)

> extern int fixed_phy_add(unsigned int irq, int phy_id,
>   struct fixed_phy_status *status,
>   int link_gpio);
>
> First argument is correct here..

   No, fixed_phy_register() gets 'irq' passed to it and it should in its turn
call fixed_phy_add() with this argument, not PHY_POLL; otherwise the 'irq'
parameter gets completely ignored...

> at any rate, if something needs fixing, just go ahead and submit a patch.

   OK.

> --
> Florian


MBR, Sergei



Re: [GIT] Networking

2015-09-03 Thread Joe Perches
On Thu, 2015-09-03 at 11:22 -0700, Linus Torvalds wrote:
> On Thu, Sep 3, 2015 at 10:40 AM, David Miller  wrote:
> >
> > Linus, what GCC version are you using and what does the warning look
> > like?
> 
> I'm on whatever is in F22. gcc -v says
> 
>gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)
> 
> and the warning looks like so:
> 
>   net/mac80211/rate.c: In function ‘rate_control_cap_mask’:
>   net/mac80211/rate.c:719:25: warning: ‘sizeof’ on array function
> parameter ‘mcs_mask’ will return size of ‘u8 * {aka unsigned char *}’
> [-Wsizeof-array-argument]
>  for (i = 0; i < sizeof(mcs_mask); i++)
>^
> 
> (note the lack of warning about the use of an array in the function
> definition parameter list - I tried to find if there's any way to
> enable such a warning, but couldn't find anything. Maybe my google-fu
> is weak, but more probably that just doesn't exist).

Coccinelle might be a better tool for this but
a possible checkpatch patch is below:

It produces output like:

$ ./scripts/checkpatch.pl -f net/iucv/iucv.c --types=sized_array_argument
WARNING: Avoid sized array arguments
#716: FILE: net/iucv/iucv.c:716:
+static int iucv_sever_pathid(u16 pathid, u8 userdata[16])
+{

WARNING: Avoid sized array arguments
#878: FILE: net/iucv/iucv.c:878:
+int iucv_path_accept(struct iucv_path *path, struct iucv_handler *handler,
+u8 userdata[16], void *private)
+{

WARNING: Avoid sized array arguments
#925: FILE: net/iucv/iucv.c:925:
+int iucv_path_connect(struct iucv_path *path, struct iucv_handler *handler,
+ u8 userid[8], u8 system[8], u8 userdata[16],
+ void *private)
+{

WARNING: Avoid sized array arguments
#988: FILE: net/iucv/iucv.c:988:
+int iucv_path_quiesce(struct iucv_path *path, u8 userdata[16])
+{

WARNING: Avoid sized array arguments
#1020: FILE: net/iucv/iucv.c:1020:
+int iucv_path_resume(struct iucv_path *path, u8 userdata[16])
+{

WARNING: Avoid sized array arguments
#1050: FILE: net/iucv/iucv.c:1050:
+int iucv_path_sever(struct iucv_path *path, u8 userdata[16])
+{

total: 0 errors, 6 warnings, 0 checks, 2119 lines checked
---
 scripts/checkpatch.pl | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index e14dcdb..747b164 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5422,6 +5422,24 @@ sub process {
 "externs should be avoided in .c files\n" .  
$herecurr);
}
 
+# check for function arguments using arg[SIZE]
+   if ($^V && $^V ge 5.10.0 &&
+   defined $stat &&
+   $stat =~ /^.\s*(?:$Declare|$DeclareMisordered)\s*$Ident\s*($balanced_parens)\s*\{/s) {
+   my $func_args = $1;
+   if ($func_args =~ /(.*)\[\s*(?:$Constant|[A-Z0-9_]+)\s*\]/ && (!defined($1) || $1 !~ /\[\s*\]\s*$/)) {
+   my $ctx = '';
+   my $herectx = $here . "\n";
+   my $cnt = statement_rawlines($stat);
+   for (my $n = 0; $n < $cnt; $n++) {
+   $herectx .= raw_line($linenr, $n) . "\n";
+   $n = $cnt if ($herectx =~ /{/);
+   }
+   WARN("SIZED_ARRAY_ARGUMENT",
+"Avoid sized array arguments\n" . $herectx);
+   }
+   }
+
 # checks for new __setup's
if ($rawline =~ /\b__setup\("([^"]*)"/) {
my $name = $1;




Re: [GIT] Networking

2015-09-03 Thread David Miller
From: Linus Torvalds 
Date: Thu, 3 Sep 2015 11:22:10 -0700

> (note the lack of warning about the use of an array in the function
> definition parameter list - I tried to find if there's any way to
> enable such a warning, but couldn't find anything. Maybe my google-fu
> is weak, but more probably that just doesn't exist).

I would love to see such a warning if it doesn't exist.


Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Florian Fainelli
On 03/09/15 12:20, Sergei Shtylyov wrote:
> Hello.
> 
> On 05/16/2014 06:14 PM, Thomas Petazzoni wrote:
> 
>> The existing fixed_phy_add() function has several drawbacks that
>> prevents it from being used as is for OF-based declaration of fixed
>> PHYs:
>>
>>   * The address of the PHY on the fake bus needs to be passed, while a
>> dynamic allocation is desired.
>>
>>   * Since the phy_device instantiation is post-poned until the next
>> mdiobus scan, there is no way to associate the fixed PHY with its
>> OF node, which later prevents of_phy_connect() from finding this
>> fixed PHY from a given OF node.
>>
>> To solve this, this commit introduces fixed_phy_register(), which will
>> allocate an available PHY address, add the PHY using fixed_phy_add()
>> and instantiate the phy_device structure associated with the provided
>> OF node.
>>
>> Signed-off-by: Thomas Petazzoni 
>> Acked-by: Florian Fainelli 
>> Acked-by: Grant Likely 
>> ---
>>   drivers/net/phy/fixed.c   | 61
>> +++
>>   include/linux/phy_fixed.h | 11 +
>>   2 files changed, 72 insertions(+)
>>
>> diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
>> index e41546d..d60d875 100644
>> --- a/drivers/net/phy/fixed.c
>> +++ b/drivers/net/phy/fixed.c
> [...]
>> @@ -203,6 +204,66 @@ err_regs:
> [...]
>> +int fixed_phy_register(unsigned int irq,
>> +   struct fixed_phy_status *status,
>> +   struct device_node *np)
>> +{
>> +struct fixed_mdio_bus *fmb = &platform_fmb;
>> +struct phy_device *phy;
>> +int phy_addr;
>> +int ret;
>> +
>> +/* Get the next available PHY address, up to PHY_MAX_ADDR */
>> +spin_lock(&phy_fixed_addr_lock);
>> +if (phy_fixed_addr == PHY_MAX_ADDR) {
>> +spin_unlock(&phy_fixed_addr_lock);
>> +return -ENOSPC;
>> +}
>> +phy_addr = phy_fixed_addr++;
>> +spin_unlock(&phy_fixed_addr_lock);
>> +
>> +ret = fixed_phy_add(PHY_POLL, phy_addr, status);
> 
>Was rummaging in the fixed_phy driver and a bug sprang right at me:
> 'phy' should have been passed here, not PHY_POLL. Luckily, all callers
> pass PHY_POLL anyway...

Are we looking at the same header file for the prototype of fixed_phy_add()?

extern int fixed_phy_add(unsigned int irq, int phy_id,
 struct fixed_phy_status *status,
 int link_gpio);

First argument is correct here.. at any rate, if something needs fixing,
just go ahead and submit a patch.
--
Florian


Re: [GIT] Networking

2015-09-03 Thread Linus Torvalds
On Thu, Sep 3, 2015 at 12:32 PM, Julia Lawall  wrote:
>
> I find 518 occurrences of a function parameter declaration that contains
> an explicit size.  But only the sizeof(mcs_mask) where there is a sizeof
> on such a parameter.  I also checked for ARRAY_SIZE on such parameters,
> and didn't find any occurrences of that either.

Are there any cases of multi-dimensional arrays? Because those
actually have semantic meaning outside of sizeof(), just in things
like adding offsets.

Eg something like

 int fn(int a[][10])

ends up being equivalent to something like

 int fn(int (*a)[10])

and "a+1" is actually 40 bytes ahead of "a", so it does *not* act like
an "int *".

(And I might have screwed that up mightily - C multidimensional arrays
and the conversions to pointers are really easy to get confused about.
Which is why I hope we don't have them)
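
A small sketch of the equivalence described above, using made-up helper names (row_stride_bytes, elem_stride_bytes). It measures how far "a + 1" moves for a two-dimensional array parameter versus a one-dimensional one; the 40-byte figure corresponds to sizeof(int) == 4.

```c
#include <assert.h>
#include <stddef.h>

/* 'a' is written as an array of rows, but its real type is 'int (*a)[10]'. */
static size_t row_stride_bytes(int a[][10])
{
    /* Distance in bytes between a and a + 1: one whole row of 10 ints. */
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* A one-dimensional array parameter is a plain pointer; the written size
 * ([10]) has no effect on pointer arithmetic. */
static size_t elem_stride_bytes(unsigned char mask[10])
{
    return (size_t)((char *)(mask + 1) - (char *)mask);
}

/* Usage example. */
static void array_param_selfcheck(void)
{
    int grid[2][10] = { { 0 } };
    unsigned char mask[10] = { 0 };

    /* Steps by a full row, so it does not behave like an 'int *'. */
    assert(row_stride_bytes(grid) == 10 * sizeof(int));
    /* Steps by a single element, exactly like 'unsigned char *'. */
    assert(elem_stride_bytes(mask) == 1);
}
```

So only the first dimension is dropped in a parameter declaration; the inner dimensions stay part of the pointed-to type.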

  Linus


Fw: [Bug 103991] New: skb legth doesn't match the sum of all the fragments length

2015-09-03 Thread Stephen Hemminger

There is really not enough information here to make a real evaluation.
The kernel is old, the hardware is unspecified, and the real problem is not well
identified.


Begin forwarded message:

Date: Thu, 3 Sep 2015 10:51:44 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 103991] New: skb legth doesn't match the sum of all the fragments 
length


https://bugzilla.kernel.org/show_bug.cgi?id=103991

Bug ID: 103991
   Summary: skb legth doesn't match the sum of all the fragments
length
   Product: Networking
   Version: 2.5
Kernel Version: 3.13.0-62-generic
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: IPV4
  Assignee: shemmin...@linux-foundation.org
  Reporter: tht1...@mailinator.com
Regression: No

The DUT has 2 interfaces with the same 10Gb Ethernet card.
The driver and H/W support TSO at 10G speed.
IP forwarding on the DUT is enabled.
Run iperf from a system connected to one interface to a system connected to the
second interface:

  system A ---> DUT ---> system B

Watch the outgoing skbs on the DUT to system B:

  On packets with nr_frags == MAX_SKB_FRAGS (17), skb->len is *NOT* equal to the
sum of (skb->len - skb->data_len) and all of its frag->size values.
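
The check the reporter describes can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not kernel code: struct model_skb and its helpers are hypothetical, and the model ignores any chained fragment list, which in a real skb can also contribute to data_len.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the MAX_SKB_FRAGS figure quoted in the report. */
#define MODEL_MAX_FRAGS 17

struct model_frag {
    unsigned int size;      /* bytes held by this paged fragment */
};

struct model_skb {
    unsigned int len;       /* total bytes: linear part + paged part */
    unsigned int data_len;  /* bytes held outside the linear part */
    unsigned int nr_frags;  /* number of valid entries in frags[] */
    struct model_frag frags[MODEL_MAX_FRAGS];
};

/* Bytes in the linear part of the buffer. */
static unsigned int model_headlen(const struct model_skb *skb)
{
    return skb->len - skb->data_len;
}

/* Returns 1 if linear bytes plus all fragment sizes add up to len. */
static int model_len_consistent(const struct model_skb *skb)
{
    unsigned int total = model_headlen(skb);
    unsigned int i;

    for (i = 0; i < skb->nr_frags && i < MODEL_MAX_FRAGS; i++)
        total += skb->frags[i].size;

    return total == skb->len;
}

/* Usage example: a consistent buffer, then one with a short fragment. */
static void model_skb_selfcheck(void)
{
    struct model_skb skb = { .len = 300, .data_len = 200, .nr_frags = 2 };

    skb.frags[0].size = 100;
    skb.frags[1].size = 100;
    assert(model_len_consistent(&skb));

    skb.frags[1].size = 90; /* fragments no longer cover data_len */
    assert(!model_len_consistent(&skb));
}
```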

-- 
You are receiving this mail because:
You are the assignee for the bug.


RE: [PATCH] net: tipc: fix stall during bclink wakeup procedure

2015-09-03 Thread Jon Maloy


> -Original Message-
> From: Kolmakov Dmitriy [mailto:kolmakov.dmit...@huawei.com]
> Sent: Thursday, 03 September, 2015 10:39
> To: da...@davemloft.net
> Cc: Jon Maloy; Ying Xue; tipc-discuss...@lists.sourceforge.net;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] net: tipc: fix stall during bclink wakeup procedure
> 
> From: Dmitry S Kolmakov 
> 
> If an attempt to wake up users of broadcast link is made when there is no
> enough place in send queue than it may hang up inside the
> tipc_sk_rcv() function since the loop breaks only after the wake up queue
> becomes empty. This can lead to complete CPU stall with the following
> message generated by RCU:
> 
> INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies g=54225
> c=54224 q=11465) Task dump for CPU 0:
> tpchR  running task0 39949  39948 0x000a
>  818536c0 88181fa037a0 8106a4be 
>  818536c0 88181fa037c0 8106d8a8 88181fa03800
>  0001 88181fa037f0 81094a50 88181fa15680 Call
> Trace:
>[] sched_show_task+0xae/0x120
> [] dump_cpu_task+0x38/0x40  []
> rcu_dump_cpu_stacks+0x90/0xd0  []
> rcu_check_callbacks+0x3eb/0x6e0  [] ?
> account_system_time+0x7f/0x170  []
> update_process_times+0x34/0x60  []
> tick_sched_handle.isra.18+0x31/0x40
>  [] tick_sched_timer+0x3c/0x70  []
> __run_hrtimer.isra.34+0x3d/0xc0  []
> hrtimer_interrupt+0xc5/0x1e0  [] ?
> native_smp_send_reschedule+0x42/0x60
>  [] local_apic_timer_interrupt+0x34/0x60
>  [] smp_apic_timer_interrupt+0x3c/0x60
>  [] apic_timer_interrupt+0x6b/0x70  [] ?
> _raw_spin_unlock_irqrestore+0x9/0x10
>  [] __wake_up_sync_key+0x4f/0x60  []
> tipc_write_space+0x31/0x40 [tipc]  []
> filter_rcv+0x31f/0x520 [tipc]  [] ?
> tipc_sk_lookup+0xc9/0x110 [tipc]  [] ?
> _raw_spin_lock_bh+0x19/0x30  []
> tipc_sk_rcv+0x2dc/0x3e0 [tipc]  []
> tipc_bclink_wakeup_users+0x2f/0x40 [tipc]  []
> tipc_node_unlock+0x186/0x190 [tipc]  [] ?
> kfree_skb+0x2c/0x40  [] tipc_rcv+0x2ac/0x8c0 [tipc]
> [] tipc_l2_rcv_msg+0x38/0x50 [tipc]  []
> __netif_receive_skb_core+0x5a3/0x950
>  [] __netif_receive_skb+0x13/0x60  []
> netif_receive_skb_internal+0x1e/0x90
>  [] napi_gro_receive+0x78/0xa0  []
> tg3_poll_work+0xc54/0xf40 [tg3]  [] ?
> consume_skb+0x2c/0x40  [] tg3_poll_msix+0x41/0x160
> [tg3]  [] net_rx_action+0xe2/0x290  []
> __do_softirq+0xda/0x1f0  [] irq_exit+0x76/0xa0
> [] do_IRQ+0x55/0xf0  []
> common_interrupt+0x6b/0x6b  
> 
> The issue occurs only when tipc_sk_rcv() is used to wake up postponed
> senders:
> 
>   tipc_bclink_wakeup_users()
>   // wakeupq - is a queue which consists of special
>   //   messages with SOCK_WAKEUP type.
>   tipc_sk_rcv(wakeupq)
>   ...
>   while (skb_queue_len(inputq)) {
>   filter_rcv(skb)
>   // Here the type of message is
> checked
>   // and if it is SOCK_WAKEUP than
>   // it tries to wake up a sender.
>   tipc_write_space(sk)
> 
>   wake_up_interruptible_sync_poll()
>   }
> 
> After the sender thread is woke up it can gather control and perform an
> attempt to send a message. But if there is no enough place in send queue it
> will call link_schedule_user() function which puts a message of type
> SOCK_WAKEUP to the wakeup queue and put the sender to sleep. Thus the
> size of the queue actually is not changed and the while() loop never exits.
> 
> The approach I proposed is to wake up only senders for which there is
> enough place in send queue so the described issue can't occur. Moreover
> the same approach is already used to wake up senders on unicast links.

I looked closer at the code, and I don't see how you can enter into this loop.
SOCK_WAKEUP is only issued if buffers actually have been released from the
transmit queue, so sooner or later there should be space in the queue for
any sender. I am starting to suspect that the root of this problem is elsewhere.
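
For reference, the mechanism claimed in the quoted patch description can be modeled in user space as below. This is only a toy model with made-up names (struct model_link, model_sender_retry, model_drain_wakeups); it says nothing about whether the real code can actually reach the state with a full send queue, which is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

struct model_link {
    size_t send_queue_free; /* free slots in the send queue */
    size_t wakeup_pending;  /* wakeup entries waiting to be processed */
};

/* A woken sender retries its send. If the send queue is still full, it
 * queues a new wakeup entry and goes back to sleep. Returns 1 on success. */
static int model_sender_retry(struct model_link *l)
{
    if (l->send_queue_free == 0) {
        l->wakeup_pending++;
        return 0;
    }
    l->send_queue_free--;
    return 1;
}

/* Processes wakeup entries until none are left, or until max_iter
 * iterations have run. Returns the number of iterations performed. */
static size_t model_drain_wakeups(struct model_link *l, size_t max_iter)
{
    size_t iter = 0;

    while (l->wakeup_pending > 0 && iter < max_iter) {
        l->wakeup_pending--;    /* take one wakeup entry off the queue */
        model_sender_retry(l);  /* the woken sender runs right away */
        iter++;
    }
    return iter;
}

/* Usage example. */
static void model_wakeup_selfcheck(void)
{
    struct model_link ok = { .send_queue_free = 3, .wakeup_pending = 3 };
    struct model_link full = { .send_queue_free = 0, .wakeup_pending = 3 };

    /* Enough room: the queue drains in three steps. */
    assert(model_drain_wakeups(&ok, 1000) == 3);
    assert(ok.wakeup_pending == 0);

    /* No room: every entry is re-queued, so only the cap stops the loop. */
    assert(model_drain_wakeups(&full, 1000) == 1000);
    assert(full.wakeup_pending == 3);
}
```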

Maybe we should continue this thread at tipc-discussion, so we don't pollute
the netdev list with our internal discussions?

///jon

> 
> I have got into the issue on our product code but to reproduce the issue I
> changed a benchmark test application (from tipcutils/demos/benchmark) to
> perform the following scenario:
>   1. Run 64 instances of test application (nodes). It can be done on the
> one physical machine.
>   2. Each application connects to all other using TIPC sockets in RDM
> mode.
>   3. When setup is done all nodes start simultaneously send broadcast
> messages.
>   4. Everything hangs up.
> 
> The issue is reproducible only when a congestion on broadcast link occurs.
> For example, when there are only 8 nodes it works fine since congestion

Re: [GIT] Networking

2015-09-03 Thread David Miller
From: Linus Torvalds 
Date: Thu, 3 Sep 2015 09:45:44 -0700

> But happily gcc has a really really valid warning (kudos - I often end
> up ragging on the bad warnings gcc has, but this one is a keeper),
> because a few lines down the mistake then turns into pure and utter
> garbage.

I really wish my GCC had emitted a warning for this, I'm on 4.9.2 here:

[davem@localhost linux]$ make net/mac80211/rate.o 
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CC [M]  net/mac80211/rate.o
[davem@localhost linux]$

Linus, what GCC version are you using and what does the warning look
like?

Anyways, Johannes please get this fixed, thanks.


Re: [PATCHv4 2/5] net: phy: extend fixed driver with fixed_phy_register()

2015-09-03 Thread Sergei Shtylyov

On 09/03/2015 10:38 PM, Florian Fainelli wrote:


The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:

   * The address of the PHY on the fake bus needs to be passed, while a
 dynamic allocation is desired.

   * Since the phy_device instantiation is post-poned until the next
 mdiobus scan, there is no way to associate the fixed PHY with its
 OF node, which later prevents of_phy_connect() from finding this
 fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.

Signed-off-by: Thomas Petazzoni 
Acked-by: Florian Fainelli 
Acked-by: Grant Likely 
---
   drivers/net/phy/fixed.c   | 61
+++
   include/linux/phy_fixed.h | 11 +
   2 files changed, 72 insertions(+)

diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index e41546d..d60d875 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c

[...]

@@ -203,6 +204,66 @@ err_regs:

[...]

+int fixed_phy_register(unsigned int irq,
+   struct fixed_phy_status *status,
+   struct device_node *np)
+{
+struct fixed_mdio_bus *fmb = &platform_fmb;
+struct phy_device *phy;
+int phy_addr;
+int ret;
+
+/* Get the next available PHY address, up to PHY_MAX_ADDR */
+spin_lock(&phy_fixed_addr_lock);
+if (phy_fixed_addr == PHY_MAX_ADDR) {
+spin_unlock(&phy_fixed_addr_lock);
+return -ENOSPC;
+}
+phy_addr = phy_fixed_addr++;
+spin_unlock(&phy_fixed_addr_lock);
+
+ret = fixed_phy_add(PHY_POLL, phy_addr, status);


 Was rummaging in the fixed_phy driver and a bug sprang right at
me: 'phy'


Sorry, s/phy/irq/ of course. Just noticed. :-/


   I've reported the bug on #mipslinux, there I used the correct word. :-)


Ok, that makes sense then, and yes, this "irq" argument should have been
passed down to fixed_phy_add(). Might be worth adding a WARN_ON(irq !=
PHY_POLL) just to catch callers that expect something else.


   In-tree callers all seem to pass PHY_POLL to fixed_phy_register(). Do we 
care about out of tree stuff?



Thanks!
--
Florian


MBR, Sergei



[PATCH] fixed_phy: pass 'irq' to fixed_phy_add()

2015-09-03 Thread Sergei Shtylyov
I've noticed that fixed_phy_register() ignores its 'irq' parameter instead of
passing it to fixed_phy_add(). Luckily, fixed_phy_register() seems to always
be called with PHY_POLL for 'irq'... :-)

Fixes: a75951217472 ("net: phy: extend fixed driver with fixed_phy_register()")
Signed-off-by: Sergei Shtylyov 

---
The patch is against DaveM's 'net.git' repo.

 drivers/net/phy/fixed_phy.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: net/drivers/net/phy/fixed_phy.c
===
--- net.orig/drivers/net/phy/fixed_phy.c
+++ net/drivers/net/phy/fixed_phy.c
@@ -325,7 +325,7 @@ struct phy_device *fixed_phy_register(un
phy_addr = phy_fixed_addr++;
spin_unlock(&phy_fixed_addr_lock);
 
-   ret = fixed_phy_add(PHY_POLL, phy_addr, status, link_gpio);
+   ret = fixed_phy_add(irq, phy_addr, status, link_gpio);
if (ret < 0)
return ERR_PTR(ret);
 



[PATCH net-next] RDS: rds_conn_lookup() should factor in the struct net for a match

2015-09-03 Thread Sowmini Varadhan

Only return a conn if the rds_conn_net(conn) matches the struct
net passed to rds_conn_lookup().

Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints,
   one per netns.")

Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index a50e652..9b2de5e 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -70,7 +70,8 @@ static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
 } while (0)
 
 /* rcu read lock must be held or the connection spinlock */
-static struct rds_connection *rds_conn_lookup(struct hlist_head *head,
+static struct rds_connection *rds_conn_lookup(struct net *net,
+ struct hlist_head *head,
  __be32 laddr, __be32 faddr,
  struct rds_transport *trans)
 {
@@ -78,7 +79,7 @@ static struct rds_connection *rds_conn_lookup(struct hlist_head *head,
 
hlist_for_each_entry_rcu(conn, head, c_hash_node) {
if (conn->c_faddr == faddr && conn->c_laddr == laddr &&
-   conn->c_trans == trans) {
+   conn->c_trans == trans && net == rds_conn_net(conn)) {
ret = conn;
break;
}
@@ -132,7 +133,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
if (!is_outgoing && otrans->t_type == RDS_TRANS_TCP)
goto new_conn;
rcu_read_lock();
-   conn = rds_conn_lookup(head, laddr, faddr, trans);
+   conn = rds_conn_lookup(net, head, laddr, faddr, trans);
if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
laddr == faddr && !is_outgoing) {
/* This is a looped back IB connection, and we're
@@ -239,7 +240,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
if (!is_outgoing && otrans->t_type == RDS_TRANS_TCP)
found = NULL;
else
-   found = rds_conn_lookup(head, laddr, faddr, trans);
+   found = rds_conn_lookup(net, head, laddr, faddr, trans);
if (found) {
trans->conn_free(conn->c_transport_data);
kmem_cache_free(rds_conn_slab, conn);
-- 
1.7.1



Re: [GIT] Networking

2015-09-03 Thread Julia Lawall


On Thu, 3 Sep 2015, Linus Torvalds wrote:

> On Thu, Sep 3, 2015 at 12:32 PM, Julia Lawall  wrote:
> >
> > I find 518 occurrences of a function parameter declaration that contains
> > an explicit size.  But only the sizeof(mcs_mask) where there is a sizeof
> > on such a parameter.  I also checked for ARRAY_SIZE on such parameters,
> > and didn't find any occurrences of that either.
> 
> Are there any cases of multi-dimensional arrays? Because those
> actually have semantic meaning outside of sizeof(), just in things
> like adding offsets.
> 
> Eg something like
> 
>  int fn(int a[][10])
> 
> ends up being equivalent to something like
> 
>  int fn(int (*a)[10])
> 
> and "a+1" is actually 40 bytes ahead of "a", so it does *not* act like
> an "int *".
> 
> (And I might have screwed that up mightily - C multidimensional arrays
> and the conversions to pointers are really easy to get confused about.
> Which is why I hope we don't have them)

There are 32 2-dimensional arrays in function parameters, and 1 
3-dimensional array.  No 4-dimensional arrays.  I didn't check past that.  
None of these has a sizeof or ARRAY_SIZE.

The three dimensional array is here: drivers/media/dvb-frontends/stv0367.c

static int stv0367ter_filt_coeff_init(struct stv0367_state *state,
u16 CellsCoeffs[3][6][5], u32 DemodXtal)

It is used as follows:

   stv0367_writereg(state,
(R367TER_IIRCX_COEFF1_MSB + 2 * (j - 1)),
MSB(CellsCoeffs[k][i-1][j-1]));
stv0367_writereg(state,
(R367TER_IIRCX_COEFF1_LSB + 2 * (j - 1)),
LSB(CellsCoeffs[k][i-1][j-1]));

The value of this parameter is one of three locally defined static global 
arrays.

julia


Re: [GIT] Networking

2015-09-03 Thread Linus Torvalds
On Thu, Sep 3, 2015 at 1:55 PM, Julia Lawall  wrote:
>
> There are 32 2-dimensional arrays in function parameters, and 1
> 3-dimensional array.  No 4-dimensional arrays.  I didn't check past that.
> None of these has a sizeof or ARRAY_SIZE.
>
> The three dimensional array is here: drivers/media/dvb-frontends/stv0367.c

Ok. That actually looks like a valid use of the C function argument
array passing semantics. It's rather much simpler than exposing the
pointers.

So I guess we don't really end up wanting to disallow this, and the
new gcc array sizeof warning is good enough.

Thanks for running the analysis so that I didn't have to look at it ;)

   Linus


Re: [PATCH] net: fec: normalize return value of pm_runtime_get_sync() in MDIO write

2015-09-03 Thread Andrew Lunn
On Thu, Sep 03, 2015 at 09:38:30PM +0200, Maciej S. Szmigiero wrote:
> If fec MDIO write method succeeds its return value comes from
> call to pm_runtime_get_sync().
> But pm_runtime_get_sync() can also return 1.
> 
> In case of Micrel KSZ9031 PHY this value will then
> be returned along the call chain of phy_write() ->
> ksz9031_extended_write() -> ksz9031_center_flp_timing() ->
> ksz9031_config_init() -> phy_init_hw() -> phy_attach_direct() ->
> phy_connect_direct().
> 
> Then phy_connect() will cast it into a pointer using ERR_PTR(),
> which then fec_enet_mii_probe() will try to dereference
> resulting in an oops.
> 
> Fix it by normalizing return value of pm_runtime_get_sync()
> to be zero if positive in MDIO write method.
> 
> Signed-off-by: Maciej Szmigiero 

Fixes: 8fff755e9f8d ("net: fec: Ensure clocks are enabled while using mdio bus")

Acked-by: Andrew Lunn 

Thanks
Andrew
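
A user-space sketch of why a stray positive return value is dangerous once it is turned into a pointer, and of the normalization described in the patch. The helpers below (model_err_ptr, model_is_err, model_normalize, MODEL_MAX_ERRNO) are simplified stand-ins written for this illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the errno range encoded at the top of the address space. */
#define MODEL_MAX_ERRNO 4095

/* Encodes an integer status as a pointer value. */
static void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True only for pointers that encode a negative errno value. */
static int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Maps "negative errno, 0, or positive status" to "negative errno or 0". */
static int model_normalize(int ret)
{
    return ret > 0 ? 0 : ret;
}

/* Usage example. */
static void model_errptr_selfcheck(void)
{
    /* A positive status turned into a pointer is not seen as an error,
     * so a caller would go on to dereference address 1. */
    assert(!model_is_err(model_err_ptr(1)));

    /* A negative errno is recognised as an error pointer. */
    assert(model_is_err(model_err_ptr(-5)));

    /* After normalization only 0 or a negative errno can propagate. */
    assert(model_normalize(1) == 0);
    assert(model_normalize(0) == 0);
    assert(model_normalize(-5) == -5);
}
```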


Re: [GIT] Networking

2015-09-03 Thread Julia Lawall


On Thu, 3 Sep 2015, Linus Torvalds wrote:

> On Thu, Sep 3, 2015 at 1:55 PM, Julia Lawall  wrote:
> >
> > There are 32 2-dimensional arrays in function parameters, and 1
> > 3-dimensional array.  No 4-dimensional arrays.  I didn't check past that.
> > None of these has a sizeof or ARRAY_SIZE.
> >
> > The three dimensional array is here: drivers/media/dvb-frontends/stv0367.c
> 
> Ok. That actually looks like a valid use of the C function argument
> array passing semantics. It's rather much simpler than exposing the
> pointers.
> 
> So I guess we don't really end up wanting to disallow this, and the
> new gcc array sizeof warning is good enough.
> 
> Thanks for running the analysis so that I didn't have to look at it ;)

The double arrays also look OK - the uses are also explicit double array 
references.

julia


Re: net-next closure?

2015-09-03 Thread David Miller
From: Jeff Kirsher 
Date: Wed, 02 Sep 2015 22:50:35 -0700

> I was just about to send out my last series of patches and noticed you
> sent Linus your pull request.  So I am guessing that your net-next tree
> is now closed, correct?  Just want to make sure before sending anything
> out and did not want to dump patches on you right before the closure of
> your net-next.

Yeah, I already applied too much crap after the merge window opened
up so net-next is definitely closed now.

Thanks.


[PATCH 2/3] net: irda: pxaficp_ir: convert to readl and writel

2015-09-03 Thread Robert Jarzmik
Convert the pxa IRDA driver to the readl and writel primitives, and remove
another set of direct register accesses. This leaves only the DMA
register accesses, which will be dealt with by the dmaengine conversion.

Signed-off-by: Robert Jarzmik 
---
 drivers/net/irda/pxaficp_ir.c | 210 +-
 1 file changed, 126 insertions(+), 84 deletions(-)

diff --git a/drivers/net/irda/pxaficp_ir.c b/drivers/net/irda/pxaficp_ir.c
index b1794998c68e..519f6b0568a8 100644
--- a/drivers/net/irda/pxaficp_ir.c
+++ b/drivers/net/irda/pxaficp_ir.c
@@ -29,15 +29,16 @@
 
 #include 
 #include 
+#undef __REG
+#define __REG(x) (x)
 #include 
 
-#define FICP   __REG(0x4080)  /* Start of FICP area */
-#define ICCR0  __REG(0x4080)  /* ICP Control Register 0 */
-#define ICCR1  __REG(0x4084)  /* ICP Control Register 1 */
-#define ICCR2  __REG(0x4088)  /* ICP Control Register 2 */
-#define ICDR   __REG(0x408c)  /* ICP Data Register */
-#define ICSR0  __REG(0x40800014)  /* ICP Status Register 0 */
-#define ICSR1  __REG(0x40800018)  /* ICP Status Register 1 */
+#define ICCR0  0x  /* ICP Control Register 0 */
+#define ICCR1  0x0004  /* ICP Control Register 1 */
+#define ICCR2  0x0008  /* ICP Control Register 2 */
+#define ICDR   0x000c  /* ICP Data Register */
+#define ICSR0  0x0014  /* ICP Status Register 0 */
+#define ICSR1  0x0018  /* ICP Status Register 1 */
 
 #define ICCR0_AME  (1 << 7)/* Address match enable */
 #define ICCR0_TIE  (1 << 6)/* Transmit FIFO interrupt enable */
@@ -55,9 +56,7 @@
 #define ICCR2_TRIG_16   (1 << 0)   /*  >= 16 bytes */
 #define ICCR2_TRIG_32   (2 << 0)   /*  >= 32 bytes */
 
-#ifdef CONFIG_PXA27x
 #define ICSR0_EOC  (1 << 6)/* DMA End of Descriptor Chain */
-#endif
 #define ICSR0_FRE  (1 << 5)/* Framing error */
 #define ICSR0_RFS  (1 << 4)/* Receive FIFO service request */
 #define ICSR0_TFS  (1 << 3)/* Transnit FIFO service request */
@@ -98,11 +97,50 @@
 IrSR_RCVEIR_UART_MODE | \
 IrSR_XMITIR_IR_MODE)
 
+/* macros for registers read/write */
+#define ficp_writel(irda, val, off)\
+   do {\
+   dev_vdbg(irda->dev, \
+"%s():%d ficp_writel(0x%x, %s)\n", \
+__func__, __LINE__, (val), #off);  \
+   writel_relaxed((val), (irda)->irda_base + (off));   \
+   } while (0)
+
+#define ficp_readl(irda, off)  \
+   ({  \
+   unsigned int _v;\
+   _v = readl_relaxed((irda)->irda_base + (off));  \
+   dev_vdbg(irda->dev, \
+"%s():%d ficp_readl(%s): 0x%x\n",  \
+__func__, __LINE__, #off, _v); \
+   _v; \
+   })
+
+#define stuart_writel(irda, val, off)  \
+   do {\
+   dev_vdbg(irda->dev, \
+"%s():%d stuart_writel(0x%x, %s)\n",   \
+__func__, __LINE__, (val), #off);  \
+   writel_relaxed((val), (irda)->stuart_base + (off)); \
+   } while (0)
+
+#define stuart_readl(irda, off)  \
+   ({  \
+   unsigned int _v;\
+   _v = readl_relaxed((irda)->stuart_base + (off));\
+   dev_vdbg(irda->dev, \
+"%s():%d stuart_readl(%s): 0x%x\n",\
+__func__, __LINE__, #off, _v); \
+   _v; \
+   })
+
 struct pxa_irda {
int speed;
int newspeed;
unsigned long long  last_clk;
 
+   void __iomem*stuart_base;
+   void __iomem*irda_base;
unsigned char   *dma_rx_buff;
unsigned char   *dma_tx_buff;
dma_addr_t  dma_rx_buff_phy;
@@ -153,7 +191,7 @@ static inline void pxa_irda_enable_sirclk(struct pxa_irda 
*si)
 inline static void pxa_irda_fir_dma_rx_start(struct pxa_irda 

[PATCH 3/3] net: irda: pxaficp_ir: dmaengine conversion

2015-09-03 Thread Robert Jarzmik
Convert pxaficp_ir to dmaengine. As the pxa architecture is shifting from
raw DMA register access to the pxa_dma dmaengine driver, convert this
driver to dmaengine.

Signed-off-by: Robert Jarzmik 
---
 drivers/net/irda/pxaficp_ir.c | 145 +-
 1 file changed, 102 insertions(+), 43 deletions(-)

diff --git a/drivers/net/irda/pxaficp_ir.c b/drivers/net/irda/pxaficp_ir.c
index 519f6b0568a8..42318fb2c95a 100644
--- a/drivers/net/irda/pxaficp_ir.c
+++ b/drivers/net/irda/pxaficp_ir.c
@@ -19,6 +19,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -146,8 +149,12 @@ struct pxa_irda {
dma_addr_t  dma_rx_buff_phy;
dma_addr_t  dma_tx_buff_phy;
unsigned intdma_tx_buff_len;
-   int txdma;
-   int rxdma;
+   struct dma_chan *txdma;
+   struct dma_chan *rxdma;
+   dma_cookie_t rx_cookie;
+   dma_cookie_t tx_cookie;
+   int drcmr_rx;
+   int drcmr_tx;
 
int uart_irq;
int icp_irq;
@@ -165,6 +172,8 @@ struct pxa_irda {
struct clk  *cur_clk;
 };
 
+static int pxa_irda_set_speed(struct pxa_irda *si, int speed);
+
 static inline void pxa_irda_disable_clk(struct pxa_irda *si)
 {
if (si->cur_clk)
@@ -188,22 +197,41 @@ static inline void pxa_irda_enable_sirclk(struct pxa_irda 
*si)
 #define IS_FIR(si) ((si)->speed >= 400)
 #define IRDA_FRAME_SIZE_LIMIT  2047
 
+static void pxa_irda_fir_dma_rx_irq(void *data);
+static void pxa_irda_fir_dma_tx_irq(void *data);
+
 inline static void pxa_irda_fir_dma_rx_start(struct pxa_irda *si)
 {
-   DCSR(si->rxdma)  = DCSR_NODESC;
-   DSADR(si->rxdma) = (unsigned long)si->irda_base + ICDR;
-   DTADR(si->rxdma) = si->dma_rx_buff_phy;
-   DCMD(si->rxdma) = DCMD_INCTRGADDR | DCMD_FLOWSRC |  DCMD_WIDTH1 | 
DCMD_BURST32 | IRDA_FRAME_SIZE_LIMIT;
-   DCSR(si->rxdma) |= DCSR_RUN;
+   struct dma_async_tx_descriptor *tx;
+
+   tx = dmaengine_prep_slave_single(si->rxdma, si->dma_rx_buff_phy,
+IRDA_FRAME_SIZE_LIMIT, DMA_FROM_DEVICE,
+DMA_PREP_INTERRUPT);
+   if (!tx) {
+   dev_err(si->dev, "prep_slave_sg() failed\n");
+   return;
+   }
+   tx->callback = pxa_irda_fir_dma_rx_irq;
+   tx->callback_param = si;
+   si->rx_cookie = dmaengine_submit(tx);
+   dma_async_issue_pending(si->rxdma);
 }
 
 inline static void pxa_irda_fir_dma_tx_start(struct pxa_irda *si)
 {
-   DCSR(si->txdma)  = DCSR_NODESC;
-   DSADR(si->txdma) = si->dma_tx_buff_phy;
-   DTADR(si->txdma) = (unsigned long)si->irda_base + ICDR;
-   DCMD(si->txdma) = DCMD_INCSRCADDR | DCMD_FLOWTRG |  DCMD_ENDIRQEN | 
DCMD_WIDTH1 | DCMD_BURST32 | si->dma_tx_buff_len;
-   DCSR(si->txdma) |= DCSR_RUN;
+   struct dma_async_tx_descriptor *tx;
+
+   tx = dmaengine_prep_slave_single(si->txdma, si->dma_tx_buff_phy,
+si->dma_tx_buff_len, DMA_TO_DEVICE,
+DMA_PREP_INTERRUPT);
+   if (!tx) {
+   dev_err(si->dev, "prep_slave_sg() failed\n");
+   return;
+   }
+   tx->callback = pxa_irda_fir_dma_tx_irq;
+   tx->callback_param = si;
+   si->tx_cookie = dmaengine_submit(tx);
+   dma_async_issue_pending(si->rxdma);
 }
 
 /*
@@ -242,7 +270,7 @@ static int pxa_irda_set_speed(struct pxa_irda *si, int speed)
 
if (IS_FIR(si)) {
/* stop RX DMA */
-   DCSR(si->rxdma) &= ~DCSR_RUN;
+   dmaengine_terminate_all(si->rxdma);
/* disable FICP */
ficp_writel(si, 0, ICCR0);
pxa_irda_disable_clk(si);
@@ -388,30 +416,27 @@ static irqreturn_t pxa_irda_sir_irq(int irq, void *dev_id)
 }
 
 /* FIR Receive DMA interrupt handler */
-static void pxa_irda_fir_dma_rx_irq(int channel, void *data)
+static void pxa_irda_fir_dma_rx_irq(void *data)
 {
-   int dcsr = DCSR(channel);
-
-   DCSR(channel) = dcsr & ~DCSR_RUN;
+   struct net_device *dev = data;
+   struct pxa_irda *si = netdev_priv(dev);
 
-   printk(KERN_DEBUG "pxa_ir: fir rx dma bus error %#x\n", dcsr);
+   dmaengine_terminate_all(si->rxdma);
+   netdev_dbg(dev, "pxa_ir: fir rx dma bus error\n");
 }
 
 /* FIR Transmit DMA interrupt handler */
-static void pxa_irda_fir_dma_tx_irq(int channel, void *data)
+static void pxa_irda_fir_dma_tx_irq(void *data)
 {
struct net_device *dev = data;
struct pxa_irda *si = netdev_priv(dev);
-   int dcsr;
-
-   dcsr = DCSR(channel);
-   DCSR(channel) = dcsr & ~DCSR_RUN;
 
-   if (dcsr & 

[PATCH 0/3] net: irda: pxaficp_ir: dmaengine conversion

2015-09-03 Thread Robert Jarzmik
Hi,

This series aims at converting pxaficp_ir to dmaengine. This is almost the last
driver to be converted, and once this is done, legacy DMA support in the pxa
architecture can be removed.

Nothing fancy here, standard readl/writel conversion, then dmaengine support.

The main trouble is that I cannot test it. I only compiled and inserted the
module, which works on lubbock, but I have no way to attempt a communication.

Petr, Dmitry, once the review is advanced enough, ie. in a couple of weeks, do
you have a way to test it on corgi/magician if I give you a git tree to pull
from ?

Cheers

--
Robert

Robert Jarzmik (3):
  net: irda: pxaficp_ir: use sched_clock() for time management
  net: irda: pxaficp_ir: convert to readl and writel
  net: irda: pxaficp_ir: dmaengine conversion

 drivers/net/irda/pxaficp_ir.c | 366 +++---
 1 file changed, 233 insertions(+), 133 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: wan: sbni: fix device usage count

2015-09-03 Thread Sudip Mukherjee
dev_get_by_name() will increment the usage count if the matching device
is found. But we were not decrementing the count if we have got the
device and the device is non-active.

Signed-off-by: Sudip Mukherjee 
---
 drivers/net/wan/sbni.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wan/sbni.c b/drivers/net/wan/sbni.c
index 758c4ba..8fef8d8 100644
--- a/drivers/net/wan/sbni.c
+++ b/drivers/net/wan/sbni.c
@@ -1358,6 +1358,8 @@ sbni_ioctl( struct net_device  *dev,  struct ifreq  *ifr, int  cmd )
if( !slave_dev  ||  !(slave_dev->flags & IFF_UP) ) {
netdev_err(dev, "trying to enslave non-active device %s\n",
   slave_name);
+   if (slave_dev)
+   dev_put(slave_dev);
return  -EPERM;
}
 
-- 
1.9.1
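
The reference-count rule behind this fix can be shown with a small userspace sketch. Everything here is a hypothetical stand-in (struct fake_dev, fake_get_by_name(), fake_put(), enslave()) and not the kernel API; it only assumes, as the commit message says, that a successful lookup takes a reference which every error path must drop again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFF_UP 0x1

/* Hypothetical stand-in for struct net_device: only what the example needs. */
struct fake_dev {
	const char *name;
	unsigned int flags;
	int refcnt;
};

static struct fake_dev devices[] = {
	{ "eth0", IFF_UP, 0 },
	{ "eth1", 0, 0 },	/* present but not up */
};

/* Modelled on dev_get_by_name(): a successful lookup takes a reference. */
static struct fake_dev *fake_get_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
		if (strcmp(devices[i].name, name) == 0) {
			devices[i].refcnt++;
			return &devices[i];
		}
	}
	return NULL;
}

/* Modelled on dev_put(): drop one reference. */
static void fake_put(struct fake_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Returns 0 and keeps the reference on success, -1 with no reference held. */
static int enslave(const char *name, struct fake_dev **out)
{
	struct fake_dev *slave = fake_get_by_name(name);

	if (!slave || !(slave->flags & IFF_UP)) {
		if (slave)
			fake_put(slave);	/* found but inactive: drop the reference */
		return -1;
	}
	*out = slave;
	return 0;
}
```

Without the fake_put() call on the error path, every failed attempt on an existing but inactive device would leave its count one higher.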



[PATCH 1/3] net: irda: pxaficp_ir: use sched_clock() for time management

2015-09-03 Thread Robert Jarzmik
Instead of using directly the OS timer through direct register access,
use the standard sched_clock(), which will end up in OSCR reading
anyway.

This is a first step for direct access register removal and machine
specific code removal from this driver.

Signed-off-by: Robert Jarzmik 
---
 drivers/net/irda/pxaficp_ir.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/irda/pxaficp_ir.c b/drivers/net/irda/pxaficp_ir.c
index 100454662e4b..b1794998c68e 100644
--- a/drivers/net/irda/pxaficp_ir.c
+++ b/drivers/net/irda/pxaficp_ir.c
@@ -29,7 +29,6 @@
 
 #include 
 #include 
-#include 
 #include 
 
 #define FICP   __REG(0x40800000)  /* Start of FICP area */
@@ -102,7 +101,7 @@
 struct pxa_irda {
int speed;
int newspeed;
-   unsigned long   last_oscr;
+   unsigned long long  last_clk;
 
unsigned char   *dma_rx_buff;
unsigned char   *dma_tx_buff;
@@ -292,7 +291,7 @@ static irqreturn_t pxa_irda_sir_irq(int irq, void *dev_id)
}
lsr = STLSR;
}
-   si->last_oscr = readl_relaxed(OSCR);
+   si->last_clk = sched_clock();
break;
 
case 0x04: /* Received Data Available */
@@ -303,7 +302,7 @@ static irqreturn_t pxa_irda_sir_irq(int irq, void *dev_id)
dev->stats.rx_bytes++;
async_unwrap_char(dev, &dev->stats, &si->rx_buff, STRBR);
} while (STLSR & LSR_DR);
-   si->last_oscr = readl_relaxed(OSCR);
+   si->last_clk = sched_clock();
break;
 
case 0x02: /* Transmit FIFO Data Request */
@@ -319,7 +318,7 @@ static irqreturn_t pxa_irda_sir_irq(int irq, void *dev_id)
 /* We need to ensure that the transmitter has finished. */
while ((STLSR & LSR_TEMT) == 0)
cpu_relax();
-   si->last_oscr = readl_relaxed(OSCR);
+   si->last_clk = sched_clock();
 
/*
* Ok, we've finished transmitting.  Now enable
@@ -373,7 +372,7 @@ static void pxa_irda_fir_dma_tx_irq(int channel, void *data)
 
while (ICSR1 & ICSR1_TBY)
cpu_relax();
-   si->last_oscr = readl_relaxed(OSCR);
+   si->last_clk = sched_clock();
 
/*
 * HACK: It looks like the TBY bit is dropped too soon.
@@ -473,8 +472,8 @@ static irqreturn_t pxa_irda_fir_irq(int irq, void *dev_id)
 
/* stop RX DMA */
DCSR(si->rxdma) &= ~DCSR_RUN;
-   si->last_oscr = readl_relaxed(OSCR);
icsr0 = ICSR0;
+   si->last_clk = sched_clock();
 
if (icsr0 & (ICSR0_FRE | ICSR0_RAB)) {
if (icsr0 & ICSR0_FRE) {
@@ -549,7 +548,7 @@ static int pxa_irda_hard_xmit(struct sk_buff *skb, struct net_device *dev)
skb_copy_from_linear_data(skb, si->dma_tx_buff, skb->len);
 
if (mtt)
-   while ((unsigned)(readl_relaxed(OSCR) - si->last_oscr)/4 < mtt)
+   while ((sched_clock() - si->last_clk) / 4 < mtt)
cpu_relax();
 
/* stop RX DMA,  disable FICP */
-- 
2.1.4
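
One point that may deserve a second look is the unit of the turnaround wait. sched_clock() reports nanoseconds, while the old OSCR counter ticked a few times per microsecond, which is presumably what the "/ 4" approximated. Assuming mtt is a minimum turnaround time in microseconds, a minimal userspace sketch of the unit handling could look like this; mtt_elapsed() and its parameters are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/*
 * Returns nonzero once at least mtt_us microseconds have passed between
 * last_ns and now_ns, both nanosecond timestamps from the same clock.
 */
static int mtt_elapsed(uint64_t now_ns, uint64_t last_ns, unsigned int mtt_us)
{
	assert(now_ns >= last_ns);	/* a monotonic clock is assumed */
	return (now_ns - last_ns) / NSEC_PER_USEC >= mtt_us;
}
```

With a divisor of 4 applied to a nanosecond difference, a 1000 us turnaround time would already count as elapsed after 4 us, so the busy-wait would end much earlier than intended.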



Re: [GIT] Networking

2015-09-03 Thread Stephen Rothwell
Hi David,

On Wed, 02 Sep 2015 22:35:22 -0700 (PDT) David Miller wrote:
>
> The following changes since commit 4941b8f0c2b9d88e8a6dacebf8b7faf603b98368:
> 
>   Merge tag 'powerpc-4.2-4' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2015-08-27 
> 17:59:17 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 
> 
> for you to fetch changes up to 62da98656b62a5ca57f22263705175af8ded5aa1:
> 
>   netfilter: nf_conntrack: make nf_ct_zone_dflt built-in (2015-09-02 16:32:56 
> -0700)

[just for consistency ...]

This has 80 commits that have first been in linux-next on Sept 1 or
later (and 5 that have not made it to linux-next yet).  I understand
that this is part of Dave's work flow and most of these have been
queued for a while.

Not judging, just noting.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


Re: [PATCH] net: eth: altera: fix napi poll_list corruption

2015-09-03 Thread Atsushi Nemoto
On Wed, 2 Sep 2015 22:32:54 -0700, David Miller  wrote:
>> I think napi_gro_flush() can be called with irq enabled, so moving the
>> spin_lock_irqsave() just before the __napi_complete() (or moving the
>> __napi_complete() just after the spin_lock_irqsave()) would be better,
>> right?
> 
> It should work, yes.

Thank you.  But I agree with Eric's last comment ("Calling
napi_gro_flush() and __napi_complete() looks error prone."), and I found
that napi_complete_done() also checks NAPI_STATE_NPSVC to support
NETPOLL.  These checks look somewhat redundant, but I prefer the simple
way unless it is really critical to performance.

So, please take the original fix as is.


[net-next PATCH v2] drivers: net: cpsw: Add support to make gpio drive which slave connected to phy

2015-09-03 Thread Mugunthan V N
On the DRA72x EVM, slave 1 is connected to the onboard phy by
default, but the slave 2 pins are also muxed with the video input
module, and the mux is controlled by a pcf857x gpio. Currently gpio
hogging is used to select slave 0 to connect to the phy, but with
omap2plus_defconfig the pcf857x gpio driver is built as a module.
So when using NFS on the DRA72x EVM the board doesn't boot: because
the driver is a module, gpio hogging does not set the gpio state
needed to connect slave 0 to the phy. No error is reported for the
unset gpio; the only symptom is that no dhcp reply is received.

To solve this issue, introduce "mode-gpio" in DT for when gpio
based muxing is required. This throws a warning when the gpio
get fails and returns probe defer. When the gpio-pcf857x module is
installed, cpsw probes again and ethernet becomes functional.
Verified this on DRA72x with pcf as module and ramdisk.

Signed-off-by: Mugunthan V N 
---

Changes from initial version:
* Updated the gpio dt naming to be more generic.

This patch is tested on DRA72x, Logs [1] and pushed a branch [2]

[1]: http://pastebin.ubuntu.com/12260767/
[2]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git cpsw-gpio-optional-v2

---
 Documentation/devicetree/bindings/net/cpsw.txt | 7 +++
 drivers/net/ethernet/ti/cpsw.c | 9 +
 2 files changed, 16 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 33fe846..dfe3e0b 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -26,6 +26,13 @@ Optional properties:
 - dual_emac: Specifies Switch to act as Dual EMAC
 - syscon   : Phandle to the system control device node, which is
  the control module device of the am33x
+- mode-gpio: Should be added if a gpio line is required to
+ be driven so that cpsw data lines can be
+ connected to the phy via selective mux. For
+ example in dra72x-evm, pcf gpio has to be
+ driven low so that cpsw slave 0 and phy
+ data lines are connected via mux.
+
 
 Slave Properties:
 Required properties:
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8fc90f1..90ae3f9 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2207,6 +2208,7 @@ static int cpsw_probe(struct platform_device *pdev)
void __iomem*ss_regs;
struct resource *res, *ss_res;
const struct of_device_id   *of_id;
+   struct gpio_desc*mode;
u32 slave_offset, sliver_offset, slave_size;
int ret = 0, i;
int irq;
@@ -2232,6 +2234,13 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ndev_ret;
}
 
+   mode = devm_gpiod_get_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);
+   if (IS_ERR(mode)) {
+   ret = PTR_ERR(mode);
+   dev_err(&pdev->dev, "gpio request failed, ret %d\n", ret);
+   goto clean_ndev_ret;
+   }
+
/*
 * This may be required here for child devices.
 */
-- 
2.5.1.522.g7aa67f6
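
The three outcomes the probe path has to tell apart (no property, provider not loaded yet, GPIO obtained) can be sketched in userspace as follows. This is only an imitation under stated assumptions: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented locally in the style of the kernel helpers, and get_optional_gpio(), probe() and enum dt_state are hypothetical names standing in for the optional GPIO lookup and cpsw_probe().

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* value used by the Linux kernel */
#define MAX_ERRNO 4095

/* Local imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct gpio { int value; };
static struct gpio mux_gpio = { 1 };

enum dt_state { NO_PROPERTY, PROVIDER_MISSING, PROVIDER_READY };

/* Hypothetical optional lookup: NULL if absent, error pointer if not ready. */
static struct gpio *get_optional_gpio(enum dt_state state)
{
	switch (state) {
	case NO_PROPERTY:
		return NULL;
	case PROVIDER_MISSING:
		return ERR_PTR(-EPROBE_DEFER);
	default:
		mux_gpio.value = 0;	/* drive the mux select line low */
		return &mux_gpio;
	}
}

/* Returns 0 to continue probing, or a negative error to retry later. */
static int probe(enum dt_state state)
{
	struct gpio *mode = get_optional_gpio(state);

	if (IS_ERR(mode))
		return (int)PTR_ERR(mode);
	return 0;	/* NULL (no property) is not an error */
}
```

Boards without the property keep probing as before, while a missing provider makes the probe fail with a deferral code so it can be retried once the GPIO driver module is loaded.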



Re: [PATCH net-next] Revert "net/ipv6: add sysctl option accept_ra_min_hop_limit"

2015-09-03 Thread Florian Westphal
David Miller  wrote:
> From: Sabrina Dubroca 
> Date: Wed, 2 Sep 2015 11:43:01 +0200
> 
> > This reverts commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4.
> > 
> > There are several issues with this patch.
> > It completely cancels the security changes introduced by 6fd99094de2b
> > ("ipv6: Don't reduce hop limit for an interface").
> > The current default value (min hop limit = 1) can result in the same
> > denial of service that 6fd99094de2b prevents, but it is hard to define
> > a correct and sane default value.
> > More generally, it is yet another IPv6 sysctl, and we already have too
> > many.
> > 
> > This was introduced to satisfy a TAHI test case which, in my opinion, is
> > too strict, turning the RFC's "SHOULD" into a "MUST":
> > 
> > If the received Cur Hop Limit value is non-zero, the host
> > SHOULD set its CurHopLimit variable to the received value.
> > 
> > The behavior of this sysctl is wrong in multiple ways.  Some are
> > fixable, but let's not rush this commit into mainline, and revert this
> > while we still can, then we can come up with a better solution.
> > 
> > Signed-off-by: Sabrina Dubroca 
> 
> I don't agree with this revert.
> 
> If you look at the original commit, the quoted RFC recommends adding
> a configurable method to protect against this.

Which also means it recommends a configurable method to NOT protect
against this.

Which begs the question in which scenario you would want to configure
end hosts in such a way that an RA can shrink hoplimit to values
where machines can't talk to internet hosts anymore.

> The only thing I would entertain is potentially an adjustment of the
> default, working in concert with the TAHI folks to make sure their
> tests still pass with any new default.

So, assuming we would change the default to 64 (the hoplimit default).
Where would it make sense to reconfigure this to a lower value?

Moreover, if we would (hypothetically) assume that an administrator wants
a smaller hop limit value and has to change knob to allow e.g. min
hoplimit of 10 they might as well just change the default hoplimit value
rather than altering min hoplimit and then set it via RA...?
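
For readers following the argument, the check under discussion can be sketched as below. This is a simplified model, not the kernel code: ra_update_hop_limit() and its parameters are hypothetical names, and it assumes the sysctl simply ignores an advertised hop limit that is below the configured minimum.

```c
#include <assert.h>

/*
 * Returns the hop limit a host would use after processing a Router
 * Advertisement. A value of 0 in the RA means "unspecified by this router".
 */
static int ra_update_hop_limit(int cur, int advertised, int min_hop_limit)
{
	assert(cur > 0);
	if (advertised == 0)
		return cur;		/* router did not specify a value */
	if (advertised < min_hop_limit)
		return cur;		/* below the configured floor: ignore */
	return advertised;
}
```

With a minimum of 1, an RA advertising 1 still shrinks the hop limit from 64 to 1, which is the denial of service mentioned in the revert; with a minimum of 64, the same RA is ignored.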
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Florian Westphal
Nikola Forró  wrote:
> Man page of ip-route(8) says following about route types:
> 
>   unreachable - these destinations are unreachable.  Packets are dis‐
>   carded and the ICMP message host unreachable is generated.  The local
>   senders get an EHOSTUNREACH error.
> 
>   blackhole - these destinations are unreachable.  Packets are dis‐
>   carded silently.  The local senders get an EINVAL error.
> 
>   prohibit - these destinations are unreachable.  Packets are discarded
>   and the ICMP message communication administratively prohibited is
>   generated.  The local senders get an EACCES error.
> 
> In the inet6 address family, this was correct, except the local senders
> got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
> In the inet address family, all three route types generated ICMP message
> net unreachable, and the local senders got ENETUNREACH error.
> 
> In both address families all three route types now behave consistently
> with documentation.
> 
> Signed-off-by: Nikola Forró 
> ---
>  include/net/ip_fib.h | 21 -
>  net/ipv4/route.c |  6 --
>  net/ipv6/route.c |  4 +++-
>  3 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 5fa643b..cf025107 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -233,8 +233,10 @@ static inline int fib_lookup(struct net *net, const 
> struct flowi4 *flp,
>   rcu_read_lock();
>  
>   tb = fib_get_table(net, RT_TABLE_MAIN);
> - if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
> - err = 0;
> + if (tb)
> + err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
> + if (err == -EAGAIN)
> + err = -ENETUNREACH;

Missing { } ?
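
A braced form of the quoted hunk might look like the following userspace sketch. It assumes err starts out as -ENETUNREACH, which the quoted context does not show; struct table and lookup() are hypothetical stand-ins for the FIB table and fib_lookup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table: result is what a lookup in it would return. */
struct table { int result; };

static int lookup(const struct table *tb)
{
	int err = -ENETUNREACH;	/* assumed initial value */

	if (tb) {
		err = tb->result;
		/* "no match" in the only table means the network is unreachable */
		if (err == -EAGAIN)
			err = -ENETUNREACH;
	}
	return err;
}
```

Under that assumption the unbraced version returns the same values, since err can only be -EAGAIN after a lookup, but the braces make the code match what its indentation suggests.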


Re: [PATCH net-next 1/1] net: fec: clear receive interrupts before processing a packet

2015-09-03 Thread Philippe De Muyter
Hi Andy,

can you resubmit it, adding also my

Reported-by: Philippe De Muyter 

and explaining that it also prevents a complete rx blockage failure ?

Philippe

On Wed, Sep 02, 2015 at 11:40:15AM +0200, Philippe De Muyter wrote:
> On Wed, Sep 02, 2015 at 05:24:14PM +0800, Fugang Duan wrote:
> > From: Russell King 
> > 
> > The patch just to re-submit the patch "db3421c114cfa6326" because the
> > patch "4d494cdc92b3b9a0" remove the change.
> 
> I think you should mention also the titles of the commits.
> 
> And maybe send it also to stable.
> > 
> > Clear any pending receive interrupt before we process a pending packet.
> > This helps to avoid any spurious interrupts being raised after we have
> > fully cleaned the receive ring, while still allowing an interrupt to be
> > raised if we receive another packet.
> > 
> > The position of this is critical: we must do this prior to reading the
> > next packet status to avoid potentially dropping an interrupt when a
> > packet is still pending.
> > 
> > Acked-by: Fugang Duan 
> > Signed-off-by: Russell King 
> > ---
> >  drivers/net/ethernet/freescale/fec_main.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/ethernet/freescale/fec_main.c 
> > b/drivers/net/ethernet/freescale/fec_main.c
> > index 1f89c59..6bed0ff 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -1400,6 +1400,7 @@ fec_enet_rx_queue(struct net_device *ndev, int 
> > budget, u16 queue_id)
> > if ((status & BD_ENET_RX_LAST) == 0)
> > netdev_err(ndev, "rcv is not +last\n");
> >  
> Could a comment be added here to avoid another future removal ?
> 
> > +   writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT);
> >  
> > /* Check for errors. */
> > if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
> > -- 
> > 1.9.1
> 
> Philippe

-- 
Philippe De Muyter +32 2 6101532 Macq SA rue de l'Aeronef 2 B-1140 Bruxelles


[PATCH] net/9p: Add device name details on error

2015-09-03 Thread Aneesh Kumar K.V
If we use a wrong device name, the 9p mount fails with the error

"9pnet_virtio: no channels available"

Improve the error output as below

"9pnet_virtio: no channels available for device /dev/root"

Signed-off-by: Aneesh Kumar K.V 
---
 net/9p/trans_virtio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6e70ddb158b4..3827760c8eef 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -662,7 +662,7 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args)
mutex_unlock(&virtio_9p_lock);
 
if (!found) {
-   pr_err("no channels available\n");
+   pr_err("no channels available for device %s\n", devname);
return ret;
}
 
-- 
2.5.0



RE: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for multicast control

2015-09-03 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 03 September 2015 09:57
> To: Paul Durrant
> Cc: Ian Campbell; Wei Liu; xen-de...@lists.xenproject.org;
> netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for
> multicast control
> 
> >>> On 02.09.15 at 18:58,  wrote:
> > @@ -1215,6 +1289,31 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> > break;
> > }
> >
> > +   if (extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1].type) {
> > +   struct xen_netif_extra_info *extra;
> > +
> > +   extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
> > +   ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);
> 
> What's the reason this call isn't gated on vif->multicast_control?
> 

No particular reason. I guess it eats a small amount of memory for no gain but 
a well behaved frontend wouldn't send such a request and a malicious one can 
only send 64 of them before netback starts to reject them.

  Paul

> Jan



Re: [PATCH] net: Fix behavior of unreachable, blackhole and prohibit routes

2015-09-03 Thread Nikola Forró
Hello Alexander,
thank you for your comments.

On 09/01/2015 10:53 AM, Alexander Duyck wrote:

> Generally updating kernel code to match user-space documentation
> isn't 
> always the best way to go.  The question I would have is if there are
> any other user-space applications out there that might be expecting
> this 
> behaviour now?
> 
Well, any application which uses the connect or sendto syscalls is getting
wrong error codes. If not wrong, then at least different for ipv4 and
ipv6. I think the errors in fib_props are defined for a reason.

But I think the bigger issue is the incorrect ICMP messages being returned to
the sender, e.g. a packet going to a blackhole route is not silently discarded;
instead it generates an ICMP net unreachable message. I think that kind of
defeats the purpose of a blackhole route.

> Also your changes don't seem to match up with what you have
> described. 
> You are returning the error code from fib_table_lookup, but 
> fib_table_lookup can return -EAGAIN if there is no matching entry
> found. 
>   I don't see you describing how you would deal with that case.  You 
> might try testing your code after deleting the default route to see
> what 
> behaviour it is you get.
> 
You are right, I need to handle -EAGAIN and return -ENETUNREACH
instead.

> This bit appears to overlook the fact that fib_rules_lookup could
> also 
> be the function used to return the error via a call to fib_lookup. 
>  In 
> which case that also throws -ESRCH into the mix for return error
> codes.
> 
I don't think it does. In __fib_lookup -ESRCH returned from
fib_rules_lookup is being replaced by -ENETUNREACH.


I will submit a corrected patch.

Kind regards,
Nikola


RE: [PATCH] tipc: fix stall during bclink wakeup procedure

2015-09-03 Thread Kolmakov Dmitriy
From: David Miller [mailto:da...@davemloft.net]
> 
> From: Kolmakov Dmitriy 
> Date: Wed, 2 Sep 2015 15:33:00 +0000
> 
> > If an attempt to wake up users of broadcast link is made when there
> is
> > no enough place in send queue than it may hang up inside the
> > tipc_sk_rcv() function since the loop breaks only after the wake up
> > queue becomes empty. This can lead to complete CPU stall with the
> > following message generated by RCU:
> 
> I don't understand how it can loop forever.
> 
> It should either successfully deliver each packet to the socket, or
> respond with a TIPC_ERR_OVERLOAD.
> 
> In both cases, the SKB is dequeued from the queue and forward progress
> is made.

The issue occurs only when tipc_sk_rcv() is used to wake up postponed senders.
In this case the call stack is the following:

tipc_bclink_wakeup_users()
    // wakeupq is a queue consisting of special
    // messages with SOCK_WAKEUP type.
    tipc_sk_rcv(wakeupq)
        ...
        while (skb_queue_len(inputq)) {
            filter_rcv(skb)
                // Here the type of message is checked
                // and if it is SOCK_WAKEUP then
                // it tries to wake up a sender.
                tipc_write_space(sk)
                    wake_up_interruptible_sync_poll()
        }

After the sender thread is woken up, it can gain control and attempt to send
a message. But if there is not enough room in the send queue, it will call the
link_schedule_user() function, which puts a message of type SOCK_WAKEUP on the
wakeup queue and puts the sender to sleep. Thus the size of the queue does
not actually change and the while() loop never exits.

The approach I proposed is to wake up only senders for which there is enough
room in the send queue, so the described issue can't occur. Moreover, the same
approach is already used to wake up senders on unicast links, so it was possible
to reuse existing code.

> 
> If there really is a problem somewhere in here, then two things:
> 
> 1) You need to describe exactly the sequence of tests and conditions
>that lead to the endless loop in this code, because I cannot see
>it.

I ran into the issue in our product code, but to reproduce it I changed a
benchmark test application (from tipcutils/demos/benchmark) to perform the
following scenario:
1. Run 64 instances of the test application (nodes). This can be done on
   one physical machine.
2. Each application connects to all the others using TIPC sockets in RDM
   mode.
3. When setup is done, all nodes simultaneously start sending broadcast
   messages.
4. Everything hangs.

The issue is reproducible only when congestion occurs on the broadcast link. For
example, with only 8 nodes it works fine since no congestion occurs. The send
queue limit is 40 in my case (I use the critical importance level), and when
64 nodes send a message at the same moment, congestion occurs every time.

> 
> 2) I suspect the fix is more likely to be appropriate in tipc_sk_rcv()
>or similar, rather than creating a dummy queue to workaround it's
>behavior.
> 
> Thanks.

BR,
DK
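
The stall described in this message can be modelled with a small userspace simulation. It is a deliberately simplified sketch: struct link, sender_runs(), wake_all() and wake_bounded() are hypothetical names, the queue limit of 40 and the 64 senders come from the reproduction scenario, and the iteration cap exists only so the looping variant terminates.

```c
#include <assert.h>

#define SEND_QUEUE_LIMIT 40

struct link {
	int send_queue_len;	/* messages currently in the send queue */
	int wakeup_pending;	/* senders parked on the wakeup queue */
};

/* A woken sender queues one message if there is room, else parks again. */
static void sender_runs(struct link *l)
{
	if (l->send_queue_len < SEND_QUEUE_LIMIT)
		l->send_queue_len++;
	else
		l->wakeup_pending++;
}

/* Wake senders until the wakeup queue is empty, capped at max_iter wakeups. */
static int wake_all(struct link *l, int max_iter)
{
	int n = 0;

	while (l->wakeup_pending > 0 && n < max_iter) {
		l->wakeup_pending--;
		sender_runs(l);
		n++;
	}
	return n;	/* number of wakeups performed */
}

/* Wake only as many senders as the send queue has room for. */
static int wake_bounded(struct link *l)
{
	int room = SEND_QUEUE_LIMIT - l->send_queue_len;
	int n = 0;

	assert(room >= 0);
	while (l->wakeup_pending > 0 && n < room) {
		l->wakeup_pending--;
		sender_runs(l);
		n++;
	}
	return n;
}
```

With a full send queue, wake_all() uses up its whole iteration budget while the number of parked senders stays the same, which corresponds to the loop that never exits; wake_bounded() stops as soon as the available room is used.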


Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Daniel Borkmann

On 09/03/2015 10:13 AM, Shaun Crampton wrote:
...

Is there anything I can do on a running system to help figure this out?
Some sort of kernel equivalent to pmap to find out what module or device
owns that chunk of memory?


Hmm, perhaps /proc/kallsyms could point to something. 0xa0087d81
and 0xa008772b could be from the same module, if any.


[PATCH v2] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Nikola Forró
Man page of ip-route(8) says following about route types:

  unreachable - these destinations are unreachable.  Packets are dis‐
  carded and the ICMP message host unreachable is generated.  The local
  senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable.  Packets are dis‐
  carded silently.  The local senders get an EINVAL error.

  prohibit - these destinations are unreachable.  Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated.  The local senders get an EACCES error.

In the inet6 address family, this was correct, except the local senders
got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
In the inet address family, all three route types generated ICMP message
net unreachable, and the local senders got ENETUNREACH error.

In both address families all three route types now behave consistently
with documentation.

Signed-off-by: Nikola Forró 
---
 include/net/ip_fib.h | 21 -
 net/ipv4/route.c |  6 --
 net/ipv6/route.c |  4 +++-
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 5fa643b..cf025107 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -233,8 +233,10 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
rcu_read_lock();
 
tb = fib_get_table(net, RT_TABLE_MAIN);
-   if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
-   err = 0;
+   if (tb)
+   err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
+   if (err == -EAGAIN)
+   err = -ENETUNREACH;
 
rcu_read_unlock();
 
@@ -267,11 +269,20 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 
for (err = 0; !err; err = -ENETUNREACH) {
tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
-   break;
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
 
tb = rcu_dereference_rtnl(net->ipv4.fib_default);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
+
+   if (err && err != -EAGAIN)
break;
}
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e681b85..4ce3f87 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2020,6 +2020,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
struct fib_result res;
struct rtable *rth;
int orig_oif;
+   int err = ENETUNREACH;
 
res.tclassid= 0;
res.fi  = NULL;
@@ -2123,7 +2124,8 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
goto make_route;
}
 
-   if (fib_lookup(net, fl4, &res, 0)) {
+   err = fib_lookup(net, fl4, &res, 0);
+   if (err) {
res.fi = NULL;
res.table = NULL;
if (fl4->flowi4_oif) {
@@ -2151,7 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
res.type = RTN_UNICAST;
goto make_route;
}
-   rth = ERR_PTR(-ENETUNREACH);
+   rth = ERR_PTR(err);
goto out;
}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d155864..d33a6a5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1847,9 +1847,11 @@ int ip6_route_add(struct fib6_config *cfg)
rt->dst.input = ip6_pkt_prohibit;
break;
case RTN_THROW:
+   case RTN_UNREACHABLE:
default:
rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
-   : -ENETUNREACH;
+   : (cfg->fc_type == RTN_UNREACHABLE)
+   ? -EHOSTUNREACH : -ENETUNREACH;
rt->dst.output = ip6_pkt_discard_out;
rt->dst.input = ip6_pkt_discard;
break;
-- 
2.4.3




Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for multicast control

2015-09-03 Thread Jan Beulich
>>> On 02.09.15 at 18:58,  wrote:
> @@ -1215,6 +1289,31 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
>   break;
>   }
>  
> + if (extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1].type) {
> + struct xen_netif_extra_info *extra;
> +
> + extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
> + ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);

What's the reason this call isn't gated on vif->multicast_control?

Jan



Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for multicast control

2015-09-03 Thread Ian Campbell
On Thu, 2015-09-03 at 10:00 +0100, Paul Durrant wrote:
> > 
> > -Original Message-
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: 03 September 2015 09:57
> > To: Paul Durrant
> > Cc: Ian Campbell; Wei Liu; xen-de...@lists.xenproject.org;
> > netdev@vger.kernel.org
> > Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support 
> > for
> > multicast control
> > 
> > > > > On 02.09.15 at 18:58,  wrote:
> > > @@ -1215,6 +1289,31 @@ static void xenvif_tx_build_gops(struct
> > xenvif_queue *queue,
> > >   break;
> > >   }
> > > 
> > > > + if (extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1].type) {
> > > > + struct xen_netif_extra_info *extra;
> > > > +
> > > > + extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
> > > > + ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);
> > 
> > What's the reason this call isn't gated on vif->multicast_control?
> > 
> 
> No particular reason. I guess it eats a small amount of memory for no 
> gain but a well behaved frontend wouldn't send such a request and a 
> malicious one can only send 64 of them before netback starts to reject 
> them.

Perhaps a confused guest might submit them thinking they would work when
actually the feature hasn't been properly negotiated and since it would
succeed it wouldn't generate an error on the guest side?

(A bit of a niche corner case I confess...)


RE: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for multicast control

2015-09-03 Thread Paul Durrant
> -Original Message-
> From: Ian Campbell [mailto:ian.campb...@citrix.com]
> Sent: 03 September 2015 10:31
> To: Paul Durrant; Jan Beulich
> Cc: Wei Liu; xen-de...@lists.xenproject.org; netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for
> multicast control
> 
> On Thu, 2015-09-03 at 10:00 +0100, Paul Durrant wrote:
> > >
> > > -Original Message-
> > > From: Jan Beulich [mailto:jbeul...@suse.com]
> > > Sent: 03 September 2015 09:57
> > > To: Paul Durrant
> > > Cc: Ian Campbell; Wei Liu; xen-de...@lists.xenproject.org;
> > > netdev@vger.kernel.org
> > > Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support
> > > for
> > > multicast control
> > >
> > > > > > On 02.09.15 at 18:58,  wrote:
> > > > @@ -1215,6 +1289,31 @@ static void xenvif_tx_build_gops(struct
> > > xenvif_queue *queue,
> > > > break;
> > > > }
> > > >
> > > > > +   if (extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1].type) {
> > > > > +   struct xen_netif_extra_info *extra;
> > > > > +
> > > > > +   extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
> > > > > +   ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);
> > >
> > > What's the reason this call isn't gated on vif->multicast_control?
> > >
> >
> > No particular reason. I guess it eats a small amount of memory for no
> > gain but a well behaved frontend wouldn't send such a request and a
> > malicious one can only send 64 of them before netback starts to reject
> > them.
> 
> Perhaps a confused guest might submit them thinking they would work
> when
> actually the feature hasn't been properly negotiated and since it would
> succeed it wouldn't generate an error on the guest side?

It would, but that's essentially harmless to functionality. If the feature had 
not been negotiated properly then multicast flooding would still be in 
operation so the guest would not lose any multicasts. I can tighten things up 
if you like but as you say below it is a bit of a corner case.

  Paul

> 
> (A bit of a niche corner case I confess...)


Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Shaun Crampton

>Looking at this one, I am still puzzled where 0xa008772b and
>0xa008772b come from ... some driver, bridge ...?

Is there anything I can do on a running system to help figure this out?
Some sort of kernel equivalent to pmap to find out what module or device
owns that chunk of memory?



Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Shaun Crampton

>...
>> Is there anything I can do on a running system to help figure this out?
>> Some sort of kernel equivalent to pmap to find out what module or device
>> owns that chunk of memory?
>
>Hmm, perhaps /proc/kallsyms could point to something. 0xa0087d81
>and 0xa008772b could be from the same module, if any.

Any good: https://transfer.sh/szGRE/kallsyms ?



Re: [PATCH v2] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Nikola Forró
Florian Westphal wrote:

> Missing { } ?
> 
I should really pay more attention to what I'm submitting.
Thanks Florian.



[PATCH v3] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Nikola Forró
The ip-route(8) man page says the following about route types:

  unreachable - these destinations are unreachable.  Packets are
  discarded and the ICMP message host unreachable is generated.  The local
  senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable.  Packets are
  discarded silently.  The local senders get an EINVAL error.

  prohibit - these destinations are unreachable.  Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated.  The local senders get an EACCES error.

In the inet6 address family this was already correct, except that local
senders got ENETUNREACH instead of EHOSTUNREACH for an unreachable route.
In the inet address family, all three route types generated the ICMP
message net unreachable, and local senders got ENETUNREACH.

With this patch, all three route types behave consistently with the
documentation in both address families.

Signed-off-by: Nikola Forró 
---
 include/net/ip_fib.h | 22 +-
 net/ipv4/route.c |  6 --
 net/ipv6/route.c |  4 +++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 5fa643b..8e7b3e1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -233,8 +233,11 @@ static inline int fib_lookup(struct net *net, const struct 
flowi4 *flp,
rcu_read_lock();
 
tb = fib_get_table(net, RT_TABLE_MAIN);
-   if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
-   err = 0;
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
+   if (err == -EAGAIN)
+   err = -ENETUNREACH;
+   }
 
rcu_read_unlock();
 
@@ -267,11 +270,20 @@ static inline int fib_lookup(struct net *net, struct 
flowi4 *flp,
 
for (err = 0; !err; err = -ENETUNREACH) {
tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
-   break;
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
 
tb = rcu_dereference_rtnl(net->ipv4.fib_default);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
+   if (tb) {
+   err = fib_table_lookup(tb, flp, res, flags);
+   if (!err)
+   break;
+   }
+
+   if (err && err != -EAGAIN)
break;
}
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e681b85..4ce3f87 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2020,6 +2020,7 @@ struct rtable *__ip_route_output_key(struct net *net, 
struct flowi4 *fl4)
struct fib_result res;
struct rtable *rth;
int orig_oif;
+   int err = -ENETUNREACH;
 
res.tclassid= 0;
res.fi  = NULL;
@@ -2123,7 +2124,8 @@ struct rtable *__ip_route_output_key(struct net *net, 
struct flowi4 *fl4)
goto make_route;
}
 
-   if (fib_lookup(net, fl4, &res, 0)) {
+   err = fib_lookup(net, fl4, &res, 0);
+   if (err) {
res.fi = NULL;
res.table = NULL;
if (fl4->flowi4_oif) {
@@ -2151,7 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, 
struct flowi4 *fl4)
res.type = RTN_UNICAST;
goto make_route;
}
-   rth = ERR_PTR(-ENETUNREACH);
+   rth = ERR_PTR(err);
goto out;
}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d155864..d33a6a5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1847,9 +1847,11 @@ int ip6_route_add(struct fib6_config *cfg)
rt->dst.input = ip6_pkt_prohibit;
break;
case RTN_THROW:
+   case RTN_UNREACHABLE:
default:
rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
-   : -ENETUNREACH;
+   : (cfg->fc_type == RTN_UNREACHABLE)
+   ? -EHOSTUNREACH : -ENETUNREACH;
rt->dst.output = ip6_pkt_discard_out;
rt->dst.input = ip6_pkt_discard;
break;
-- 
2.4.3
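
The errno behaviour described in the commit message above can be sketched in
plain userspace C. This is only an illustration: the enum and the function
names (sketch_route_type, sketch_route_errno, sketch_single_table_lookup) are
hypothetical and are not kernel API. Only the errno values are taken from the
quoted man page text and from the fib_lookup() hunk.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's RTN_* route types. */
enum sketch_route_type {
	SKETCH_RTN_UNICAST,
	SKETCH_RTN_UNREACHABLE,
	SKETCH_RTN_BLACKHOLE,
	SKETCH_RTN_PROHIBIT,
	SKETCH_RTN_THROW,
};

/* Error a local sender is expected to see, following ip-route(8). */
static int sketch_route_errno(enum sketch_route_type type)
{
	switch (type) {
	case SKETCH_RTN_UNREACHABLE:
		return -EHOSTUNREACH;
	case SKETCH_RTN_BLACKHOLE:
		return -EINVAL;
	case SKETCH_RTN_PROHIBIT:
		return -EACCES;
	case SKETCH_RTN_THROW:
		return -EAGAIN;	/* internal: continue with the next table */
	default:
		return 0;	/* ordinary route, no error */
	}
}

/*
 * Mirrors the single-table fib_lookup() hunk: a throw result (-EAGAIN)
 * with no further table to consult is reported as -ENETUNREACH, and any
 * other result is passed through unchanged.
 */
static int sketch_single_table_lookup(int table_result)
{
	return table_result == -EAGAIN ? -ENETUNREACH : table_result;
}
```

The point of the patch is the pass-through: before it, every non-zero lookup
result was flattened to -ENETUNREACH, so the per-type errors never reached
the caller.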




RE: [PATCH] x86: Wire up 32-bit direct socket calls

2015-09-03 Thread David Laight
From: Peter Anvin
> Sent: 02 September 2015 21:16
> On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> >
> > Should all other architectures follow suit?
> > Or should we follow the s390 approach:
> >
> 
> It is up to the maintainer(s), largely dependent on how likely you are
> going to want to support this in your libc, but in general, socketcall
> is an abomination which there is no reason not to bypass.

The other (worse) abomination is the way SCTP overloads setsockopt()
to perform actions that change state.
Rather unfortunately that got documented in the protocol standard :-(

David


[PATCH 2/6] netfilter: bridge: fix IPv6 packets not being bridged with CONFIG_IPV6=n

2015-09-03 Thread Pablo Neira Ayuso
From: Bernhard Thaler 

Commit 230ac490f7fba introduced a dependency on CONFIG_IPV6 which breaks
bridging of IPv6 packets on a bridge with CONFIG_IPV6=n.

Sysctl entry /proc/sys/net/bridge/bridge-nf-call-ip6tables defaults to 1,
so packets are handled by br_nf_pre_routing_ipv6(). When compiled with
CONFIG_IPV6=n this function returns NF_DROP, but it should return NF_ACCEPT
to let packets through.

Change CONFIG_IPV6=n br_nf_pre_routing_ipv6() return value to NF_ACCEPT.

Tested with a simple bridge with two interfaces and IPv6 packets trying
to pass from host on left side to host on right side of the bridge.

Fixes: 230ac490f7fba ("netfilter: bridge: split ipv6 code into separated file")
Signed-off-by: Bernhard Thaler 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/br_netfilter.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netfilter/br_netfilter.h 
b/include/net/netfilter/br_netfilter.h
index bab824b..d4c6b5f 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -59,7 +59,7 @@ static inline unsigned int
 br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
-   return NF_DROP;
+   return NF_ACCEPT;
 }
 #endif
 
-- 
1.7.10.4
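
The compile-time stub pattern this patch touches can be sketched as follows.
The macro SKETCH_HAVE_IPV6 and the function name are hypothetical stand-ins
for CONFIG_IPV6 and br_nf_pre_routing_ipv6(); the verdict values 0 and 1
match NF_DROP and NF_ACCEPT in the kernel's netfilter header.

```c
/* Verdict values, same numbers as NF_DROP / NF_ACCEPT. */
#define SKETCH_NF_DROP   0
#define SKETCH_NF_ACCEPT 1

#ifdef SKETCH_HAVE_IPV6
/* The real hook would live in the IPv6 code; deliberately not shown. */
unsigned int sketch_pre_routing_ipv6(void);
#else
/*
 * Stub used when IPv6 support is compiled out. Returning ACCEPT lets the
 * packet continue through the bridge; returning DROP here would silently
 * discard every IPv6 frame, which is the regression being fixed.
 */
static inline unsigned int sketch_pre_routing_ipv6(void)
{
	return SKETCH_NF_ACCEPT;
}
#endif
```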



[PATCH 5/6] netfilter: nfnetlink: work around wrong endianess in res_id field

2015-09-03 Thread Pablo Neira Ayuso
The convention in nfnetlink is to use network byte order in every header field
as well as in the attribute payload. The initial version of the batching
infrastructure assumes that res_id comes in host byte order though.

The only client of the batching infrastructure is nf_tables, so let's add a
workaround to address this inconsistency. We currently have 11 nfnetlink
subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem
2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias
of nf_tables from the nfnetlink batching path when interpreting the res_id
field.

Based on original patch from Florian Westphal.

Reported-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 0c0e8ec..70277b1 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -444,6 +444,7 @@ done:
 static void nfnetlink_rcv(struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
+   u_int16_t res_id;
int msglen;
 
if (nlh->nlmsg_len < NLMSG_HDRLEN ||
@@ -468,7 +469,12 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 
nfgenmsg = nlmsg_data(nlh);
skb_pull(skb, msglen);
-   nfnetlink_rcv_batch(skb, nlh, nfgenmsg->res_id);
+   /* Work around old nft using host byte order */
+   if (nfgenmsg->res_id == NFNL_SUBSYS_NFTABLES)
+   res_id = NFNL_SUBSYS_NFTABLES;
+   else
+   res_id = ntohs(nfgenmsg->res_id);
+   nfnetlink_rcv_batch(skb, nlh, res_id);
} else {
netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
}
-- 
1.7.10.4
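
A small userspace sketch of the workaround, assuming the subsystem number 10
implied by the "htons(10)" remark in the commit message. The names
SKETCH_SUBSYS_NFTABLES and sketch_normalize_res_id are hypothetical; only
ntohs()/htons() are real library calls.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define SKETCH_SUBSYS_NFTABLES 10

/*
 * Accept res_id in either byte order for nf_tables. An old client sends
 * the value in host byte order, so the raw field already equals 10; a
 * correct client sends network byte order, which ntohs() converts.
 */
static uint16_t sketch_normalize_res_id(uint16_t wire_res_id)
{
	if (wire_res_id == SKETCH_SUBSYS_NFTABLES)
		return SKETCH_SUBSYS_NFTABLES;
	return ntohs(wire_res_id);
}
```

The cost of this approach is the one the commit message names: on a
little-endian host the subsystem whose number is htons(10), i.e. 2560,
becomes an alias of nf_tables on this path.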



[PATCH 6/6] netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths

2015-09-03 Thread Pablo Neira Ayuso
From: Daniel Borkmann 

Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack
templates") migrated templates to the new allocator api, but forgot to
update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
instead of nf_conntrack_free().

Due to that, memory is being freed into the wrong kmemcache, but also
we drop the per net reference count of ct objects causing an imbalance.

In Brad's case, this leads to a wrap-around of net->ct.count and thus
lets __nf_conntrack_alloc() refuse to create a new ct object:

  [   10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
  [   10.810168] nf_conntrack: table full, dropping packet
  [   11.917416] r8169 :07:00.0 eth0: link up
  [   11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
  [   12.815902] nf_conntrack: table full, dropping packet
  [   15.688561] nf_conntrack: table full, dropping packet
  [   15.689365] nf_conntrack: table full, dropping packet
  [   15.690169] nf_conntrack: table full, dropping packet
  [   15.690967] nf_conntrack: table full, dropping packet
  [...]

With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
nf_conntrack_81ce75c0) and reports poison overwrites, etc. Thus,
to fix the problem, export and use nf_ct_tmpl_free() instead.

Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack 
templates")
Reported-by: Brad Jackson 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack.h |1 +
 net/netfilter/nf_conntrack_core.c|3 ++-
 net/netfilter/nf_synproxy_core.c |2 +-
 net/netfilter/xt_CT.c|2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 37cd391..4023c4c 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -292,6 +292,7 @@ extern unsigned int nf_conntrack_hash_rnd;
 void init_nf_conntrack_hash_rnd(void);
 
 struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags);
+void nf_ct_tmpl_free(struct nf_conn *tmpl);
 
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 3c20d02..0625a42 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -320,12 +320,13 @@ out_free:
 }
 EXPORT_SYMBOL_GPL(nf_ct_tmpl_alloc);
 
-static void nf_ct_tmpl_free(struct nf_conn *tmpl)
+void nf_ct_tmpl_free(struct nf_conn *tmpl)
 {
nf_ct_ext_destroy(tmpl);
nf_ct_ext_free(tmpl);
kfree(tmpl);
 }
+EXPORT_SYMBOL_GPL(nf_ct_tmpl_free);
 
 static void
 destroy_conntrack(struct nf_conntrack *nfct)
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index d7f1685..d6ee8f8 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -378,7 +378,7 @@ static int __net_init synproxy_net_init(struct net *net)
 err3:
free_percpu(snet->stats);
 err2:
-   nf_conntrack_free(ct);
+   nf_ct_tmpl_free(ct);
 err1:
return err;
 }
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 43ddeee..f3377ce 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -233,7 +233,7 @@ out:
return 0;
 
 err3:
-   nf_conntrack_free(ct);
+   nf_ct_tmpl_free(ct);
 err2:
nf_ct_l3proto_module_put(par->family);
 err1:
-- 
1.7.10.4
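
Why the mismatched free corrupts the object count can be shown with a toy
model. Everything here is hypothetical (sketch_pool, the alloc/free helpers);
it is not the conntrack code. The sketch uses an unsigned counter so that the
wrap-around is well defined in C.

```c
#include <limits.h>
#include <stdlib.h>

/* Toy stand-in for the per-netns accounting of regular objects. */
struct sketch_pool {
	unsigned int count;
};

/* Regular objects are counted on allocation ... */
static void *sketch_obj_alloc(struct sketch_pool *pool)
{
	void *obj = malloc(64);

	if (obj)
		pool->count++;
	return obj;
}

/* ... and uncounted again when they are freed. */
static void sketch_obj_free(struct sketch_pool *pool, void *obj)
{
	pool->count--;
	free(obj);
}

/* Templates come from a separate path and are never counted. */
static void *sketch_tmpl_alloc(void)
{
	return calloc(1, 64);
}

static void sketch_tmpl_free(void *tmpl)
{
	free(tmpl);
}

/* Release one template with the matching or the mismatching free. */
static unsigned int sketch_count_after_tmpl_release(int use_matching_free)
{
	struct sketch_pool pool = { 0 };
	void *tmpl = sketch_tmpl_alloc();

	if (!tmpl)
		return pool.count;
	if (use_matching_free)
		sketch_tmpl_free(tmpl);
	else
		sketch_obj_free(&pool, tmpl);	/* decrements a count of 0 */
	return pool.count;
}
```

With the mismatching free the counter goes from 0 to UINT_MAX, so any later
"is the table full?" comparison against it fails, which is consistent with
the "table full, dropping packet" messages quoted above.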



[PATCH 4/6] netfilter: ipset: Fixing unnamed union init

2015-09-03 Thread Pablo Neira Ayuso
From: Elad Raz 

As a follow-up to Vinson Lee's post [1], this patch fixes compilation
issues found with gcc 4.4.7. Initializing the .cidr field of an unnamed
union causes a compilation error in gcc 4.4.x.

[1] https://lkml.org/lkml/2015/7/5/74

Signed-off-by: Elad Raz 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/ipset/ip_set_hash_netnet.c |   20 ++--
 net/netfilter/ipset/ip_set_hash_netportnet.c |   20 ++--
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c 
b/net/netfilter/ipset/ip_set_hash_netnet.c
index 3c862c0..a93dfeb 100644
--- a/net/netfilter/ipset/ip_set_hash_netnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netnet.c
@@ -131,6 +131,13 @@ hash_netnet4_data_next(struct hash_netnet4_elem *next,
 #define HOST_MASK  32
 #include "ip_set_hash_gen.h"
 
+static void
+hash_netnet4_init(struct hash_netnet4_elem *e)
+{
+   e->cidr[0] = HOST_MASK;
+   e->cidr[1] = HOST_MASK;
+}
+
 static int
 hash_netnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
  const struct xt_action_param *par,
@@ -160,7 +167,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
 {
const struct hash_netnet *h = set->data;
ipset_adtfn adtfn = set->variant->adt[adt];
-   struct hash_netnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
+   struct hash_netnet4_elem e = { };
struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
u32 ip = 0, ip_to = 0, last;
u32 ip2 = 0, ip2_from = 0, ip2_to = 0, last2;
@@ -169,6 +176,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
if (tb[IPSET_ATTR_LINENO])
*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
 
+   hash_netnet4_init(&e);
if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
return -IPSET_ERR_PROTOCOL;
@@ -357,6 +365,13 @@ hash_netnet6_data_next(struct hash_netnet4_elem *next,
 #define IP_SET_EMIT_CREATE
 #include "ip_set_hash_gen.h"
 
+static void
+hash_netnet6_init(struct hash_netnet6_elem *e)
+{
+   e->cidr[0] = HOST_MASK;
+   e->cidr[1] = HOST_MASK;
+}
+
 static int
 hash_netnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
  const struct xt_action_param *par,
@@ -385,13 +400,14 @@ hash_netnet6_uadt(struct ip_set *set, struct nlattr *tb[],
  enum ipset_adt adt, u32 *lineno, u32 flags, bool retried)
 {
ipset_adtfn adtfn = set->variant->adt[adt];
-   struct hash_netnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
+   struct hash_netnet6_elem e = { };
struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
int ret;
 
if (tb[IPSET_ATTR_LINENO])
*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
 
+   hash_netnet6_init(&e);
if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
return -IPSET_ERR_PROTOCOL;
diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c 
b/net/netfilter/ipset/ip_set_hash_netportnet.c
index 0c68734..9a14c23 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -142,6 +142,13 @@ hash_netportnet4_data_next(struct hash_netportnet4_elem 
*next,
 #define HOST_MASK  32
 #include "ip_set_hash_gen.h"
 
+static void
+hash_netportnet4_init(struct hash_netportnet4_elem *e)
+{
+   e->cidr[0] = HOST_MASK;
+   e->cidr[1] = HOST_MASK;
+}
+
 static int
 hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
  const struct xt_action_param *par,
@@ -175,7 +182,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr 
*tb[],
 {
const struct hash_netportnet *h = set->data;
ipset_adtfn adtfn = set->variant->adt[adt];
-   struct hash_netportnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
+   struct hash_netportnet4_elem e = { };
struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
u32 ip = 0, ip_to = 0, ip_last, p = 0, port, port_to;
u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2;
@@ -185,6 +192,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr 
*tb[],
if (tb[IPSET_ATTR_LINENO])
*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
 
+   hash_netportnet4_init(&e);
if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
 !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
@@ -412,6 +420,13 @@ hash_netportnet6_data_next(struct hash_netportnet4_elem 
*next,
 #define IP_SET_EMIT_CREATE
 #include "ip_set_hash_gen.h"
 
+static void
+hash_netportnet6_init(struct hash_netportnet6_elem *e)
+{
+   e->cidr[0] = 
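
The workaround in this patch, zero-initialize and then assign through a
helper rather than using a designated initializer on a member of an unnamed
union, can be sketched like this. The struct below is only loosely modelled
on the ipset element types; all sketch_* names are hypothetical.

```c
#define SKETCH_HOST_MASK 32

struct sketch_elem {
	unsigned int ip[2];
	union {			/* unnamed union holding the cidr pair */
		unsigned char cidr[2];
		unsigned short ccmp;
	};
};

/* Assign the defaults at run time instead of in the initializer. */
static void sketch_elem_init(struct sketch_elem *e)
{
	e->cidr[0] = SKETCH_HOST_MASK;
	e->cidr[1] = SKETCH_HOST_MASK;
}

/* Build a default element and report one of its cidr values. */
static int sketch_default_cidr(int idx)
{
	struct sketch_elem e = { { 0 } };	/* plain zero init, no .cidr */

	sketch_elem_init(&e);
	return e.cidr[idx];
}
```

The form `{ .cidr = { ... } }` is what the patch reports as failing on
gcc 4.4.x; the helper sidesteps that at the cost of one extra call per
element.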

Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support for multicast control

2015-09-03 Thread Ian Campbell
On Thu, 2015-09-03 at 10:34 +0100, Paul Durrant wrote:
> > 
> > -Original Message-
> > From: Ian Campbell [mailto:ian.campb...@citrix.com]
> > Sent: 03 September 2015 10:31
> > To: Paul Durrant; Jan Beulich
> > Cc: Wei Liu; xen-de...@lists.xenproject.org; netdev@vger.kernel.org
> > Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add support 
> > for
> > multicast control
> > 
> > On Thu, 2015-09-03 at 10:00 +0100, Paul Durrant wrote:
> > > > 
> > > > -Original Message-
> > > > From: Jan Beulich [mailto:jbeul...@suse.com]
> > > > Sent: 03 September 2015 09:57
> > > > To: Paul Durrant
> > > > Cc: Ian Campbell; Wei Liu; xen-de...@lists.xenproject.org;
> > > > netdev@vger.kernel.org
> > > > Subject: Re: [Xen-devel] [PATCH v2 net-next] xen-netback: add 
> > > > support
> > > > for
> > > > multicast control
> > > > 
> > > > > > > On 02.09.15 at 18:58,  wrote:
> > > > > @@ -1215,6 +1289,31 @@ static void xenvif_tx_build_gops(struct
> > > > xenvif_queue *queue,
> > > > >   break;
> > > > >   }
> > > > > 
> > > > > > + if (extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1].type) {
> > > > > > + struct xen_netif_extra_info *extra;
> > > > > > +
> > > > > > + extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
> > > > > > + ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);
> > > > 
> > > > What's the reason this call isn't gated on vif->multicast_control?
> > > > 
> > > 
> > > No particular reason. I guess it eats a small amount of memory for no
> > > gain but a well behaved frontend wouldn't send such a request and a
> > > malicious one can only send 64 of them before netback starts to 
> > > reject
> > > them.
> > 
> > Perhaps a confused guest might submit them thinking they would work
> > when
> > actually the feature hasn't been properly negotiated and since it would
> > succeed it wouldn't generate an error on the guest side?
> 
> It would, but that's essentially harmless to functionality. If the 
> feature had not been negotiated properly then multicast flooding would 
> still be in operation so the guest would not lose any multicasts. I can 
> tighten things up if you like but as you say below it is a bit of a 
> corner case.

Ah yes, I had something backwards and thought the guest might miss out on
something it was expecting, but as you say it will just get more than it
wanted.
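
For reference, the "tightened" variant discussed in this thread could look
roughly like the sketch below. It is not the netback code: the struct, the
function names and the choice of error codes are assumptions made for
illustration; only the limit of 64 entries and the idea of gating on
multicast_control come from the discussion.

```c
#include <errno.h>

#define SKETCH_MCAST_MAX 64	/* limit mentioned in the thread */

struct sketch_vif {
	int multicast_control;		/* non-zero once negotiated */
	unsigned int mcast_count;	/* entries accepted so far */
};

/* Accept an MCAST_ADD request only if the feature was negotiated. */
static int sketch_mcast_add(struct sketch_vif *vif)
{
	if (!vif->multicast_control)
		return -EOPNOTSUPP;	/* assumed error code */
	if (vif->mcast_count >= SKETCH_MCAST_MAX)
		return -ENOSPC;		/* assumed error code */
	vif->mcast_count++;
	return 0;
}

/* Count how many of `requests` consecutive adds would be accepted. */
static unsigned int sketch_accepted(int negotiated, unsigned int requests)
{
	struct sketch_vif vif = { negotiated, 0 };
	unsigned int i, accepted = 0;

	for (i = 0; i < requests; i++)
		if (sketch_mcast_add(&vif) == 0)
			accepted++;
	return accepted;
}
```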



[PATCH] 9p: trans_fd, read rework to use p9_parse_header

2015-09-03 Thread Dominique Martinet
Most of the changes here are no-ops that just rename fields to use an
fcall struct, which p9_parse_header needs.

It fixes the unaligned memory access to read the tag and defers to
common functions for part of the protocol knowledge (although header
length is still hard-coded...)

Reported-By: Rob Landley 
Signed-Off-By: Dominique Martinet 
---
 net/9p/trans_fd.c | 75 +--
 1 file changed, 40 insertions(+), 35 deletions(-)

It ended up a lot bigger than I thought it would be. Submitting it anyway;
I'm happy with either version and will let Eric decide which is better :)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index bced8c0..a270dcc 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -108,9 +108,7 @@ struct p9_poll_wait {
  * @unsent_req_list: accounting for requests that haven't been sent
  * @req: current request being processed (if any)
  * @tmp_buf: temporary buffer to read in header
- * @rsize: amount to read for current frame
- * @rpos: read position in current frame
- * @rbuf: current read buffer
+ * @rc: temporary fcall for reading current frame
  * @wpos: write position for current frame
  * @wsize: amount of data to write for current frame
  * @wbuf: current write buffer
@@ -131,9 +129,7 @@ struct p9_conn {
struct list_head unsent_req_list;
struct p9_req_t *req;
char tmp_buf[7];
-   int rsize;
-   int rpos;
-   char *rbuf;
+   struct p9_fcall rc;
int wpos;
int wsize;
char *wbuf;
@@ -305,49 +301,56 @@ static void p9_read_work(struct work_struct *work)
if (m->err < 0)
return;
 
-   p9_debug(P9_DEBUG_TRANS, "start mux %p pos %d\n", m, m->rpos);
+   p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset);
 
-   if (!m->rbuf) {
-   m->rbuf = m->tmp_buf;
-   m->rpos = 0;
-   m->rsize = 7; /* start by reading header */
+   if (!m->rc.sdata) {
+   m->rc.sdata = m->tmp_buf;
+   m->rc.offset = 0;
+   m->rc.capacity = 7; /* start by reading header */
}
 
clear_bit(Rpending, &m->wsched);
-   p9_debug(P9_DEBUG_TRANS, "read mux %p pos %d size: %d = %d\n",
-m, m->rpos, m->rsize, m->rsize-m->rpos);
-   err = p9_fd_read(m->client, m->rbuf + m->rpos,
-   m->rsize - m->rpos);
+   p9_debug(P9_DEBUG_TRANS, "read mux %p pos %zd size: %zd = %zd\n",
+m, m->rc.offset, m->rc.capacity,
+m->rc.capacity - m->rc.offset);
+   err = p9_fd_read(m->client, m->rc.sdata + m->rc.offset,
+m->rc.capacity - m->rc.offset);
p9_debug(P9_DEBUG_TRANS, "mux %p got %d bytes\n", m, err);
-   if (err == -EAGAIN) {
+   if (err == -EAGAIN)
goto end_clear;
-   }
 
if (err <= 0)
goto error;
 
-   m->rpos += err;
+   m->rc.offset += err;
 
-   if ((!m->req) && (m->rpos == m->rsize)) { /* header read in */
-   u16 tag;
+   /* header read in */
+   if ((!m->req) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new header\n");
 
-   n = le32_to_cpu(*(__le32 *) m->rbuf); /* read packet size */
-   if (n >= m->client->msize) {
+   err = p9_parse_header(&m->rc, NULL, NULL, NULL, 0);
+   if (err) {
p9_debug(P9_DEBUG_ERROR,
-"requested packet size too big: %d\n", n);
+"error parsing header: %d\n", err);
+   goto error;
+   }
+
+   if (m->rc.size >= m->client->msize) {
+   p9_debug(P9_DEBUG_ERROR,
+"requested packet size too big: %d\n",
+m->rc.size);
err = -EIO;
goto error;
}
 
-   tag = le16_to_cpu(*(__le16 *) (m->rbuf+5)); /* read tag */
p9_debug(P9_DEBUG_TRANS,
-"mux %p pkt: size: %d bytes tag: %d\n", m, n, tag);
+"mux %p pkt: size: %d bytes tag: %d\n",
+m, m->rc.size, m->rc.tag);
 
-   m->req = p9_tag_lookup(m->client, tag);
+   m->req = p9_tag_lookup(m->client, m->rc.tag);
if (!m->req || (m->req->status != REQ_STATUS_SENT)) {
p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n",
-tag);
+m->rc.tag);
err = -EIO;
goto error;
}
@@ -361,13 +364,15 @@ static void p9_read_work(struct work_struct *work)
goto error;
}
}
-   
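
The unaligned-access problem mentioned in the commit message comes from
casting a byte buffer to a wider integer pointer, as the removed
`*(__le16 *) (m->rbuf+5)` read did. A portable alternative, sketched here
outside the kernel, is to assemble the little-endian fields byte by byte.
The 7-byte layout (4-byte size, 1-byte type, 2-byte tag) is the 9P header;
the struct, the function and the sample bytes are hypothetical.

```c
#include <stdint.h>

struct sketch_9p_header {
	uint32_t size;	/* total message length */
	uint8_t type;	/* message type */
	uint16_t tag;	/* request tag */
};

/* Decode the 7-byte little-endian header without any pointer casts. */
static struct sketch_9p_header sketch_parse_header(const uint8_t buf[7])
{
	struct sketch_9p_header h;

	h.size = (uint32_t)buf[0] |
		 (uint32_t)buf[1] << 8 |
		 (uint32_t)buf[2] << 16 |
		 (uint32_t)buf[3] << 24;
	h.type = buf[4];
	/* the tag starts at odd offset 5, hence the unaligned read */
	h.tag = (uint16_t)(buf[5] | buf[6] << 8);
	return h;
}

/* Made-up sample bytes: size 0x13, type 0x65, tag 0x1234. */
static const uint8_t sketch_sample_header[7] = {
	0x13, 0x00, 0x00, 0x00, 0x65, 0x34, 0x12
};
```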

[PATCH] 9p: trans_fd, initialize recv fcall properly if not set

2015-09-03 Thread Dominique Martinet
That code really should never be called (rc is allocated in
tag_alloc), but if it had been it couldn't have worked...

Signed-off-by: Dominique Martinet 
---
 net/9p/trans_fd.c | 3 +++
 1 file changed, 3 insertions(+)

To be honest, I think it might be better to just bail out if we get in
this switch (m->req->rc == NULL after p9_tag_lookup) and not try to
allocate more, because if we get there it's likely a race condition and
silently re-allocating will end up in more troubles than trying to
recover is worth.
Thoughts ?

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a270dcc..0d9831a 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -363,6 +363,9 @@ static void p9_read_work(struct work_struct *work)
err = -ENOMEM;
goto error;
}
+   m->req->rc->capacity = m->client->msize;
+   m->req->rc->sdata = (char *)m->req->rc +
+   sizeof(struct p9_fcall);
}
m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
-- 
1.8.3.1
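
The layout this patch relies on, a descriptor struct immediately followed by
its data buffer in a single allocation, can be sketched in userspace as
below. The struct is a simplified, hypothetical stand-in for the 9p fcall
and is not the kernel definition.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_fcall {
	size_t capacity;	/* usable bytes behind the struct */
	size_t offset;		/* bytes filled so far */
	char *sdata;		/* points just past the struct itself */
};

/* One allocation holds the descriptor followed by msize data bytes. */
static struct sketch_fcall *sketch_fcall_alloc(size_t msize)
{
	struct sketch_fcall *fc = malloc(sizeof(*fc) + msize);

	if (!fc)
		return NULL;
	fc->capacity = msize;
	fc->offset = 0;
	fc->sdata = (char *)fc + sizeof(*fc);
	return fc;
}

/* Check that capacity and sdata were both set up as expected. */
static int sketch_fcall_is_consistent(const struct sketch_fcall *fc,
				      size_t msize)
{
	return fc && fc->capacity == msize &&
	       fc->sdata == (const char *)fc + sizeof(*fc);
}
```

Leaving capacity or sdata unset after the allocation is the situation the
commit message describes as something that "couldn't have worked": later
reads would use an uninitialized pointer and length.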



[PATCH 0/6] Netfilter fixes for net

2015-09-03 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter fixes for net, they are:

1) Oneliner to restore maps in nf_tables since we support addressing registers
   at 32 bits level.

2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
   oneliner from Bernhard Thaler.

3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
   KASan utility, patch from Jozsef Kadlecsik.

4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
   unnamed unions, patch from Elad Raz.

5) Add a workaround to address inconsistent endianess in the res_id field of
   nfnetlink batch messages, reported by Florian Westphal.

6) Fix error paths of CT/synproxy since the conntrack template was moved to use
   kmalloc, patch from Daniel Borkmann.

All of them look good to me to reach 4.2, I can route this to -stable myself
too, just let me know what you prefer.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit fd7dec25a18f495e50d2040398fd263836ff3b28:

  batman-adv: Fix memory leak on tt add with invalid vlan (2015-08-18 19:08:23 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 9cf94eab8b309e8bcc78b41dd1561c75b537dd0b:

  netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths 
(2015-09-01 12:15:08 +0200)


Bernhard Thaler (1):
  netfilter: bridge: fix IPv6 packets not being bridged with CONFIG_IPV6=n

Daniel Borkmann (1):
  netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths

Elad Raz (1):
  netfilter: ipset: Fixing unnamed union init

Jozsef Kadlecsik (1):
  netfilter: ipset: Out of bound access in hash:net* types fixed

Pablo Neira Ayuso (2):
  netfilter: nf_tables: Use 32 bit addressing register from 
nft_type_to_reg()
  netfilter: nfnetlink: work around wrong endianess in res_id field

 include/net/netfilter/br_netfilter.h |2 +-
 include/net/netfilter/nf_conntrack.h |1 +
 include/net/netfilter/nf_tables.h|2 +-
 net/netfilter/ipset/ip_set_hash_gen.h|   12 
 net/netfilter/ipset/ip_set_hash_netnet.c |   20 ++--
 net/netfilter/ipset/ip_set_hash_netportnet.c |   20 ++--
 net/netfilter/nf_conntrack_core.c|3 ++-
 net/netfilter/nf_synproxy_core.c |2 +-
 net/netfilter/nfnetlink.c|8 +++-
 net/netfilter/xt_CT.c|2 +-
 10 files changed, 58 insertions(+), 14 deletions(-)


[PATCH 3/6] netfilter: ipset: Out of bound access in hash:net* types fixed

2015-09-03 Thread Pablo Neira Ayuso
From: Jozsef Kadlecsik 

Dave Jones reported that KASan detected out of bounds access in hash:net*
types:

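[   23.139532] ==================================================================
[   23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr 8800d4844b58
[   23.152937] Write of size 4 by task ipset/457
[   23.159742] =============================================================================
[   23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected
[   23.173641] -----------------------------------------------------------------------------
[   23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456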
[   23.201836]  __slab_alloc.constprop.66+0x554/0x620
[   23.208994]  __kmalloc+0x2f2/0x360
[   23.216105]  hash_net_create+0x16a/0x470
[   23.223238]  ip_set_create+0x3e6/0x740
[   23.230343]  nfnetlink_rcv_msg+0x599/0x640
[   23.237454]  netlink_rcv_skb+0x14f/0x190
[   23.244533]  nfnetlink_rcv+0x3f6/0x790
[   23.251579]  netlink_unicast+0x272/0x390
[   23.258573]  netlink_sendmsg+0x5a1/0xa50
[   23.265485]  SYSC_sendto+0x1da/0x2c0
[   23.272364]  SyS_sendto+0xe/0x10
[   23.279168]  entry_SYSCALL_64_fastpath+0x12/0x6f

The bug is fixed in the patch and the testsuite is extended in ipset
to check cidr handling more thoroughly.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index afe905c..691b54f 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -152,9 +152,13 @@ htable_bits(u32 hashsize)
 #define SET_HOST_MASK(family)  (family == AF_INET ? 32 : 128)
 
 #ifdef IP_SET_HASH_WITH_NET0
+/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */
 #define NLEN(family)		(SET_HOST_MASK(family) + 1)
+#define CIDR_POS(c)		((c) - 1)
 #else
+/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */
 #define NLEN(family)		SET_HOST_MASK(family)
+#define CIDR_POS(c)		((c) - 2)
 #endif
 
 #else
@@ -305,7 +309,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
 		} else if (h->nets[i].cidr[n] < cidr) {
 			j = i;
 		} else if (h->nets[i].cidr[n] == cidr) {
-			h->nets[cidr - 1].nets[n]++;
+			h->nets[CIDR_POS(cidr)].nets[n]++;
 			return;
 		}
 	}
@@ -314,7 +318,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
 			h->nets[i].cidr[n] = h->nets[i - 1].cidr[n];
 	}
 	h->nets[i].cidr[n] = cidr;
-	h->nets[cidr - 1].nets[n] = 1;
+	h->nets[CIDR_POS(cidr)].nets[n] = 1;
 }
 
 static void
@@ -325,8 +329,8 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
 	for (i = 0; i < nets_length; i++) {
 		if (h->nets[i].cidr[n] != cidr)
 			continue;
-		h->nets[cidr - 1].nets[n]--;
-		if (h->nets[cidr - 1].nets[n] > 0)
+		h->nets[CIDR_POS(cidr)].nets[n]--;
+		if (h->nets[CIDR_POS(cidr)].nets[n] > 0)
 			return;
 		for (j = i; j < net_end && h->nets[j].cidr[n]; j++)
 			h->nets[j].cidr[n] = h->nets[j + 1].cidr[n];
-- 
1.7.10.4
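
For readers following along, here is a minimal userspace sketch of the
index arithmetic the patch changes. It is not the kernel code. It assumes,
as the comments added by the patch state, that the value passed around is
c = cidr + 1, and it only models IPv4. The names nlen(), cidr_pos_old(),
cidr_pos_new() and in_bounds() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Host mask length for IPv4, mirroring SET_HOST_MASK(AF_INET). */
#define HOST_MASK_V4 32

/* Number of slots in the nets[] array (NLEN in the patch). */
static int nlen(bool with_net0)
{
	return with_net0 ? HOST_MASK_V4 + 1 : HOST_MASK_V4;
}

/* Old indexing: always c - 1, whether or not /0 networks are allowed. */
static int cidr_pos_old(int c)
{
	return c - 1;
}

/* New indexing (CIDR_POS in the patch): c - 1 with /0, c - 2 without. */
static int cidr_pos_new(int c, bool with_net0)
{
	return with_net0 ? c - 1 : c - 2;
}

/* True if idx is a valid slot in an array of nlen(with_net0) entries. */
static bool in_bounds(int idx, bool with_net0)
{
	return idx >= 0 && idx < nlen(with_net0);
}
```

With a /32 and no /0 support (c = 33), the old formula yields index 32 in
a 32-slot array, which would match the out-of-bounds write KASan reported;
the new formula yields 31.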



[PATCH 1/6] netfilter: nf_tables: Use 32 bit addressing register from nft_type_to_reg()

2015-09-03 Thread Pablo Neira Ayuso
nft_type_to_reg() needs to return the register in the new 32 bit addressing,
otherwise we hit EINVAL when using mappings.

Fixes: 49499c3 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Reported-by: Andreas Schultz 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 2a24668..aa8bee7 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -125,7 +125,7 @@ static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
 
 static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
 {
-	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1;
+	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE;
 }
 
 unsigned int nft_parse_register(const struct nlattr *attr);
-- 
1.7.10.4
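
A small userspace sketch of the arithmetic in the one-line change, for
illustration only. The constants are assumptions recalled from the
nf_tables uapi header of that era (legacy registers are 16 bytes, 32 bit
registers are 4 bytes, NFT_REG_VERDICT is 0 and NFT_REG_1 is 1); the
shortened names below are not the kernel's.

```c
#include <assert.h>

/* Assumed values, see the note above; not copied from the kernel tree. */
enum { REG_VERDICT = 0, REG_1 = 1 };
#define REG_SIZE   16	/* bytes in a legacy 128 bit register */
#define REG32_SIZE  4	/* bytes in a 32 bit register */

enum data_type { DATA_VALUE, DATA_VERDICT };

/* Before the fix: returns the legacy register number. */
static unsigned int type_to_reg_old(enum data_type type)
{
	return type == DATA_VERDICT ? REG_VERDICT : REG_1;
}

/* After the fix: legacy register 1 expressed in 32 bit register units. */
static unsigned int type_to_reg_new(enum data_type type)
{
	return type == DATA_VERDICT ? REG_VERDICT
				    : REG_1 * REG_SIZE / REG32_SIZE;
}
```

Under these assumptions the data register moves from 1 to 4, i.e. the
first 32 bit slot past the 16 bytes reserved for the verdict, while the
verdict register stays at 0.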



[PATCH] be2net: Revert "make the RX_FILTER command asynchronous" commit

2015-09-03 Thread Sathya Perla
The be_cmd_rx_filter() routine sends a non-embedded cmd to the FW and uses
pre-allocated DMA memory to hold the cmd payload. This worked fine when
this cmd was synchronous. This cmd was changed to asynchronous mode by
commit 8af65c2f4 ("make the RX_FILTER command asynchronous"). So now when
there are two quick invocations of this cmd, the 2nd request may end up
overwriting the first request, causing FW cmd corruption.

This patch reverts the offending commit and hence fixes the regression.

Fixes: 8af65c2f4("be2net: make the RX_FILTER command asynchronous")
Signed-off-by: Sathya Perla 
---
David, the culprit commit that this patch is reverting was applied on
the net-next tree. As net-next tree is closed now, I'm assuming you've
merged the net-next tree onto the net tree. Thanks!

 drivers/net/ethernet/emulex/benet/be_cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 3be1fbd..eb32391 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1968,7 +1968,7 @@ static int __be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value)
 			memcpy(req->mcast_mac[i++].byte, ha->addr, ETH_ALEN);
 	}
 
-	status = be_mcc_notify(adapter);
+	status = be_mcc_notify_wait(adapter);
 err:
 	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
-- 
2.4.1
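
To make the failure mode concrete, below is a toy single-threaded model
of the race described in the commit message. It is only a sketch: the
queue, the "firmware" and every name in it (shared_payload, post_async(),
post_sync(), fw_process()) are hypothetical and have nothing to do with
the real be2net MCC machinery. The point is just that a shared payload
buffer is safe only if each request is consumed before the next is posted.

```c
#include <assert.h>
#include <string.h>

#define QLEN 4

/* One pre-allocated payload buffer shared by every request. */
static char shared_payload[16];

/* Queue of payload pointers the "firmware" has not consumed yet. */
static const char *queue[QLEN];
static int queued;

/* What the "firmware" actually read for each processed request. */
static char seen[QLEN][16];
static int processed;

/* Firmware consumes everything currently queued. */
static void fw_process(void)
{
	for (int i = 0; i < queued; i++)
		strcpy(seen[processed++], queue[i]);
	queued = 0;
}

/* Asynchronous post: fill the shared buffer, queue it, return at once. */
static void post_async(const char *payload)
{
	strncpy(shared_payload, payload, sizeof(shared_payload) - 1);
	queue[queued++] = shared_payload;
}

/* Synchronous post: same, but wait until the firmware has consumed it. */
static void post_sync(const char *payload)
{
	post_async(payload);
	fw_process();
}

/* Reset the model between experiments. */
static void reset(void)
{
	queued = 0;
	processed = 0;
	memset(seen, 0, sizeof(seen));
}
```

Posting two requests back to back with post_async() makes the model
firmware read the second payload twice; with post_sync() each request is
seen intact, which is the behaviour the revert restores.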



[patch net] switchdev: fix return value of switchdev_port_fdb_dump in case of error

2015-09-03 Thread Jiri Pirko
From: Jiri Pirko 

switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is
idx, so we cannot return errval.

Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
Signed-off-by: Jiri Pirko 
---
 net/switchdev/switchdev.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 16c1c43..fda38f8 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -853,12 +853,8 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		.cb = cb,
 		.idx = idx,
 	};
-	int err;
-
-	err = switchdev_port_obj_dump(dev, &dump.obj);
-	if (err)
-		return err;
 
+	switchdev_port_obj_dump(dev, &dump.obj);
 	return dump.idx;
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_dump);
-- 
1.9.3
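
A rough userspace sketch of why an error code must not be returned here.
It assumes a caller that threads the returned index from one device's
dump into the next, which is how the commit message describes the
.ndo_fdb_dump contract. struct fake_dev, dump_buggy(), dump_fixed() and
dump_all() are hypothetical names, and the fixed variant is simplified:
it ignores entries a real dump might have emitted before failing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device: reports 'entries' fdb entries, or fails with 'err'. */
struct fake_dev {
	int entries;
	int err;
};

/* Buggy variant: leaks the error code into the index chain. */
static int dump_buggy(const struct fake_dev *dev, int idx)
{
	if (dev->err)
		return dev->err;
	return idx + dev->entries;
}

/* Fixed variant: always returns the index reached so far. */
static int dump_fixed(const struct fake_dev *dev, int idx)
{
	if (dev->err)
		return idx;
	return idx + dev->entries;
}

/* Caller that feeds each returned index into the next device's dump. */
static int dump_all(const struct fake_dev *devs, int n,
		    int (*dump)(const struct fake_dev *, int))
{
	int idx = 0;

	for (int i = 0; i < n; i++)
		idx = dump(&devs[i], idx);
	return idx;
}
```

In this model a failing device in the middle of the list turns the
running index into a negative errno value with the buggy variant, while
the fixed variant keeps counting from where the previous device stopped.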



[net-next PATCH] net: kill default_pref field of struct fib_rules_ops

2015-09-03 Thread Phil Sutter
Since now all users of that field have been converted to use the generic
function fib_default_rule_pref() when assigning to it, fib_nl_newrule()
may just use it directly instead.

Signed-off-by: Phil Sutter 
---
 include/net/fib_rules.h |  1 -
 net/core/fib_rules.c| 10 +++---
 net/decnet/dn_rules.c   |  1 -
 net/ipv4/fib_rules.c|  1 -
 net/ipv4/ipmr.c |  1 -
 net/ipv6/fib6_rules.c   |  1 -
 net/ipv6/ip6mr.c|  1 -
 7 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 4e8f804..75cda93 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -66,7 +66,6 @@ struct fib_rules_ops {
 					     struct nlattr **);
 	int			(*fill)(struct fib_rule *, struct sk_buff *,
 					struct fib_rule_hdr *);
-	u32			(*default_pref)(struct fib_rules_ops *ops);
 	size_t			(*nlmsg_payload)(struct fib_rule *);
 
 	/* Called after modifications to the rules set, must flush
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index ae8306e..bf77e36 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -44,7 +44,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
 }
 EXPORT_SYMBOL(fib_default_rule_add);
 
-u32 fib_default_rule_pref(struct fib_rules_ops *ops)
+static u32 fib_default_rule_pref(struct fib_rules_ops *ops)
 {
 	struct list_head *pos;
 	struct fib_rule *rule;
@@ -60,7 +60,6 @@ u32 fib_default_rule_pref(struct fib_rules_ops *ops)
 
 	return 0;
 }
-EXPORT_SYMBOL(fib_default_rule_pref);
 
 static void notify_rule_change(int event, struct fib_rule *rule,
 			       struct fib_rules_ops *ops, struct nlmsghdr *nlh,
@@ -299,8 +298,8 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	}
 	rule->fr_net = net;
 
-	if (tb[FRA_PRIORITY])
-		rule->pref = nla_get_u32(tb[FRA_PRIORITY]);
+	rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
+				      : fib_default_rule_pref(ops);
 
 	if (tb[FRA_IIFNAME]) {
 		struct net_device *dev;
@@ -350,9 +349,6 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	else
 		rule->suppress_ifgroup = -1;
 
-	if (!tb[FRA_PRIORITY] && ops->default_pref)
-		rule->pref = ops->default_pref(ops);
-
 	err = -EINVAL;
 	if (tb[FRA_GOTO]) {
 		if (rule->action != FR_ACT_GOTO)
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 9d66a0f..295bbd6 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -229,7 +229,6 @@ static const struct fib_rules_ops __net_initconst dn_fib_rules_ops_template = {
 	.configure	= dn_fib_rule_configure,
 	.compare	= dn_fib_rule_compare,
 	.fill		= dn_fib_rule_fill,
-	.default_pref	= fib_default_rule_pref,
 	.flush_cache	= dn_fib_rule_flush_cache,
 	.nlgroup	= RTNLGRP_DECnet_RULE,
 	.policy		= dn_fib_rule_policy,
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 18123d5..f2bda9e 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -318,7 +318,6 @@ static const struct fib_rules_ops __net_initconst fib4_rules_ops_template = {
 	.delete		= fib4_rule_delete,
 	.compare	= fib4_rule_compare,
 	.fill		= fib4_rule_fill,
-	.default_pref	= fib_default_rule_pref,
 	.nlmsg_payload	= fib4_rule_nlmsg_payload,
 	.flush_cache	= fib4_rule_flush_cache,
 	.nlgroup	= RTNLGRP_IPV4_RULE,
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 3a2c016..866ee89 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -233,7 +233,6 @@ static const struct fib_rules_ops __net_initconst ipmr_rules_ops_template = {
 	.match		= ipmr_rule_match,
 	.configure	= ipmr_rule_configure,
 	.compare	= ipmr_rule_compare,
-	.default_pref	= fib_default_rule_pref,
 	.fill		= ipmr_rule_fill,
 	.nlgroup	= RTNLGRP_IPV4_RULE,
 	.policy		= ipmr_rule_policy,
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index a859ad2..9f777ec 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -274,7 +274,6 @@ static const struct fib_rules_ops __net_initconst fib6_rules_ops_template = {
 	.configure	= fib6_rule_configure,
 	.compare	= fib6_rule_compare,
 	.fill		= fib6_rule_fill,
-	.default_pref	= fib_default_rule_pref,
 	.nlmsg_payload	= fib6_rule_nlmsg_payload,
 	.nlgroup	= RTNLGRP_IPV6_RULE,
 	.policy		= fib6_rule_policy,
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 74ceb73..774c95e 

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Add port attribute when tracking counters"

2015-09-03 Thread eran ben elisha
On Mon, Aug 31, 2015 at 5:39 AM, Alexey Kardashevskiy  wrote:
> On 08/30/2015 04:28 PM, Or Gerlitz wrote:
>>
>> On Fri, Aug 28, 2015 at 7:06 AM, Alexey Kardashevskiy 
>> wrote:
>>>
>>> 68230242cdb breaks SRIOV on POWER8 system. I am not really suggesting
>>> reverting the patch, rather asking for a fix.
>>
>>
>> thanks for the detailed report, we will look into that.
>>
>> Just to be sure, when going back in time, what is the latest upstream
>> version where
>> this system/config works okay? is that 4.1 or later?
>
>
> 4.1 is good, 4.2 is not.
>
>
>
>>
>>>
>>> To reproduce it:
>>>
>>> 1. boot latest upstream kernel (v4.2-rc8 sha1 4941b8f, ppc64le)
>>>
>>> 2. Run:
>>> sudo rmmod mlx4_en mlx4_ib mlx4_core
>>> sudo modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2
>>> debug_level=1
>>>
>>> 3. Run QEMU (just to give a complete picture):
>>> /home/aik/qemu-system-ppc64 -enable-kvm -m 2048 -machine pseries \
>>> -nodefaults \
>>> -chardev stdio,id=id0,signal=off,mux=on \
>>> -device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>> -mon id=id2,chardev=id0,mode=readline -nographic -vga none \
>>> -initrd dhclient.cpio -kernel vml400bedbg \
>>> -device vfio-pci,id=id3,host=0003:03:00.1
>>> What guest is used does not matter at all.
>>>
>>> 4. Wait till guest boots and then run:
>>> dhclient
>>> This assigns IPs to both interfaces just fine. This is essential -
>>> if interface was not brought up since guest started, the bug does not
>>> appear.
>>> If interface was up and then down, this still causes the problem
>>> (less likely though).
>>>
>>> 5. Run in the guest: shutdown -h 0
>>> Guest prints:
>>> mlx4_en: eth0: Close port called
>>> mlx4_en: eth1: Close port called
>>> mlx4_core :00:00.0: mlx4_shutdown was called
>>> And then the host hangs. After 10-30 seconds the host console prints:
>>> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
>>> [qemu-system-ppc:5095]
>>> OR
>>> INFO: rcu_sched detected stalls on CPUs/tasks:
>>> or some other random stuff but always related to some sort of lockup.
>>> Backtraces are like these:
>>>
>>> [c01e492a7ac0] [c0135b84]
>>> smp_call_function_many+0x2f4/0x3fable)
>>> [c01e492a7b40] [c0135db8] kick_all_cpus_sync+0x38/0x50
>>> [c01e492a7b60] [c0048f38] pmdp_huge_get_and_clear+0x48/0x70
>>> [c01e492a7b90] [c023181c] change_huge_pmd+0xac/0x210
>>> [c01e492a7bf0] [c01fb9e8] change_protection+0x678/0x720
>>> [c01e492a7d00] [c0217d38] change_prot_numa+0x28/0xa0
>>> [c01e492a7d30] [c00e0e40] task_numa_work+0x2a0/0x370
>>> [c01e492a7db0] [c00c5fb4] task_work_run+0xe4/0x160
>>> [c01e492a7e00] [c00169a4] do_notify_resume+0x84/0x90
>>> [c01e492a7e30] [c00098b8] ret_from_except_lite+0x64/0x68
>>>
>>> OR
>>>
>>> [c01def1b7280] [c00ff941d368] 0xc00ff941d368 (unreliable)
>>> [c01def1b7450] [c001512c] __switch_to+0x1fc/0x350
>>> [c01def1b7490] [c01def1b74e0] 0xc01def1b74e0
>>> [c01def1b74e0] [c011a50c] try_to_del_timer_sync+0x5c/0x90
>>> [c01def1b7520] [c011a590] del_timer_sync+0x50/0x70
>>> [c01def1b7550] [c09136fc] schedule_timeout+0x15c/0x2b0
>>> [c01def1b7620] [c0910e6c] wait_for_common+0x12c/0x230
>>> [c01def1b7660] [c00fa22c] up+0x4c/0x80
>>> [c01def1b76a0] [d00016323e60] __mlx4_cmd+0x320/0x940 [mlx4_core]
>>> [c01def1b7760] [c01def1b77a0] 0xc01def1b77a0
>>> [c01def1b77f0] [d000163528b4] mlx4_2RST_QP_wrapper+0x154/0x1e0
>>> [mlx4_core]
>>> [c01def1b7860] [d00016324934]
>>> mlx4_master_process_vhcr+0x1b4/0x6c0 [mlx4_core]
>>> [c01def1b7930] [d00016324170] __mlx4_cmd+0x630/0x940 [mlx4_core]
>>> [c01def1b79f0] [d00016346fec]
>>> __mlx4_qp_modify.constprop.8+0x1ec/0x350 [mlx4_core]
>>> [c01def1b7ac0] [d00016292228] mlx4_ib_destroy_qp+0xd8/0x5d0
>>> [mlx4_ib]
>>> [c01def1b7b60] [d00013c7305c] ib_destroy_qp+0x1cc/0x290 [ib_core]
>>> [c01def1b7bb0] [d00016284548]
>>> destroy_pv_resources.isra.14.part.15+0x48/0xf0 [mlx4_ib]
>>> [c01def1b7be0] [d00016284d28] mlx4_ib_tunnels_update+0x168/0x170
>>> [mlx4_ib]
>>> [c01def1b7c20] [d000162876e0]
>>> mlx4_ib_tunnels_update_work+0x30/0x50 [mlx4_ib]
>>> [c01def1b7c50] [c00c0d34] process_one_work+0x194/0x490
>>> [c01def1b7ce0] [c00c11b0] worker_thread+0x180/0x5a0
>>> [c01def1b7d80] [c00c8a0c] kthread+0x10c/0x130
>>> [c01def1b7e30] [c00095a8] ret_from_kernel_thread+0x5c/0xb4
>>>
>>> i.e. may or may not mention mlx4.
>>> The issue may not happen on a first try but maximum on the second.
>>
>>
>> so when you revert commit 68230242cdb on the host all works just fine?
>> what guest driver are you running?
>
>
> To be precise, I did checkout 68230242cdb, checked that it does not work,
> then reverted 68230242cdb right there and checked that it works. I did 

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Eric Dumazet
On Thu, 2015-09-03 at 10:09 +, Shaun Crampton wrote:
> >...
> >> Is there anything I can do on a running system to help figure this out?
> >> Some sort of kernel equivalent to pmap to find out what module or device
> >> owns that chunk of memory?
> >
> >Hmm, perhaps /proc/kallsyms could point to something. 0xa0087d81
> >and 0xa008772b could be from the same module, if any.
> 
> Any good: https://transfer.sh/szGRE/kallsyms ?
> 

seems to be cryptd module.

Have you tried to run a pristine upstream kernel?




Re: [PATCH v2] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-03 Thread Alexander Duyck

On 09/03/2015 10:29 AM, David Miller wrote:
> From: Nikola Forró 
> Date: Thu, 03 Sep 2015 11:08:51 +0200
>
>> @@ -233,8 +233,10 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
>>     rcu_read_lock();
>>
>>     tb = fib_get_table(net, RT_TABLE_MAIN);
>> -   if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
>> -   err = 0;
>> +   if (tb)
>> +   err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
>> +   if (err == -EAGAIN)
>> +   err = -ENETUNREACH;
>
> You didn't test this.


Actually the way the code is structured is still functional this way.  
The indentation is all that is really wrong.


I suspect this actually results in smaller code that may be faster for 
the standard case since tb will almost always have a value anyway, and 
if tb doesn't exist then err would equal -ENETUNREACH which would just 
mean the err == -EAGAIN would be ignored.


- Alex
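
A quick userspace sketch of the control flow discussed above, written
with indentation that matches what the compiler sees. It assumes, as the
reply states, that err starts out as -ENETUNREACH. struct fake_table,
fake_table_lookup() and fake_fib_lookup() are stand-ins invented for
this example, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a FIB table: holds the result a lookup returns. */
struct fake_table {
	int lookup_result;
};

static int fake_table_lookup(const struct fake_table *tb)
{
	return tb->lookup_result;
}

/* Same shape as the patched lookup, with a NULL table meaning "no table". */
static int fake_fib_lookup(const struct fake_table *tb)
{
	int err = -ENETUNREACH;

	if (tb)
		err = fake_table_lookup(tb);

	/* Not guarded by "if (tb)"; harmless, since err != -EAGAIN without tb. */
	if (err == -EAGAIN)
		err = -ENETUNREACH;

	return err;
}
```

In this model a missing table and an -EAGAIN from the table both end up
as -ENETUNREACH, and any other result (0, or -EACCES for a prohibit
route) is passed through unchanged.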




Re: [PATCH net-next] net: Support ip route get via given table

2015-09-03 Thread David Miller
From: David Ahern 
Date: Wed,  2 Sep 2015 12:03:12 -0700

> Add support for 'ip [-6] route get table X' where the user wants to
> force the FIB lookup from a given table.
> 
> Signed-off-by: David Ahern 

As Thomas mentioned, this adds cost to the FIB lookup fastpath
for a control-plane only feature, which is really undesirable.

I think I'll pass on this change for now, sorry.


RE: [Intel-wired-lan] [PATCH] ixgbe: Remove bimodal SR-IOV disabling

2015-09-03 Thread Singh, Krishneil K
-----Original Message-----
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Alex Williamson
Sent: Friday, July 10, 2015 3:44 PM
To: Rose, Gregory V 
Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-ker...@vger.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH] ixgbe: Remove bimodal SR-IOV disabling

On Fri, 2015-07-10 at 21:36 +, Rose, Gregory V wrote:
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Friday, July 10, 2015 2:32 PM
> > To: intel-wired-...@lists.osuosl.org; Kirsher, Jeffrey T
> > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Rose, 
> > Gregory V
> > Subject: [PATCH] ixgbe: Remove bimodal SR-IOV disabling
> > 
> > When unbinding an SR-IOV device with VFs configured from ixgbe, the 
> > driver behaves in one of two ways.  If max_vfs was specified, the 
> > SR-IOV state is disabled, removing the VFs.  This occurs regardless 
> > of whether the VF count was later modified through sysfs.  If 
> > however max_vfs is zero, such as by not specifying the module 
> > parameter, the VFs persist after the PF is unbound from ixgbe.  If 
> > the PF is then bound to vfio-pci to be assigned to a VM, the PF is 
> > non-functional.
> > 
> > From the comment, commit da36b64736cf ("ixgbe: Implement PCI SR-IOV
> > sysfs callback operation") clearly intended this alternate behavior, 
> > but probably didn't realize the PF doesn't work in this mode.
> > 
> > This bimodal behavior is confusing to users and results in a state 
> > where the PF is broken for other uses unless the user sets 
> > sriov_numvfs to zero prior to unbinding the device.  Remove this 
> > behavior so that VFs are removed and the PF is functional for other 
> > uses after unbind, regardless of the way VFs are enabled.
> > 
> > Signed-off-by: Alex Williamson 
> > Cc: Greg Rose 
> > Cc: Jeff Kirsher 
> > ---
> > 
> > I can only think that not disabling SR-IOV was meant to enable some 
> > sort of persistence for VFs, but that's probably better accomplished 
> > with either udev rules and/or modprobe.d install scripts.
> > 
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |7 +--
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index 5be12a0..de04e3e 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -8810,12 +8810,7 @@ static void ixgbe_remove(struct pci_dev *pdev)
> > unregister_netdev(netdev);
> > 
> >  #ifdef CONFIG_PCI_IOV
> > -   /*
> > -* Only disable SR-IOV on unload if the user specified the now
> > -* deprecated max_vfs module parameter.
> > -*/
> > -   if (max_vfs)
> > -   ixgbe_disable_sriov(adapter);
> > +   ixgbe_disable_sriov(adapter);
> >  #endif
> > ixgbe_clear_interrupt_scheme(adapter);
> > 
> 
> Please remove max_vfs module parameter - it is deprecated and should be 
> removed from upstream builds.  Dave let us get away with a kernel module a 
> few years ago because the other necessary infrastructure to enable SR-IOV 
> virtual functions via the PCIe interface was not available.  Now that it's 
> there it should be removed and vendors/end users should be forced to move 
> away from this.

I can't really say I'm in favor of removing that option.  It's probably going 
to break a lot of people because doing the udev rules right is hard.  The sysfs 
sriov interface has been tossed over the wall as the right way to do things, 
but there's really no infrastructure to facilitate even the simple peanut 
butter, everybody gets the same number of VFs, interface that max_vfs provides.
I think the existence of this bug is probably a good indication that the sysfs 
interface has not really been adopted yet.  Thanks,

Alex

___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Tested-By: Krishneil Singh 




Re: [PATCH net-next] net: Support ip route get via given table

2015-09-03 Thread David Ahern

On 9/3/15 4:40 PM, David Miller wrote:
> From: David Ahern 
> Date: Wed,  2 Sep 2015 12:03:12 -0700
>
>> Add support for 'ip [-6] route get table X' where the user wants to
>> force the FIB lookup from a given table.
>>
>> Signed-off-by: David Ahern 
>
> As Thomas mentioned, this adds cost to the FIB lookup fastpath
> for a control-plane only feature, which is really undesirable.
>
> I think I'll pass on this change for now, sorry.

Got it. I had other ideas on how to implement it. Figured I would start 
with the simplest.



Re: [PATCH] net: eth: altera: fix napi poll_list corruption

2015-09-03 Thread David Miller
 From: Atsushi Nemoto 
Date: Thu, 3 Sep 2015 15:01:02 +0900

> On Wed, 2 Sep 2015 22:32:54 -0700, David Miller  wrote:
>>> I think napi_gro_flush() can be called with irq enabled, so moving the
>>> spin_lock_irqsave() just before the __napi_complete() (or moving the
>>> __napi_complete() just after the spin_lock_irqsave()) would be better,
>>> right?
>> 
>> It should work, yes.
> 
> Thank you.  But I agree with Eric's last comment ("Calling
> napi_gro_flush() and __napi_complete() looks error prone."), and found
> that napi_complete_done() also checks NAPI_STATE_NPSVC to support
> NETPOLL.  These checks looks somewhat redundant but I like simple way
> unless it is really critical to performance.
> 
> So, please take original fix as is.

Fair enough, applied and queued up for -stable.


RE: [PATCH v3 1/4] Add correlated clocksource deriving system time from an auxiliary clocksource

2015-09-03 Thread Hall, Christopher S
> -Original Message-
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Saturday, August 22, 2015 1:17 PM
> To: Hall, Christopher S
> Cc: Kirsher, Jeffrey T; h...@zytor.com; mi...@redhat.com;
> john.stu...@linaro.org; richardcoch...@gmail.com; x...@kernel.org; linux-
> ker...@vger.kernel.org; netdev@vger.kernel.org; intel-wired-
> l...@lists.osuosl.org; pet...@infradead.org
> Subject: Re: [PATCH v3 1/4] Add correlated clocksource deriving system time
> from an auxiliary clocksource
>  
> > +/**
> > + * get_correlated_timestamp - Get a correlated timestamp
> > + *
> > + * Reads a timestamp from a device and correlates it to system time
> > + */
> > +int get_correlated_timestamp(struct correlated_ts *crt,
> > +struct correlated_cs *crs)
> > +{
> > +   struct timekeeper *tk = &tk_core.timekeeper;
> > +   unsigned long seq;
> > +   cycles_t cycles;
> > +   ktime_t base;
> > +   s64 nsecs;
> > +   int ret;
> > +
> > +   do {
> > +   seq = read_seqcount_begin(&tk_core.seq);
> > +   /*
> > +* Verify that the correlated clocksource is related to
> > +* the currently installed timekeeper clocksource
> > +*/
> > +   if (tk->tkr_mono.clock != crs->related_cs)
> > +   return -ENODEV;
> > +
> > +   /*
> > +* Try to get a timestamp from the device.
> > +*/
> > +   ret = crt->get_ts(crt);
> > +   if (ret)
> > +   return ret;
> > +
[Re-added code for context]

In addition to the network interface, ART will be used in the audio interface
as well. We need to support the case where an audio co-processor will control
the audio device. In this case, the get_ts() function supplied by the audio
driver will be very slow (several milliseconds) and the result will be out of
date by some fraction of that amount.

This loop imposes strict requirements on latency and recency. Is it possible
to relax that requirement in some way?

For example, supply the ART value as an argument and, in the case of the
realtime clock, keep a short history of clock changes.  It would fail in cases
where there are a lot of calls to adjtimex(), but it would work most of the
time.

What can you suggest? Thanks

Chris

> > +   } while (read_seqcount_retry(&tk_core.seq, seq));
> > +   return 0;
> > +}




eBPF / seccomp globals?

2015-09-03 Thread Michael Tirado
Hiyall,

I have created a seccomp white list filter for a program that launches
other less trustworthy programs.  It's working great so far, but I
have run into a little roadblock.  The launcher program needs to call
execve as its final step, but that may not be present in the white
list.  I am wondering if there is any way to use some sort of global
variable that will be preserved between syscall filter calls, so that I
can allow only one execve (if it is not present in the white list) by
incrementing a counter variable.

I see that in Documentation/networking/filter.txt one of the registers
is documented as being a pointer to struct sk_buff; in the seccomp
context this is a pointer to struct seccomp_data instead, right?  And
the line about callee saved registers R6-R9 probably refers to them
being saved across calls within that filter, and not calls between
filters?

My apologies if this is not the appropriate place to ask for help, but
it is difficult to find useful information on how eBPF works, and it is a
bit confusing trying to figure out the differences between seccomp and
net filters, and the old bpf code kicking around, short of spending
countless hours reading through all of it.  If anybody has some
links to share I would be very grateful.  The only way I can think to
make this work otherwise is to mount everything as MS_NOEXEC in the
new namespace, but that just feels wrong.


[PATCH] device property: Don't overwrite addr when failing in device_get_mac_address

2015-09-03 Thread Julien Grall
The function device_get_mac_address tries different property names
in order to get the MAC address. To check the return value, the variable
addr (which contains the buffer passed by the caller) is re-used. This
means that if the previous property is not found, the next property will
be read using a NULL buffer.

Therefore it's only possible to retrieve the MAC address if the node
contains a property "mac-address". Fix it by using a temporary variable
for the return value.

This has been introduced by commit 4c96b7dc0d393f12c17e0d81db15aa4a820a6ab3
"Add a matching set of device_ functions for determining mac/phy"

Signed-off-by: Julien Grall 
Cc: Jeremy Linton 
Cc: David S. Miller 

---
Cc: Greg Kroah-Hartman 
Cc: netdev@vger.kernel.org
---
 drivers/base/property.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index ff03f23..2d75366 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -611,13 +611,15 @@ static void *device_get_mac_addr(struct device *dev,
  */
 void *device_get_mac_address(struct device *dev, char *addr, int alen)
 {
-	addr = device_get_mac_addr(dev, "mac-address", addr, alen);
-	if (addr)
-		return addr;
+	char *res;
 
-	addr = device_get_mac_addr(dev, "local-mac-address", addr, alen);
-	if (addr)
-		return addr;
+	res = device_get_mac_addr(dev, "mac-address", addr, alen);
+	if (res)
+		return res;
+
+	res = device_get_mac_addr(dev, "local-mac-address", addr, alen);
+	if (res)
+		return res;
 
 	return device_get_mac_addr(dev, "address", addr, alen);
 }
-- 
2.1.4
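
The bug pattern is easy to reproduce outside the kernel. The sketch below
is a userspace model only: get_prop() is a hypothetical stand-in for
device_get_mac_addr() that is assumed to fail when handed a NULL buffer,
and present_prop selects which single property "exists" on the node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical property store: the one property name present on the node. */
static const char *present_prop;

/* Stand-in lookup: fills addr and returns it on a hit, NULL otherwise. */
static char *get_prop(const char *name, char *addr, size_t alen)
{
	if (!addr || !present_prop || strcmp(name, present_prop) != 0)
		return NULL;
	memset(addr, 0xab, alen);
	return addr;
}

/* Buggy: after the first miss addr is NULL, so later lookups cannot work. */
static char *get_mac_buggy(char *addr, size_t alen)
{
	addr = get_prop("mac-address", addr, alen);
	if (addr)
		return addr;

	addr = get_prop("local-mac-address", addr, alen);
	if (addr)
		return addr;

	return get_prop("address", addr, alen);
}

/* Fixed: the caller's buffer is left alone, the result goes in res. */
static char *get_mac_fixed(char *addr, size_t alen)
{
	char *res;

	res = get_prop("mac-address", addr, alen);
	if (res)
		return res;

	res = get_prop("local-mac-address", addr, alen);
	if (res)
		return res;

	return get_prop("address", addr, alen);
}
```

With only "local-mac-address" present, the buggy variant returns NULL
while the fixed one returns the caller's buffer; with "mac-address"
present both behave the same, which is why the problem was easy to miss.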



Re: [GIT] Networking

2015-09-03 Thread David Ahern

On 9/2/15 11:35 PM, David Miller wrote:
> Another merge window, another set of networking changes.  I've heard
> rumblings that the lightweight tunnels infrastructure has been voted
> networking change of the year.  But what do I know?

...

> 9) Add support for "light weight tunnels", which allow for
>    encapsulation and decapsulation without bearing the overhead of a
>    full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
>    others.

Glad to see this feature (among others) hit Linus' tree. An oversight in 
the above is Roopa Prabhu who did the implementation for LWT. Roopa 
deserves direct mention versus part of the 'cast of others'.



Re: eBPF / seccomp globals?

2015-09-03 Thread Kees Cook
On Thu, Sep 3, 2015 at 6:01 PM, Michael Tirado  wrote:
> Hiyall,
>
> I have created a seccomp white list filter for a program that launches
> other, less trustworthy programs.  It's working great so far, but I
> have run into a little roadblock.  The launcher program needs to call
> execve as its final step, but that may not be present in the white
> list.  I am wondering if there is any way to use some sort of global
> variable that will be preserved between syscall filter calls, so that I
> can allow only one execve (if it is not present in the white list) by
> incrementing a counter variable.
>
> I see that in Documentation/networking/filter.txt one of the registers
> is documented as being a pointer to struct sk_buff; in the seccomp
> context this is a pointer to struct seccomp_data instead, right?  And
> the line about callee-saved registers R6-R9 probably refers to them
> being saved across calls within that filter, and not calls between
> filters?
>
> My apologies if this is not the appropriate place to ask for help, but
> it is difficult to find useful information on how eBPF works, and it is a
> bit confusing trying to figure out the differences between seccomp and
> net filters, and the old bpf code kicking around, short of spending
> countless hours reading through all of it.  If anybody has some
> links to share I would be very grateful.  The only way I can think of to
> make this work otherwise is to mount everything as MS_NOEXEC in the
> new namespace, but that just feels wrong.

For documentation, there are some great slides on seccomp from Plumbers
this year[1].

At present, there is no variable state beyond the syscall context (PC,
args) available to seccomp filters. The no_new_privs prctl was added
to reduce the risk of including execve in a filter's whitelist, but
that isn't as strong as the "exec once" feature you want.
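
To make the "no variable state" point concrete, here is a rough sketch of a classic-BPF seccomp white list. It is an illustration only, not taken from minijail or from any code in this thread; build_whitelist() and install_filter() are hypothetical helper names. Every instruction reads only struct seccomp_data (arch and syscall number here), so there is nowhere to keep an "execve already seen" counter between syscalls.

```c
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>   /* SYS_* numbers for callers */
#include <linux/audit.h>   /* AUDIT_ARCH_* values for callers */
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: write a white-list filter for n syscall numbers
 * into out[] and return the instruction count (n + 6), or 0 if the
 * buffer is too small or n does not fit an 8-bit jump offset.
 */
size_t build_whitelist(struct sock_filter *out, size_t cap,
		       unsigned int arch, const int *nrs, size_t n)
{
	size_t i, len = 0;

	if (n > 255 || cap < n + 6)
		return 0;

	/* Kill on an unexpected architecture. */
	out[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, arch));
	out[len++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			arch, 1, 0);
	out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_KILL);

	/* Compare the syscall number against each allowed entry. */
	out[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, nr));
	for (i = 0; i < n; i++)
		out[len++] = (struct sock_filter)BPF_JUMP(
				BPF_JMP | BPF_JEQ | BPF_K,
				(unsigned int)nrs[i],
				(unsigned char)(n - i), 0);

	/* Default: kill. Jump target of every match: allow. */
	out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_KILL);
	out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_ALLOW);
	return len;
}

/* Hypothetical helper: set no_new_privs, then attach the filter. */
int install_filter(struct sock_filter *filter, size_t len)
{
	struct sock_fprog prog = {
		.len = (unsigned short)len,
		.filter = filter,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller would pass the AUDIT_ARCH_* value for its own architecture and a list such as SYS_read, SYS_write, SYS_exit_group. Allowing execve in such a list allows it every time, which is the gap described above.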

What we did in Chrome OS was to use the "minijail" tool[2] to
LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
a bit of a hack, but works in well-defined environments. You are
talking about namespaces, though, so maybe minijail is worth a look?
It does that too and a whole lot more.

As for using maps via eBPF in seccomp, it's on the horizon, but it
comes with a lot of exposure that I haven't finished pondering, so I
don't think those features will be added soon.

-Kees

[1] http://man7.org/conf/lpc2015/limiting_kernel_attack_surface_with_seccomp-LPC_2015-Kerrisk.pdf
[2] see subdirectory "minijail" after "git clone
https://chromium.googlesource.com/chromiumos/platform2/"


-- 
Kees Cook
Chrome OS Security


Re: [PATCH] fixed_phy: pass 'irq' to fixed_phy_add()

2015-09-03 Thread Florian Fainelli
On 03/09/15 13:22, Sergei Shtylyov wrote:
> I've noticed that fixed_phy_register() ignores its 'irq' parameter instead of
> passing it to fixed_phy_add(). Luckily, fixed_phy_register() seems to always
> be called with PHY_POLL for 'irq'... :-)

So not critical for -stable, good!

> 
> Fixes: a75951217472 ("net: phy: extend fixed driver with fixed_phy_register()")
> Signed-off-by: Sergei Shtylyov 

Acked-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next] RDS: rds_conn_lookup() should factor in the struct net for a match

2015-09-03 Thread santosh.shilim...@oracle.com

On 9/3/15 1:24 PM, Sowmini Varadhan wrote:
> Only return a conn if the rds_conn_net(conn) matches the struct
> net passed to rds_conn_lookup().
>
> Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints,
> one per netns.")
>
> Signed-off-by: Sowmini Varadhan 
> ---

Acked-by: Santosh Shilimkar 


Re: [GIT] Networking

2015-09-03 Thread Marcel Holtmann
Hi Linus,

>> [-Wsizeof-array-argument]
> 
> Ahh. Google shows that it's an old clang warning that gcc has recently
> picked up.
> 
> But even clang doesn't seem to have any way for a project to say
> "please warn about arrays in function argument declaration". It *is*
> very traditional idiomatic C, it's just that I personally think it's
> one of those bad traditional C things exactly because it's so
> misleading about what actually goes on. But I guess that in practice,
> the only thing that it actually *affects* is "sizeof" (and assignment
> to the variable name - something that would be invalid for a real
> array, but works on argument arrays because they are really just
> pointers).
> 
> The "array as function argument" syntax is occasionally useful
> (particularly for the multi-dimensional array case), so I very much
> understand why it exists, I just think that in the kernel we'd be
> better off with the rule that it's against our coding practices.

I find them useful as syntactic sugar. We have not used them a lot, but there 
are cases in our crypto handling code where we have fixed size array 
inputs/outputs and there we opted to use them. They make it easy to remember 
what the expected sizes of input and output are without having to read through 
the implementation (of course we never even tried to use sizeof on these 
pointers).

static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16],
		  const u8 r[3], u8 res[3])

This is one of the simple crypto hash functions for privacy keys that we have.

r' = padding || r
ah(h, r) = e(k, r') mod 2^24

We are fully aware that const u8 r[3] is const u8 *r. As I said, it is 
syntactic sugar for us and nothing more.
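
As a small user-space illustration of that point (a sketch only, not code from the Bluetooth subsystem; param_is_pointer() and local_array_size() are made-up names, and it assumes a C11 compiler for _Generic), the declared size on an array parameter documents intent but does not change the type:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * The [3] is documentation only: the parameter type is adjusted to
 * const u8 *, which is what _Generic selects on here.
 */
static int param_is_pointer(const u8 r[3])
{
	return _Generic(r, const u8 *: 1, default: 0);
}

/* A real array object keeps its size: sizeof sees all 3 bytes. */
static size_t local_array_size(void)
{
	u8 r[3] = { 0 };

	return sizeof(r);
}
```

Using sizeof(r) inside param_is_pointer() would give the size of a pointer instead, which is what -Wsizeof-array-argument warns about.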

Regards

Marcel



Re: IPv6 xfrm GSO fragmentation bug

2015-09-03 Thread Herbert Xu
On Mon, Aug 31, 2015 at 03:35:26PM +0800, Herbert Xu wrote:
> 
> I see where the bug came from.  Indeed IPv6 does do fragmentation,
> but only for tunnel mode, while your patch added a check that also
> affected transport mode.  So in addition to the GSO fix we should
> also make the MTU check conditional on tunnel mode.

Here is the patch:

---8<---
ipv6: Fix IPsec pre-encap fragmentation check

The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
packets.  That is, we perform fragmentation pre-encap rather than
post-encap.

A check was added later to ensure that proper MTU information is
passed back for locally generated traffic.  Unfortunately this
check was performed on all IPsec packets, including transport-mode
packets.

What's more, the check failed to take GSO into account.

The end result is that transport-mode GSO packets get dropped at
the check.

This patch fixes it by moving the tunnel mode check forward as well
as adding the GSO check.

Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
Signed-off-by: Herbert Xu 

diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 09c76a7..be033f2 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -136,6 +136,7 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct xfrm_state *x = dst->xfrm;
 	int mtu;
+	bool toobig;
 
 #ifdef CONFIG_NETFILTER
 	if (!x) {
@@ -144,25 +145,29 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff *skb)
 	}
 #endif
 
+	if (x->props.mode != XFRM_MODE_TUNNEL)
+		goto skip_frag;
+
 	if (skb->protocol == htons(ETH_P_IPV6))
 		mtu = ip6_skb_dst_mtu(skb);
 	else
 		mtu = dst_mtu(skb_dst(skb));
 
-	if (skb->len > mtu && xfrm6_local_dontfrag(skb)) {
+	toobig = skb->len > mtu && !skb_is_gso(skb);
+
+	if (toobig && xfrm6_local_dontfrag(skb)) {
 		xfrm6_local_rxpmtu(skb, mtu);
 		return -EMSGSIZE;
-	} else if (!skb->ignore_df && skb->len > mtu && skb->sk) {
+	} else if (!skb->ignore_df && toobig && skb->sk) {
 		xfrm_local_error(skb, mtu);
 		return -EMSGSIZE;
 	}
 
-	if (x->props.mode == XFRM_MODE_TUNNEL &&
-	    ((skb->len > mtu && !skb_is_gso(skb)) ||
-		dst_allfrag(skb_dst(skb)))) {
+	if (toobig || dst_allfrag(skb_dst(skb)))
 		return ip6_fragment(sk, skb,
 				    x->outer_mode->afinfo->output_finish);
-	}
+
+skip_frag:
 	return x->outer_mode->afinfo->output_finish(sk, skb);
 }
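
For readers who want to poke at the control flow outside the kernel, the change can be modelled roughly as below. This is a user-space sketch only: struct pkt, enum verdict, check_old() and check_new() are made-up names, and each boolean field stands in for the corresponding kernel test (skb_is_gso(), xfrm6_local_dontfrag(), skb->ignore_df, skb->sk, dst_allfrag()).

```c
#include <stdbool.h>

enum verdict { V_OUTPUT_FINISH, V_FRAGMENT, V_EMSGSIZE };

/* Hypothetical flattened view of the inputs __xfrm6_output() looks at. */
struct pkt {
	bool tunnel_mode;	/* x->props.mode == XFRM_MODE_TUNNEL */
	bool gso;		/* skb_is_gso(skb) */
	bool local_dontfrag;	/* xfrm6_local_dontfrag(skb) */
	bool ignore_df;		/* skb->ignore_df */
	bool has_sk;		/* skb->sk != NULL */
	bool allfrag;		/* dst_allfrag(skb_dst(skb)) */
	unsigned int len;	/* skb->len */
};

/* Before the patch: the MTU check applies to every IPsec packet. */
static enum verdict check_old(const struct pkt *p, unsigned int mtu)
{
	if (p->len > mtu && p->local_dontfrag)
		return V_EMSGSIZE;
	else if (!p->ignore_df && p->len > mtu && p->has_sk)
		return V_EMSGSIZE;

	if (p->tunnel_mode && ((p->len > mtu && !p->gso) || p->allfrag))
		return V_FRAGMENT;

	return V_OUTPUT_FINISH;
}

/* After the patch: tunnel mode only, and GSO packets are not "too big". */
static enum verdict check_new(const struct pkt *p, unsigned int mtu)
{
	bool toobig;

	if (!p->tunnel_mode)
		return V_OUTPUT_FINISH;

	toobig = p->len > mtu && !p->gso;

	if (toobig && p->local_dontfrag)
		return V_EMSGSIZE;
	else if (!p->ignore_df && toobig && p->has_sk)
		return V_EMSGSIZE;

	if (toobig || p->allfrag)
		return V_FRAGMENT;

	return V_OUTPUT_FINISH;
}
```

In this model a locally generated transport-mode GSO packet larger than the MTU gets V_EMSGSIZE from check_old() and V_OUTPUT_FINISH from check_new(), which matches the drop described in the commit message.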
 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

