date:20180504

Re: [PATCH net-next v2] net: core: rework basic flow dissection helper

2018-05-04 Thread kbuild test robot

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-core-rework-basic-flow-dissection-helper/20180505-090417
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/net/tipc.h:55:30: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __be32 @@
   include/net/tipc.h:55:30:    expected unsigned int
   include/net/tipc.h:55:30:    got restricted __be32
   net/core/flow_dissector.c:840:48: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 [usertype] key @@ got unsigned int [usertype] key @@
   net/core/flow_dissector.c:840:48:    expected restricted __be32 [usertype] key
   net/core/flow_dissector.c:840:48:    got unsigned int
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1276:25: sparse: expression using sizeof(void)
>> net/core/flow_dissector.c:1319:59: sparse: Using plain integer as NULL pointer

vim +1319 net/core/flow_dissector.c

  1305  
  1306  /**
  1307   * skb_get_poff - get the offset to the payload
  1308   * @skb: sk_buff to get the payload offset from
  1309   *
  1310   * The function will get the offset to the payload as far as it could
  1311   * be dissected.  The main user is currently BPF, so that we can dynamically
  1312   * truncate packets without needing to push actual payload to the user
  1313   * space and can analyze headers only, instead.
  1314   */
  1315  u32 skb_get_poff(const struct sk_buff *skb)
  1316  {
  1317  	struct flow_keys_basic keys;
  1318  
> 1319  	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
  1320  		return 0;
  1321  
  1322  	return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
  1323  }
  1324  
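
The new warning points at column 59 of line 1319, which is the first literal 0 in the argument list, passed in a position where the helper takes a pointer. A minimal stand-alone sketch of the pattern follows; `struct keys`, `dissect()` and `dissect_default()` are hypothetical names for illustration, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result structure, loosely modelled on a flow-keys struct. */
struct keys {
	bool used_default_data;
	int nhoff;
};

/* Hypothetical helper: a NULL data pointer means "use the buffer's own data". */
static bool dissect(struct keys *keys, const void *data, int nhoff)
{
	assert(keys != NULL);
	keys->used_default_data = (data == NULL);
	keys->nhoff = nhoff;
	return nhoff >= 0;
}

static bool dissect_default(struct keys *keys)
{
	/* Spell the null pointer as NULL, not 0; integer arguments stay 0. */
	return dissect(keys, NULL, 0);
}
```

Both spellings compile to the same code; the point is only that sparse treats a bare 0 in pointer context as suspicious.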

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: [PATCH iproute2] rdma: fix header files

2018-05-04 Thread Stephen Hemminger

On Fri, 4 May 2018 16:13:07 -0600
David Ahern  wrote:

> On 5/4/18 3:56 PM, Stephen Hemminger wrote:
> > All user api headers in iproute2 should be in include/uapi
> > so that script can be used to put correct sanitized kernel headers
> > there. And the header files for rdma must be a complete set; if one
> > header file includes another, all must be present.
> > 
> > This fixes build on older distributions, and Windows Services
> > for Linux.
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  include/uapi/rdma/ib_user_sa.h|   77 ++
> >  include/uapi/rdma/ib_user_verbs.h | 1210 +
> >  .../uapi/rdma/rdma_netlink.h  |   13 +
> >  .../uapi/rdma/rdma_user_cm.h  |6 +-
> >  4 files changed, 1303 insertions(+), 3 deletions(-)
> >  create mode 100644 include/uapi/rdma/ib_user_sa.h
> >  create mode 100644 include/uapi/rdma/ib_user_verbs.h
> >  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
> >  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> >   
> 
> Stephen:
> 
> Per a recent discussion the RDMA folks need to take ownership of the
> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
> code can never hit iproute2-next during a dev cycle.

I want all uapi headers in include/uapi because it avoids possible overlap problems.
During the linux-net/linus release cycle they should match what is in Linus's tree.

During the net-next they can come from two sources.

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-04 Thread Jann Horn

On Thu, May 3, 2018 at 12:36 AM, Alexei Starovoitov  wrote:
> Introduce helper:
> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> struct umh_info {
>struct file *pipe_to_umh;
>struct file *pipe_from_umh;
>pid_t pid;
> };
>
> that GPLed kernel modules (signed or unsigned) can use it to execute part
> of its own data as swappable user mode process.
>
> The kernel will do:
> - mount "tmpfs"
> - allocate a unique file in tmpfs
> - populate that file with [data, data + len] bytes
> - user-mode-helper code will do_execve that file and, before the process
>   starts, the kernel will create two unix pipes for bidirectional
>   communication between kernel module and umh
> - close tmpfs file, effectively deleting it
> - the fork_usermode_blob will return zero on success and populate
>   'struct umh_info' with two unix pipes and the pid of the user process
>
> As the first step in the development of the bpfilter project
> the fork_usermode_blob() helper is introduced to allow user mode code
> to be invoked from a kernel module. The idea is that user mode code plus
> normal kernel module code are built as part of the kernel build
> and installed as traditional kernel module into distro specified location,
> such that from a distribution point of view, there is
> no difference between regular kernel modules and kernel modules + umh code.
> Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> by a kernel module doesn't make it any special from kernel and user space
> tooling point of view.
[...]
> +static struct vfsmount *umh_fs;
> +
> +static int init_tmpfs(void)
> +{
> +	struct file_system_type *type;
> +
> +	if (umh_fs)
> +		return 0;
> +	type = get_fs_type("tmpfs");
> +	if (!type)
> +		return -ENODEV;
> +	umh_fs = kern_mount(type);
> +	if (IS_ERR(umh_fs)) {
> +		int err = PTR_ERR(umh_fs);
> +
> +		put_filesystem(type);
> +		umh_fs = NULL;
> +		return err;
> +	}
> +	return 0;
> +}

Should init_tmpfs() be holding some sort of mutex if it's fiddling
with `umh_fs`? The current code only calls it in initcall context, but
if that ever changes and two processes try to initialize the tmpfs at
the same time, a few things could go wrong.
I guess Luis' suggestion (putting a call to init_tmpfs() in
do_basic_setup()) might be the easiest way to get rid of that problem.
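
For illustration, the mutex variant raised here could look roughly like the userspace sketch below. It uses a pthread mutex in place of a kernel mutex; `shared_mount`, `fake_mount()` and `init_shared_mount()` are hypothetical stand-ins for `umh_fs`, `kern_mount()` and `init_tmpfs()`, so this is a model of the idea rather than a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mount_lock = PTHREAD_MUTEX_INITIALIZER;
static int *shared_mount;	/* stands in for umh_fs */
static int mount_storage;
static int mount_calls;		/* how often the "mount" really happened */

/* Hypothetical stand-in for kern_mount(); only counts invocations. */
static int *fake_mount(void)
{
	mount_calls++;
	return &mount_storage;
}

/* Check and set the shared pointer under one lock so racing callers mount once. */
static int init_shared_mount(void)
{
	int err = 0;

	pthread_mutex_lock(&mount_lock);
	if (!shared_mount) {
		shared_mount = fake_mount();
		if (!shared_mount)
			err = -1;
	}
	pthread_mutex_unlock(&mount_lock);

	/* Postcondition: either an error was reported or the mount exists. */
	assert(err || shared_mount);
	return err;
}
```

Calling `init_shared_mount()` repeatedly performs the mount once. Doing the initialization from a single early setup path, as suggested above, would avoid the lock altogether.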

> +static int alloc_tmpfs_file(size_t size, struct file **filp)
> +{
> +	struct file *file;
> +	int err;
> +
> +	err = init_tmpfs();
> +	if (err)
> +		return err;
> +	file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +	*filp = file;
> +	return 0;
> +}

[PATCH] net/9p: correct the variable name in v9fs_get_trans_by_name() comment

2018-05-04 Thread Sun Lianwen

The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".

Signed-off-by: Sun Lianwen 
---
 net/9p/mod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/9p/mod.c b/net/9p/mod.c
index 6ab36aea7727..eb9777f05755 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL(v9fs_unregister_trans);
 
 /**
  * v9fs_get_trans_by_name - get transport with the matching name
- * @name: string identifying transport
+ * @s: string identifying transport
  *
  */
 struct p9_trans_module *v9fs_get_trans_by_name(char *s)
-- 
2.17.0

Re: [v2 PATCH 1/1] tg3: fix meaningless hw_stats reading after tg3_halt memset 0 hw_stats

2018-05-04 Thread Zumeng Chen


On 05/03/2018 01:04 PM, Michael Chan wrote:

On Wed, May 2, 2018 at 5:30 PM, Zumeng Chen  wrote:

On 2018-05-03 01:32, Michael Chan wrote:

On Wed, May 2, 2018 at 3:27 AM, Zumeng Chen  wrote:

On 2018-05-02 13:12, Michael Chan wrote:

On Tue, May 1, 2018 at 5:42 PM, Zumeng Chen wrote:


diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 3b5e98e..c61d83c 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3102,6 +3102,7 @@ enum TG3_FLAGS {
   TG3_FLAG_ROBOSWITCH,
   TG3_FLAG_ONE_DMA_AT_ONCE,
   TG3_FLAG_RGMII_MODE,
+   TG3_FLAG_HALT,

I think you should be able to use the existing INIT_COMPLETE flag


No, it would bring uncertainty into the existing, complicated logic
of INIT_COMPLETE.
And I think the logic here is very simple: it fixes the meaningless hw_stats
reading and the problem of commit f5992b72. I even doubt whether you have
read the INIT_COMPLETE-related code carefully.


We should use an existing flag whenever appropriate


I disagree. This is sort of blahblah...

I don't want to see another flag added that is practically the same as
!INIT_COMPLETE.  The driver already has close to one hundred flags.
Adding a new flag that is similar to an existing flag will just make
the code more difficult to understand and maintain.

If you don't want to fix it the cleaner way, Siva or I will fix it.


Feel free to go ahead. I just took a second look; INIT_COMPLETE can be
used directly, as follows:


diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 08bbb63..0e04fd7 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8733,14 +8733,11 @@ static void tg3_free_consistent(struct tg3 *tp)
 	tg3_mem_rx_release(tp);
 	tg3_mem_tx_release(tp);
 
-	/* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
-	tg3_full_lock(tp, 0);
 	if (tp->hw_stats) {
 		dma_free_coherent(&tp->pdev->dev, sizeof(struct tg3_hw_stats),
 				  tp->hw_stats, tp->stats_mapping);
 		tp->hw_stats = NULL;
 	}
-	tg3_full_unlock(tp);
 }
 
 /*
@@ -14178,7 +14175,7 @@ static void tg3_get_stats64(struct net_device *dev,
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_bh(&tp->lock);
-	if (!tp->hw_stats) {
+	if (!tp->hw_stats || tg3_flag(tp, INIT_COMPLETE)) {
 		*stats = tp->net_stats_prev;
 		spin_unlock_bh(&tp->lock);
 		return;

Cheers,
Zumeng
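
The guard under discussion can be modelled in a few lines of stand-alone C. This is a simplified sketch, not driver code: `struct nic`, `struct hw_stats` and `nic_rx_packets()` are hypothetical, locking is left out, and it assumes the intended test is the negated flag (Michael's "!INIT_COMPLETE"), so that saved totals are returned while the device is halted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware counter block; it is zeroed when the device halts. */
struct hw_stats {
	uint64_t rx_packets;
};

struct nic {
	bool init_complete;		/* models the INIT_COMPLETE flag */
	struct hw_stats *hw_stats;	/* NULL once the block has been freed */
	uint64_t rx_packets_prev;	/* totals saved before the last halt */
};

/* Return the running total, falling back to the saved total while halted. */
static uint64_t nic_rx_packets(const struct nic *nic)
{
	assert(nic != NULL);
	if (!nic->hw_stats || !nic->init_complete)
		return nic->rx_packets_prev;
	return nic->rx_packets_prev + nic->hw_stats->rx_packets;
}
```

With this shape no extra flag is needed: a reader never looks at the counter block unless initialization has completed.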

Re: [PATCH V3] net/netlink: make sure the headers line up actual value output

2018-05-04 Thread Bo YU

Hi,
On Fri, May 04, 2018 at 01:02:23PM -0400, David Miller wrote:

From: YU Bo 
Date: Thu, 3 May 2018 23:09:23 -0400

Making sure the headers line up properly with the actual value output
of the command
`cat /proc/net/netlink`

Before the patch:

sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
33203952 0 897 0113 0 0 0 2 0 14906

Signed-off-by: Bo YU 

Applied, but why did you send this V3 to the list two times?

Thanks a lot. When I sent the email, I ran into a networking issue and was not
sure whether it had reached the list. Sorry for the noise.

Thank you.
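
As a general illustration of keeping a header line and its value rows aligned, both can be formatted with the same field widths. The sketch below is plain userspace C; the widths, the three-column layout and the function names are assumptions for the example and are not taken from the kernel patch.

```c
#include <assert.h>
#include <stdio.h>

/* Column widths are illustrative, not the ones used by the kernel. */
#define SK_W	16
#define NUM_W	8

/* Header row: each label is left-justified in its column. */
static int format_header(char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%-*s %-*s %-*s",
			SK_W, "sk", NUM_W, "Pid", NUM_W, "Groups");
}

/* Value row: same widths as the header, so columns start at the same offset. */
static int format_row(char *buf, size_t len, unsigned long sk,
		      unsigned int pid, unsigned int groups)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%-*lx %-*u %0*x",
			SK_W, sk, NUM_W, pid, NUM_W, groups);
}
```

Because both format strings share `SK_W` and `NUM_W`, changing one width moves the header and the values together.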

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-04 Thread Alexei Starovoitov

On Fri, May 04, 2018 at 07:56:43PM +, Luis R. Rodriguez wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

sure.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> >struct file *pipe_to_umh;
> >struct file *pipe_from_umh;
> >pid_t pid;
> > };
> > 
> > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > of its own data as swappable user mode process.
> > 
> > The kernel will do:
> > - mount "tmpfs"
> 
> Actually it's a *shared* vfsmount tmpfs for all umh blobs.

yep

> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> >   starts, the kernel will create two unix pipes for bidirectional
> >   communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> >   'struct umh_info' with two unix pipes and the pid of the user process
> 
> But since it's using UMH_WAIT_EXEC, all we can guarantee currently is the
> inception point was intended, well thought out, and will run, but the return
> value in no way reflects the success or not of the execution. More below.

yep
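
The "two pipes plus a pid" contract described in the quoted text can be sketched in userspace with pipe() and fork(). This is only an analogue: `struct helper_info` mirrors the fields of `struct umh_info` with file descriptors instead of `struct file` pointers, the helper is a forked child rather than an executed blob, and all function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Userspace mirror of struct umh_info, using fds instead of struct file. */
struct helper_info {
	int pipe_to_helper;
	int pipe_from_helper;
	pid_t pid;
};

/* Child side: echo each byte back upper-cased until the pipe closes. */
static void helper_main(int in_fd, int out_fd)
{
	char c;

	while (read(in_fd, &c, 1) == 1) {
		c = (char)toupper((unsigned char)c);
		if (write(out_fd, &c, 1) != 1)
			break;
	}
	_exit(0);
}

/* Create both pipes, start the helper, and hand the owner its ends. */
static int fork_helper(struct helper_info *info)
{
	int to_helper[2], from_helper[2];
	pid_t pid;

	assert(info != NULL);
	if (pipe(to_helper) < 0)
		return -1;
	if (pipe(from_helper) < 0) {
		close(to_helper[0]);
		close(to_helper[1]);
		return -1;
	}
	pid = fork();
	if (pid < 0) {
		close(to_helper[0]);
		close(to_helper[1]);
		close(from_helper[0]);
		close(from_helper[1]);
		return -1;
	}
	if (pid == 0) {
		close(to_helper[1]);
		close(from_helper[0]);
		helper_main(to_helper[0], from_helper[1]);
	}
	close(to_helper[0]);
	close(from_helper[1]);
	info->pipe_to_helper = to_helper[1];
	info->pipe_from_helper = from_helper[0];
	info->pid = pid;
	return 0;
}

/* Owner's teardown: close the pipes, then kill and reap the helper. */
static void stop_helper(struct helper_info *info)
{
	assert(info != NULL);
	close(info->pipe_to_helper);
	close(info->pipe_from_helper);
	kill(info->pid, SIGKILL);
	waitpid(info->pid, NULL, 0);
}
```

As in the review comment above, a zero return from `fork_helper()` only says the helper was started. Whether it is healthy has to be checked by a request/reply exchange over the pipes, and the owner remains responsible for calling `stop_helper()`.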

> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> > 
> > Such approach enables kernel to delegate functionality traditionally done
> > by the kernel modules into the user space processes (either root or !root) and
> > reduces security attack surface of the new code. The buggy umh code would crash
> > the user process, but not the kernel. Another advantage is that umh code
> > of the kernel module can be debugged and tested out of user space
> > (e.g. opening the possibility to run clang sanitizers, fuzzers or
> > user space test suites on the umh code).
> > In case of the bpfilter project such architecture allows complex control plane
> > to be done in the user space while bpf based data plane stays in the kernel.
> > 
> > Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> > the kernel module that uses them (like bpfilter) needs to manage life
> > time of umh on its own via two unix pipes and the pid of umh.
> > 
> > The exit code of such kernel module should kill the umh it started,
> > so that rmmod of the kernel module will cleanup the corresponding umh.
> > Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> > 
> > Signed-off-by: Alexei Starovoitov 
> > ---
> >  fs/exec.c               |  38 ---
> >  include/linux/binfmts.h |   1 +
> >  include/linux/umh.h     |  12 
> >  kernel/umh.c            | 176 +++-
> >  4 files changed, 215 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 183059c427b9..30a36c2a39bf 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
> >  /*
> >   * sys_execve() executes a new program.
> >   */
> > -static int do_execveat_common(int fd, struct filename *filename,
> > - struct user_arg_ptr argv,
> > - struct user_arg_ptr envp,
> > - int flags)
> > +static int __do_execve_file(int fd, struct filename *filename,
> > +   struct user_arg_ptr argv,
> > +   struct user_arg_ptr envp,
> > +   int flags, struct file *file)
> >  {
> > char *pathbuf = NULL;
> > struct linux_binprm *bprm;
> > -   struct file *file;
> > struct files_struct *displaced;
> > int retval;
> 
> Keeping in mind a fuzzer...
> 
> Note, right below this, and not shown here in the hunk, is:
> 
> 	if (IS_ERR(filename))
> 		return PTR_ERR(filename);
> >  
> > @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> > check_unsafe_exec(bprm);
> > current->in_execve = 1;
> >  
> > -   file = do_open_execat(fd, filename,

Re: [PATCH v2 net-next] net: stmmac: Add support for U32 TC filter using Flexible RX Parser

2018-05-04 Thread Jakub Kicinski

On Fri,  4 May 2018 10:01:38 +0100, Jose Abreu wrote:
> This adds support for U32 filter by using an HW only feature called
> Flexible RX Parser. This allows us to match any given packet field with a
> pattern and accept/reject or even route the packet to a specific DMA
> channel.
> 
> Right now we only support acceptance or rejection of frames and we only
> support simple rules. Though, the Parser has the flexibility of jumping to
> specific rules as an if condition so complex rules can be established.
> 
> This is only supported in GMAC5.10+.
> 
> The following commands can be used to test this code:
> 
>   1) Setup an ingress qdisc:
>   # tc qdisc add dev eth0 handle : ingress
> 
>   2) Setup a filter (e.g. filter by IP):
>   # tc filter add dev eth0 parent : protocol ip u32 match ip \
>   src 192.168.0.3 skip_sw action drop
> 
> In every test performed we always used the "skip_sw" flag to make sure
> only the RX Parser was involved.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Vitor Soares 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Jakub Kicinski 
> ---
> Changes from v1:
>   - Follow Linux network coding style (David)
>   - Use tc_cls_can_offload_and_chain0() (Jakub)

Thanks!

> @@ -4223,6 +4277,11 @@ int stmmac_dvr_probe(struct device *device,
>   ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
>   NETIF_F_RXCSUM;
>  
> + ret = stmmac_tc_init(priv, priv);
> + if (!ret) {
> + ndev->hw_features |= NETIF_F_HW_TC;
> + }
> +
>   if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
>   ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
>   priv->tso = true;

One more comment, though perhaps not a showstopper: it's considered good
practice to disallow clearing/disabling this flag while filters are
installed.  The driver should return -EBUSY from .ndo_set_features if TC
rules are offloaded and the user wants to disable the HW_TC feature flag.

[PATCH 24/24] selftests: net: return Kselftest Skip code for skipped tests

2018-05-04 Thread Shuah Khan (Samsung OSG)

When a net test is skipped because of unmet dependencies and/or unsupported
configuration, it returns 0 which is treated as a pass by the Kselftest
framework. This leads to a false positive result even when the test could
not be run.

Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.

Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.

Change psock_tpacket to use ksft_exit_skip() when a non-root user runs
the test and add an explicit check for root and a clear message, instead
of failing the test when /sys/power/state file open fails.
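
For a C test, the same convention could be sketched as below. This is an illustration only: `check_prereqs()` and the `KSFT_*` macro names are hypothetical and defined locally (the value 4 is the skip code stated above); the patch itself uses ksft_exit_skip() from the kselftest framework in psock_tpacket.c.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define KSFT_PASS	0
#define KSFT_SKIP	4	/* skip code stated in the commit message */

/*
 * Decide whether the test can run for the given effective uid.
 * When it cannot, write the reason into msg and return the skip code
 * instead of 0, so that the run is not counted as a pass.
 */
static int check_prereqs(uid_t euid, char *msg, size_t len)
{
	assert(msg != NULL && len > 0);
	msg[0] = '\0';
	if (euid != 0) {
		snprintf(msg, len, "SKIP: Need root privileges");
		return KSFT_SKIP;
	}
	return KSFT_PASS;
}
```

A test's main() would call `check_prereqs(geteuid(), ...)` first, print the message, and return the result as its exit status when it is not `KSFT_PASS`.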

Signed-off-by: Shuah Khan (Samsung OSG) 
---
 tools/testing/selftests/net/fib_tests.sh|  8 +---
 tools/testing/selftests/net/netdevice.sh| 16 +--
 tools/testing/selftests/net/pmtu.sh |  5 -
 tools/testing/selftests/net/psock_tpacket.c |  4 +++-
 tools/testing/selftests/net/rtnetlink.sh| 31 -
 5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9164e60d4b66..5baac82b9287 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -5,6 +5,8 @@
 # different events.
 
 ret=0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
 
 VERBOSE=${VERBOSE:=0}
 PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
@@ -579,18 +581,18 @@ fib_test()
 
 if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
-   exit 0
+   exit $ksft_skip;
 fi
 
 if [ ! -x "$(command -v ip)" ]; then
echo "SKIP: Could not run test without ip tool"
-   exit 0
+   exit $ksft_skip
 fi
 
 ip route help 2>&1 | grep -q fibmatch
 if [ $? -ne 0 ]; then
echo "SKIP: iproute2 too old, missing fibmatch"
-   exit 0
+   exit $ksft_skip
 fi
 
 # start clean
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index 903679e0ff31..e3afcb424710 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -8,6 +8,9 @@
 # if not they probably have failed earlier in the boot process and their logged error will be catched by another test
 #
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 # this function will try to up the interface
 # if already up, nothing done
 # arg1: network interface name
@@ -18,7 +21,7 @@ kci_net_start()
ip link show "$netdev" |grep -q UP
if [ $? -eq 0 ];then
echo "SKIP: $netdev: interface already up"
-   return 0
+   return $ksft_skip
fi
 
ip link set "$netdev" up
@@ -61,12 +64,12 @@ kci_net_setup()
ip address show "$netdev" |grep '^[[:space:]]*inet'
if [ $? -eq 0 ];then
echo "SKIP: $netdev: already have an IP"
-   return 0
+   return $ksft_skip
fi
 
# TODO what ipaddr to set ? DHCP ?
echo "SKIP: $netdev: set IP address"
-   return 0
+   return $ksft_skip
 }
 
 # test an ethtool command
@@ -84,6 +87,7 @@ kci_netdev_ethtool_test()
if [ $ret -ne 0 ];then
if [ $ret -eq "$1" ];then
echo "SKIP: $netdev: ethtool $2 not supported"
+   return $ksft_skip
else
echo "FAIL: $netdev: ethtool $2"
return 1
@@ -104,7 +108,7 @@ kci_netdev_ethtool()
ethtool --version 2>/dev/null >/dev/null
if [ $? -ne 0 ];then
echo "SKIP: ethtool not present"
-   return 1
+   return $ksft_skip
fi
 
TMP_ETHTOOL_FEATURES="$(mktemp)"
@@ -176,13 +180,13 @@ kci_test_netdev()
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
-   exit 0
+   exit $ksft_skip
 fi
 
 ip link show 2>/dev/null >/dev/null
 if [ $? -ne 0 ];then
echo "SKIP: Could not run test without the ip tool"
-   exit 0
+   exit $ksft_skip
 fi
 
 TMP_LIST_NETDEV="$(mktemp)"
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 1e428781a625..7514f93e1624 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -43,6 +43,9 @@
 #  that MTU is properly calculated instead when MTU is not configured from
 #  userspace
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 tests="
pmtu_vti6_exception vti6: PMTU exceptions
pmtu_vti4_exception vti4: PMTU exceptions
@@ -162,7 +165,7 @@ setup_xfrm6() {
 }
 
 setup() {
-   [ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return 1
+   [ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return $ksft_skip
 
cleanup_done=0
for arg

[PATCH net-next] vlan: correct the file path in vlan_dev_change_flags() comment

2018-05-04 Thread Sun Lianwen

The vlan_flags enum is defined in the include/uapi/linux/if_vlan.h file,
not in the include/linux/if_vlan.h file.

Signed-off-by: Sun Lianwen 
---
 net/8021q/vlan_dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 236452ebbd9e..546af0e73ac3 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -215,7 +215,9 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
return 0;
 }
 
-/* Flags are defined in the vlan_flags enum in include/linux/if_vlan.h file. */
+/* Flags are defined in the vlan_flags enum in
+ * include/uapi/linux/if_vlan.h file.
+ */
 int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
 {
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
-- 
2.17.0

Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module

2018-05-04 Thread Alexei Starovoitov

On Thu, May 03, 2018 at 03:23:55PM +0100, Edward Cree wrote:
> On 03/05/18 05:36, Alexei Starovoitov wrote:
> > bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> > and user mode helper code that is embedded into bpfilter.ko
> >
> > The steps to build bpfilter.ko are the following:
> > - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> > - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
> >   is converted into bpfilter_umh.o object file
> >   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
> >   Example:
> >   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
> >   4cf8 T _binary_net_bpfilter_bpfilter_umh_end
> >   4cf8 A _binary_net_bpfilter_bpfilter_umh_size
> >    T _binary_net_bpfilter_bpfilter_umh_start
> > - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
> >
> > bpfilter_kern.c is a normal kernel module code that calls
> > the fork_usermode_blob() helper to execute part of its own data
> > as a user mode process.
> >
> > Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> > is placed into .init.rodata section, so it's freed as soon as __init
> > function of bpfilter.ko is finished.
> > As part of __init the bpfilter.ko does first request/reply action
> > via two unix pipe provided by fork_usermode_blob() helper to
> > make sure that umh is healthy. If not it will kill it via pid.
> >
> > Later bpfilter_process_sockopt() will be called from bpfilter hooks
> > in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
> >
> > If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> > kill umh as well.
> >
> > Signed-off-by: Alexei Starovoitov 
...
> > +static void stop_umh(void)
> > +{
> > +   if (bpfilter_process_sockopt) {
> I worry about locking here.  Is it possible for two calls to
>  bpfilter_process_sockopt() to run in parallel, both fail, and thus both
>  call stop_umh()?  And if both end up calling shutdown_umh(), we double
>  fput().

I thought iptables sockopt is serialized earlier. Nope.
We need to grab the mutex to access these pipes.
Will fix.
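
The double-release concern can be illustrated with a small userspace model in which teardown is made idempotent under a lock. It is a sketch under stated assumptions: a pthread mutex replaces the kernel mutex, `struct helper_state` and `shutdown_helper()` are hypothetical, and a counter stands in for the fput() calls on the two pipes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct helper_state {
	pthread_mutex_t lock;
	bool running;
	int releases;	/* counts how often the pipes were released */
};

/* Safe to call from several failing paths; only the first call releases. */
static void shutdown_helper(struct helper_state *st)
{
	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	if (st->running) {
		st->running = false;
		st->releases++;	/* stands in for fput() on both pipes */
	}
	pthread_mutex_unlock(&st->lock);
}
```

The important detail is that the `running` check and the release happen inside the same critical section, so two callers cannot both observe `running == true`.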

Thanks for spelling nits. Will fix as well.

Re: [PATCH iproute2-next] bpf: don't offload perf array maps

2018-05-04 Thread Daniel Borkmann

On 05/05/2018 02:37 AM, Jakub Kicinski wrote:
> Perf arrays are handled specially by the kernel, don't request
> offload even when used by an offloaded program.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Acked-by: Daniel Borkmann

[PATCH iproute2-next] bpf: don't offload perf array maps

2018-05-04 Thread Jakub Kicinski

Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 lib/bpf.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index d9a406bf55f2..4e26c0df76c5 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -97,6 +97,11 @@ static const struct bpf_prog_meta __bpf_prog_meta[] = {
},
 };
 
+static bool bpf_map_offload_neutral(enum bpf_map_type type)
+{
+   return type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
+}
+
 static const char *bpf_prog_to_subdir(enum bpf_prog_type type)
 {
assert(type < ARRAY_SIZE(__bpf_prog_meta) &&
@@ -1594,7 +1599,7 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
  const struct bpf_elf_map *map, struct bpf_map_ext *ext,
  int *have_map_in_map)
 {
-   int fd, ret, map_inner_fd = 0;
+   int fd, ifindex, ret, map_inner_fd = 0;
 
fd = bpf_probe_pinned(name, ctx, map->pinning);
if (fd > 0) {
@@ -1631,10 +1636,10 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
}
}
 
+   ifindex = bpf_map_offload_neutral(map->type) ? 0 : ctx->ifindex;
errno = 0;
fd = bpf_map_create(map->type, map->size_key, map->size_value,
-   map->max_elem, map->flags, map_inner_fd,
-   ctx->ifindex);
+   map->max_elem, map->flags, map_inner_fd, ifindex);
 
if (fd < 0 || ctx->verbose) {
bpf_map_report(fd, name, map, ctx, map_inner_fd);
-- 
2.17.0

Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support

2018-05-04 Thread Alexei Starovoitov

On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:
> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
>  wrote:
> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:
> >> > From: Björn Töpel 
> >> >
> >> > This patch set introduces a new address family called AF_XDP that is
> >> > optimized for high performance packet processing and, in upcoming
> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> > all zero-copy related code in order to make it smaller, simpler and
> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> > work has already been accepted. We will publish our zero-copy support
> >> > for RX and TX on top of his patch sets at a later point in time.
> >>
> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> and there but nothing to hold it up, for the series:
> >>
> >> Acked-by: Daniel Borkmann 
> >>
> >> Thanks everyone!
> >
> > Great stuff!
> >
> > Applied to bpf-next, with one condition.
> > Upcoming zero-copy patches for both RX and TX need to be posted
> > and reviewed within this release window.
> > If netdev community as a whole won't be able to agree on the zero-copy
> > bits we'd need to revert this feature before the next merge window.
> 
> Thanks everyone for reviewing this. Highly appreciated.
> 
> Just so we understand the purpose correctly:
> 
> 1: Do you want to see the ZC patches in order to verify that the user
> space API holds? If so, we can produce an additional RFC  patch set
> using a big chunk of code that we had in RFC V1. We are not proud of
> this code since it is clunky, but it hopefully proves the point with
> the uapi being the same.
> 
> 2: And/Or are you worried about us all (the netdev community) not
> agreeing on a way to implement ZC internally in the drivers and the
> XDP infrastructure? This is not going to be possible to finish during
> this cycle since we do not like the implementation we had in RFC V1.
> Too intrusive and now we also have nicer abstractions from Jesper that
> we can use and extend to provide a (hopefully) much cleaner and less
> intrusive solution.

short answer: both.

Cleanliness and performance of the ZC code is not as important as
getting the API right. The main concern is that during the ZC review process
we will find out that the existing API has issues, so we have to
do this exercise before the merge window.
And RFC won't fly. Send the patches for real. They have to go
through the proper code review. The hackers of netdev community
can accept a partial, or a bit unclean, or slightly inefficient
implementation, since it can be and will be improved later,
but API we cannot change once it goes into official release.

Here is the example of API concern:
this patch set added shared umem concept. It sounds good in theory,
but will it perform well with ZC ? Earlier RFCs didn't have that
feature. If it won't perform well then it shouldn't be in the tree.
The key reason to let AF_XDP into the tree is its performance promise.
If it doesn't perform we should rip it out and redesign.

pull-request: bpf-next 2018-05-05

2018-05-04 Thread Daniel Borkmann

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add initial infrastructure for AF_XDP sockets, which is optimized
   for high performance packet processing. This early work only adds
   copy-mode, and zero-copy semantics with driver changes will land in
   subsequent patches. An AF_XDP socket has RX and/or TX queue associated
   to it for receiving and sending packets. In contrast to AF_PACKET v2/3
   descriptor queues are separated from packet buffers such that a RX or
   TX descriptor points to a data buffer in a memory area called UMEM.
   Latter can be shared so that packets don't need to be copied between
   RX and TX. A XDP BPF program will steer the packets to one of the
   AF_XDP sockets via a new BPF map called XSKMAP, from Björn and Magnus.

2) Add nfp BPF offload support for bpf_event_output() helper. Having
   the driver reimplement and manage the perf array itself seems fragile
   and unnecessary, therefore approach taken is that FW messages that
   carry the events are pushed out to the RB. Additionally bpftool gets
   support to connect to a perf event map and dump ring buffer contents,
   useful for debugging purposes, from Jakub.

3) Add a new eBPF JIT for x86_32. Like in arm32 case, 64 bit div/mod
   and xadd is still missing as well as BPF to BPF calls but other than
   that it's functional and numbers show 30% to 50% improvement compared
   to interpreter, from Wang.

4) Implement a new BPF helper bpf_get_stack() to overcome limitations
   of stackmap and bpf_get_stackid() helper. bpf_get_stack() allows
   to send stack traces directly to the BPF program which can perform
   in-kernel processing and push them out via bpf_perf_event_output(),
   from Yonghong.

5) Remove LD_ABS and LD_IND as native eBPF instructions and implement
   them as rewrites. This significantly reduces complexity from JITs
   while keeping similar performance characteristics, and allows to
   better evolve JITs long term by having them all in C only, from Daniel.

6) Improve the code logic related to managing subprog information by
   unifying main prog and subprogs, unifying entry points and stack
   depth tracking into struct bpf_subprog_info, and adding end marker
   into subprog_info array to simplify iteration logic, from Jiong.

7) Remove tracepoints from BPF core as they started to rot away,
   causing panics triggered from syzkaller. Earlier ones from BPF
   fs got already removed, so follow-up with rest since we also have
   better introspection infrastructure these days, from Alexei.

8) Relax the bpf_current_task_under_cgroup() helper to allow usage in
   interrupt which is particularly useful for BPF programs attached
   to perf events, from Teng.

9) Formatting fixes in the new BPF uapi helper documentation for
   bpf_perf_event_read() and bpf_get_stack() and relaxing whitespace
   constraints in bpf_helpers_doc.py to ease documentation, from Quentin.

10) Dump the bpftool 'loaded at:' information in ISO 8601 format in
    the plain variant and seconds since the Epoch in JSON to ease parsing,
    also from Quentin.

11) Various cleanups mostly around coding and comment style, and several
    capitalization, typo and grammar fixups in comments for the x64 BPF
    JIT, from Ingo.

12) Fix up BPF context struct types in uapi BPF helper documentation
    where some of them were mistakenly using kernel types, from Andrey.

13) Document that under CONFIG_BPF_JIT_ALWAYS_ON mode the bpf_jit_enable
    mode 2 is not available, from Leo.

14) Import erspan uapi header file into tools infra so that BPF tunnel
    helpers can use it and won't cause issues due to missing headers on
    some systems, from William.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

This has a minor merge conflict in tools/testing/selftests/bpf/test_progs.c.
Resolution is to take the hunk from bpf-next tree and change the first CHECK()
condition such that the missing '\n' is added to the end of the string, like:

if (CHECK(build_id_matches < 1, "build id match",
  "Didn't find expected build ID from the map\n"))
goto disable_pmu;

Let me know if you run into any other unforeseen issue. Thanks a lot!



The following changes since commit 79741a38b4a2538a68342c45b813ecb9dd648ee8:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2018-04-26 21:19:50 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to e94fa1d93117e7f1eb783dc9cae6c7065099:

  bpf, xskmap: fix crash in xsk_map_alloc error path handling (2018-05-04 14:55:54 -0700)


Alexei Starovoitov (5):
  Merge branch 'bpf_get_stack'
  Merge branch 'fix-bpf-helpers-doc'

Re: [PATCH net-next] net/ipv6: rename rt6_next to fib6_next

2018-05-04 Thread David Miller

From: David Ahern 
Date: Fri,  4 May 2018 13:54:24 -0700

> This slipped through the cracks in the followup set to the fib6_info flip.
> Rename rt6_next to fib6_next.
> 
> Signed-off-by: David Ahern 

Applied, thanks David.

Re: pull-request: bpf 2018-05-05

2018-05-04 Thread David Miller

From: Daniel Borkmann 
Date: Sat,  5 May 2018 00:21:47 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Sanitize attr->{prog,map}_type from bpf(2) since used as an array index
>to retrieve prog/map specific ops such that we prevent potential out of
>bounds value under speculation, from Mark and Daniel.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

Re: [PATCH] net: disable UDP punt on sockets in RCV_SHUTDWON

2018-05-04 Thread Eric Dumazet

On 05/04/2018 02:08 PM, Chintan Shah wrote:
> A UDP application which opens multiple sockets with same local
> address/port combination (using SO_REUSEPORT/SO_REUSEADDR socket options);
> and issues connect to a remote socket (using one of these local socket).
> Now if the same socket, which issued connect, issues shutdown (SHUT_RD);
> packets would still be queued to this socket (if sent from same remote
> client, which the local socket connected to), and not delivered to the
> other socket in the normal state.
> 

Confusing changelog.

sk_shutdown is on a different cache line, so this additional fetch would cause
loss of performance if many sockets are scanned in the hash bucket.

If you are trying to add full 4-tuple hash table to UDP, and accept() ability,
this would require a bit more than this hack...

Re: [PATCH net] sctp: delay the authentication for the duplicated cookie-echo chunk

2018-05-04 Thread Marcelo Ricardo Leitner

On Fri, May 04, 2018 at 05:05:10PM +0800, Xin Long wrote:
> Now sctp only delays the authentication for the normal cookie-echo
> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
> authentication first based on the old asoc, which will definitely
> fail due to the different auth info in the old asoc.
>
> The duplicated cookie-echo chunk will create a new asoc with the
> auth info from this chunk, and the authentication should also be
> done with the new asoc's auth info for all of the collision 'A',
> 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
> will never pass the authentication and create the new connection.
>
> This issue exists since very beginning, and this fix is to make
> sctp_assoc_bh_rcv() follow the way sctp_assoc_bh_rcv() does for
   I guess you meant sctp_endpoint_bh_rcv here --^ right?

Other than this LGTM

> the normal cookie-echo chunk to delay the authentication.
>
> While at it, remove the unused params from sctp_sf_authenticate()
> and define sctp_auth_chunk_verify() used for all the places that
> do the delayed authentication.
>
> Signed-off-by: Xin Long 
> ---
>  net/sctp/associola.c| 30 -
>  net/sctp/sm_statefuns.c | 86 
> ++---
>  2 files changed, 75 insertions(+), 41 deletions(-)
>
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 837806d..a47179d 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>   struct sctp_endpoint *ep;
>   struct sctp_chunk *chunk;
>   struct sctp_inq *inqueue;
> - int state;
> + int first_time = 1; /* is this the first time through the loop */
>   int error = 0;
> + int state;
>
>   /* The association should be held so we should be safe. */
>   ep = asoc->ep;
> @@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>   state = asoc->state;
>   subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>
> + /* If the first chunk in the packet is AUTH, do special
> +  * processing specified in Section 6.3 of SCTP-AUTH spec
> +  */
> + if (first_time && subtype.chunk == SCTP_CID_AUTH) {
> + struct sctp_chunkhdr *next_hdr;
> +
> + next_hdr = sctp_inq_peek(inqueue);
> + if (!next_hdr)
> + goto normal;
> +
> + /* If the next chunk is COOKIE-ECHO, skip the AUTH
> +  * chunk while saving a pointer to it so we can do
> +  * Authentication later (during cookie-echo
> +  * processing).
> +  */
> + if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
> + chunk->auth_chunk = skb_clone(chunk->skb,
> +   GFP_ATOMIC);
> + chunk->auth = 1;
> + continue;
> + }
> + }
> +
> +normal:
>   /* SCTP-AUTH, Section 6.3:
>*The receiver has a list of chunk types which it expects
>*to be received only after an AUTH-chunk.  This list has
> @@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>   /* If there is an error on chunk, discard this packet. */
>   if (error && chunk)
>   chunk->pdiscard = 1;
> +
> + if (first_time)
> + first_time = 0;
>   }
>   sctp_association_put(asoc);
>  }
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 28c070e..c9ae340 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
>   struct sctp_cmd_seq *commands);
>
>  static enum sctp_ierror sctp_sf_authenticate(
> - struct net *net,
> - const struct sctp_endpoint *ep,
>   const struct sctp_association *asoc,
> - const union sctp_subtype type,
>   struct sctp_chunk *chunk);
>
>  static enum sctp_disposition __sctp_sf_do_9_1_abort(
> @@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net 
> *net,
>   return SCTP_DISPOSITION_CONSUME;
>  }
>
> +static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
> +const struct sctp_association *asoc)
> +{
> + struct sctp_chunk auth;
> +
> + if (!chunk->auth_chunk)
> + return true;
> +
> + /* SCTP-AUTH:

Re: [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader

2018-05-04 Thread Jakub Kicinski

CC perf folks

On Fri, 4 May 2018 14:25:03 -0700, Alexei Starovoitov wrote:
> > +static void
> > +perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
> > +{
> > +   volatile struct perf_event_mmap_page *header = ring->mem;
> > +   __u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
> > +   __u64 data_tail = header->data_tail;
> > +   __u64 data_head = header->data_head;
> > +   void *base, *begin, *end;
> > +
> > +   asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
> > +   if (data_head == data_tail)
> > +   return;  
> 
> this function was copied several times into different places.
> I think it's time to put into common lib. Like libbpf.

Agreed, I think libbpf would work, although there is nothing BPF
specific in this loop AFAICT now.

> Would be great if you can do it in the follow up.

Looking into it now, I found these:

$ git grep 'data_head == data_tail'
tools/bpf/bpftool/map_perf_ring.c:  if (data_head == data_tail)
tools/testing/selftests/bpf/trace_helpers.c:if (data_head == data_tail)

Are there any other copies I should try to cater to?  I have changed a few
things compared to the selftest, I guess others may have modified their
copy too.  Just trying to make sure what we put in libbpf would cater
to most possible use cases.
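
For reference, the shape of the loop being discussed can be sketched as below. This is a minimal illustration only: struct ring, ring_read() and their fields are hypothetical names, not the bpftool, selftest or libbpf code, and the memory barriers needed against a real perf mmap page are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mmap'ed ring; 'size' must be a power of two. */
struct ring {
	uint8_t *data;
	uint64_t size;
	uint64_t head;	/* advanced by the producer */
	uint64_t tail;	/* advanced by the consumer */
};

/*
 * Copy up to 'cap' unread bytes into 'out', handling wrap-around, and
 * advance the tail.  Returns the number of bytes copied; 0 means the
 * ring was empty (the data_head == data_tail case).
 */
static size_t ring_read(struct ring *r, uint8_t *out, size_t cap)
{
	uint64_t avail = r->head - r->tail;
	uint64_t off, first;
	size_t n;

	assert((r->size & (r->size - 1)) == 0);
	if (avail == 0)
		return 0;

	n = avail < cap ? (size_t)avail : cap;
	off = r->tail & (r->size - 1);
	first = r->size - off;	/* bytes left before the end of the buffer */
	if (first > n)
		first = n;

	memcpy(out, r->data + off, first);
	memcpy(out + first, r->data, n - first);	/* wrapped part, may be 0 bytes */
	r->tail += n;
	return n;
}
```

Nothing in it is BPF specific, which matches the remark above; a shared helper would mostly need to agree on the callback and barrier conventions.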

Should I also move bpf_perf_event_open()/test_bpf_perf_event() to libbpf?

> for the set:
> Acked-by: Alexei Starovoitov 

Thanks!

Re: [net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation

2018-05-04 Thread Alexander Duyck

On Fri, May 4, 2018 at 1:19 PM, Eric Dumazet  wrote:
>
>
> On 05/04/2018 11:30 AM, Alexander Duyck wrote:
>> From: Alexander Duyck 
>>
>> This patch is meant to allow us to avoid having to recompute the checksum
>> from scratch and have it passed as a parameter.
>>
>> Instead of taking that approach we can take advantage of the fact that the
>> length that was used to compute the existing checksum is included in the
>> UDP header. If we cancel that out by adding the value XOR with 0x we
>> can then just add the new length in and fold that into the new result.
>>
>
>>
>> + uh = udp_hdr(segs);
>> +
>> + /* compute checksum adjustment based on old length versus new */
>> + newlen = htons(sizeof(*uh) + mss);
>> + check = ~csum_fold((__force __wsum)((__force u32)uh->check +
>> + ((__force u32)uh->len ^ 0x) +
>> + (__force u32)newlen));
>> +
>
>
> Can't this use csum_sub() instead of this ^ 0x trick ?

I could but that actually adds more instructions to all this since
csum_sub will perform the inversion across a 32b checksum when we only
need to bitflip a 16 bit value. I had considered doing (u16)(~uh->len)
but thought type casting it more than once would be a pain as well.

What I wanted to avoid is having to do the extra math to account for
the rollover. Adding 3 16 bit values will result in at most a 18 bit
value which can then be folded. Doing it this way we avoid that extra
add w/ carry logic that is needed for csum_add/sub.
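
To make the arithmetic concrete, here is a small user-space sketch. It assumes the truncated constant in the quoted patch is 0xffff (a 16 bit bitflip), and it treats the stored value as a running one's-complement sum that already includes the old length. fold16(), sum_words() and adjust_len() are illustrative names, not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	assert(sum <= 0xffff);	/* two folds are enough for any 32-bit input */
	return (uint16_t)sum;
}

/* Reference path: one's-complement sum over all 16-bit words. */
static uint16_t sum_words(const uint16_t *w, int n)
{
	uint32_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += w[i];
	return fold16(sum);
}

/*
 * Swap old_len for new_len inside an existing sum without re-summing.
 * Subtracting in one's-complement arithmetic is adding the 16-bit
 * inverse, so the accumulator holds three 16-bit values (at most 18
 * bits) before it is folded.
 */
static uint16_t adjust_len(uint16_t sum, uint16_t old_len, uint16_t new_len)
{
	uint32_t acc = (uint32_t)sum + (uint32_t)(old_len ^ 0xffff) + new_len;

	return fold16(acc);
}
```

With the words {0x1234, 0xabcd, 1500, 0x0042}, changing the length word from 1500 to 1008 through adjust_len() gives the same value as summing the four words again with 1008 in place.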

pull-request: bpf 2018-05-05

2018-05-04 Thread Daniel Borkmann

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Sanitize attr->{prog,map}_type from bpf(2) since used as an array index
   to retrieve prog/map specific ops such that we prevent potential out of
   bounds value under speculation, from Mark and Daniel.
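
As a rough user-space illustration of the idea in 1), an untrusted index can be clamped with a branch-free mask before it is used as an array subscript. The sketch is modeled, from memory, on the generic C fallback behind array_index_nospec(); index_mask() and index_nospec() are hypothetical names, and this is not the code in the series.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns all-ones when index < size and zero otherwise, without a
 * conditional branch.  Relies on arithmetic right shift of a negative
 * signed value, which is implementation-defined in ISO C but is what
 * GCC and Clang provide.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index))
			       >> (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-bounds index to 0; in-bounds values pass through. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	unsigned long clamped = index & index_mask(index, size);

	assert(size == 0 || clamped < size);
	return clamped;
}
```

The ordinary bounds check stays in place; the mask only ensures that a mispredicted branch cannot carry an out-of-range value into the table lookup.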

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit a8d7aa17bbc970971ccdf71988ea19230ab368b1:

  dccp: fix tasklet usage (2018-05-03 15:14:57 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to d0f1a451e33d9ca834422622da30aa68daade56b:

  bpf: use array_index_nospec in find_prog_type (2018-05-03 19:29:35 -0700)


Daniel Borkmann (1):
  bpf: use array_index_nospec in find_prog_type

Mark Rutland (1):
  bpf: fix possible spectre-v1 in find_and_alloc_map()

 kernel/bpf/syscall.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

Re: [PATCH iproute2] rdma: fix header files

2018-05-04 Thread David Ahern

On 5/4/18 3:56 PM, Stephen Hemminger wrote:
> All user api headers in iproute2 should be in include/uapi
> so that script can be used to put correct sanitized kernel headers
> there. And the header files for rdma must be a complete set; if one
> header file includes another, all must be present.
> 
> This fixes build on older distributions, and Windows Services
> for Linux.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  include/uapi/rdma/ib_user_sa.h|   77 ++
>  include/uapi/rdma/ib_user_verbs.h | 1210 +
>  .../uapi/rdma/rdma_netlink.h  |   13 +
>  .../uapi/rdma/rdma_user_cm.h  |6 +-
>  4 files changed, 1303 insertions(+), 3 deletions(-)
>  create mode 100644 include/uapi/rdma/ib_user_sa.h
>  create mode 100644 include/uapi/rdma/ib_user_verbs.h
>  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
>  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> 

Stephen:

Per a recent discussion the RDMA folks need to take ownership of the
uapi files. RDMA features do not hit Dave's net-next tree so the rdma
code can never hit iproute2-next during a dev cycle.

Re: [PATCH v2 4/4] smack: provide socketpair callback

2018-05-04 Thread Casey Schaufler

On 5/4/2018 7:28 AM, David Herrmann wrote:
> From: Tom Gundersen 
>
> Make sure to implement the new socketpair callback so the SO_PEERSEC
> call on socketpair(2)s will return correct information.
>
> Signed-off-by: Tom Gundersen 
> Signed-off-by: David Herrmann 

This doesn't look like it will cause any problems.
I've only been able to test it in a general way. I
haven't created specific tests, but it passes the
usual Smack use cases.

Acked-by: Casey Schaufler 

> ---
>  security/smack/smack_lsm.c | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 0b414836bebd..dcb976f98df2 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -2842,6 +2842,27 @@ static int smack_socket_post_create(struct socket 
> *sock, int family,
>   return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
>  }
>  
> +/**
> + * smack_socket_socketpair - create socket pair
> + * @socka: one socket
> + * @sockb: another socket
> + *
> + * Cross reference the peer labels for SO_PEERSEC
> + *
> + * Returns 0 on success, and error code otherwise
> + */
> +static int smack_socket_socketpair(struct socket *socka,
> +struct socket *sockb)
> +{
> + struct socket_smack *asp = socka->sk->sk_security;
> + struct socket_smack *bsp = sockb->sk->sk_security;
> +
> + asp->smk_packet = bsp->smk_out;
> + bsp->smk_packet = asp->smk_out;
> +
> + return 0;
> +}
> +
>  #ifdef SMACK_IPV6_PORT_LABELING
>  /**
>   * smack_socket_bind - record port binding information.
> @@ -4724,6 +4745,7 @@ static struct security_hook_list smack_hooks[] 
> __lsm_ro_after_init = {
>   LSM_HOOK_INIT(unix_may_send, smack_unix_may_send),
>  
>   LSM_HOOK_INIT(socket_post_create, smack_socket_post_create),
> + LSM_HOOK_INIT(socket_socketpair, smack_socket_socketpair),
>  #ifdef SMACK_IPV6_PORT_LABELING
>   LSM_HOOK_INIT(socket_bind, smack_socket_bind),
>  #endif

[PATCH iproute2] rdma: fix header files

2018-05-04 Thread Stephen Hemminger

All user api headers in iproute2 should be in include/uapi
so that script can be used to put correct sanitized kernel headers
there. And the header files for rdma must be a complete set; if one
header file includes another, all must be present.

This fixes build on older distributions, and Windows Services
for Linux.

Signed-off-by: Stephen Hemminger 
---
 include/uapi/rdma/ib_user_sa.h|   77 ++
 include/uapi/rdma/ib_user_verbs.h | 1210 +
 .../uapi/rdma/rdma_netlink.h  |   13 +
 .../uapi/rdma/rdma_user_cm.h  |6 +-
 4 files changed, 1303 insertions(+), 3 deletions(-)
 create mode 100644 include/uapi/rdma/ib_user_sa.h
 create mode 100644 include/uapi/rdma/ib_user_verbs.h
 rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
 rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)

diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
new file mode 100644
index ..0d2607f0cd20
--- /dev/null
+++ b/include/uapi/rdma/ib_user_sa.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_SA_H
+#define IB_USER_SA_H
+
+#include 
+
+enum {
+   IB_PATH_GMP = 1,
+   IB_PATH_PRIMARY = (1<<1),
+   IB_PATH_ALTERNATE   = (1<<2),
+   IB_PATH_OUTBOUND= (1<<3),
+   IB_PATH_INBOUND = (1<<4),
+   IB_PATH_INBOUND_REVERSE = (1<<5),
+   IB_PATH_BIDIRECTIONAL   = IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE
+};
+
+struct ib_path_rec_data {
+   __u32   flags;
+   __u32   reserved;
+   __u32   path_rec[16];
+};
+
+struct ib_user_path_rec {
+   __u8dgid[16];
+   __u8sgid[16];
+   __be16  dlid;
+   __be16  slid;
+   __u32   raw_traffic;
+   __be32  flow_label;
+   __u32   reversible;
+   __u32   mtu;
+   __be16  pkey;
+   __u8hop_limit;
+   __u8traffic_class;
+   __u8numb_path;
+   __u8sl;
+   __u8mtu_selector;
+   __u8rate_selector;
+   __u8rate;
+   __u8packet_life_time_selector;
+   __u8packet_life_time;
+   __u8preference;
+};
+
+#endif /* IB_USER_SA_H */
diff --git a/include/uapi/rdma/ib_user_verbs.h 
b/include/uapi/rdma/ib_user_verbs.h
new file mode 100644
index ..9be07394fdbe
--- /dev/null
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -0,0 +1,1210 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
+ * Copyright (c) 2006 Mellanox Technologies.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary

Re: [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader

2018-05-04 Thread Daniel Borkmann

On 05/04/2018 03:37 AM, Jakub Kicinski wrote:
> Users of BPF sooner or later discover perf_event_output() helpers
> and BPF_MAP_TYPE_PERF_EVENT_ARRAY.  Dumping this array type is
> not possible, however, we can add simple reading of perf events.
> Create a new event_pipe subcommand for maps, this sub command
> will only work with BPF_MAP_TYPE_PERF_EVENT_ARRAY maps.
> 
> Parts of the code from samples/bpf/trace_output_user.c.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 
[...]

One remark below:

[...]
> +static void
> +print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
> +{
> + struct {
> + struct perf_event_header header;
> + __u64 id;
> + __u64 lost;
> + } *lost = (void *)e;
> + struct timespec ts;
> +
> + if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
> + perror("Can't read clock for timestamp");
> + return;
> + }
Instead of the timestamp above, probably better to pick it up via
PERF_SAMPLE_TIME which needs to be added to sample_type so it also
ends up in the RB. Given below you poll with 200 and you don't set
a wakeup event for perf RB (it's probably fine not to here, but it
can be done based on watermark or events), the clock_gettime() will
be off compared to when it was actually put into the RB.

> + if (json_output) {
> + jsonw_start_object(json_wtr);
> + jsonw_name(json_wtr, "timestamp");
> + jsonw_uint(json_wtr, ts.tv_sec * 10ull + ts.tv_nsec);
> + jsonw_name(json_wtr, "type");
> + jsonw_uint(json_wtr, e->header.type);
> + jsonw_name(json_wtr, "cpu");
> + jsonw_uint(json_wtr, ring->cpu);
> + jsonw_name(json_wtr, "index");
> + jsonw_uint(json_wtr, ring->key);
> + if (e->header.type == PERF_RECORD_SAMPLE) {
> + jsonw_name(json_wtr, "data");
> + print_data_json(e->data, e->size);
> + } else if (e->header.type == PERF_RECORD_LOST) {
> + jsonw_name(json_wtr, "lost");
> + jsonw_start_object(json_wtr);
> + jsonw_name(json_wtr, "id");
> + jsonw_uint(json_wtr, lost->id);
> + jsonw_name(json_wtr, "count");
> + jsonw_uint(json_wtr, lost->lost);
> + jsonw_end_object(json_wtr);
> + }
> + jsonw_end_object(json_wtr);
> + } else {
> + if (e->header.type == PERF_RECORD_SAMPLE) {
> + printf("== @%ld.%ld CPU: %d index: %d =\n",
> +(long)ts.tv_sec, ts.tv_nsec,
> +ring->cpu, ring->key);
> + fprint_hex(stdout, e->data, e->size, " ");
> + printf("\n");
> + } else if (e->header.type == PERF_RECORD_LOST) {
> + printf("lost %lld events\n", lost->lost);
> + } else {
> + printf("unknown event type=%d size=%d\n",
> +e->header.type, e->header.size);
> + }

[PATCH bpf-next 2/6] bpf: btf: Introduce BTF ID

2018-05-04 Thread Martin KaFai Lau

This patch gives an ID to each loaded BTF.  The ID is allocated by
the idr like the existing prog-id and map-id.

The bpf_put(map->btf) is moved to __bpf_map_put() so that the
userspace can stop seeing the BTF ID ASAP when the last BTF
refcnt is gone.

It also makes BTF accessible from userspace through the
1. new BPF_BTF_GET_FD_BY_ID command.  It is limited to CAP_SYS_ADMIN
   which is in line with the BPF_BTF_LOAD cmd and the existing
   BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"

Once the BTF ID handler is accessible from userspace, freeing a BTF
object has to go through a rcu period.  The BPF_BTF_GET_FD_BY_ID cmd
can then be done under a rcu_read_lock() instead of taking
spin_lock.
[Note: A similar rcu usage can be done to the existing
   bpf_prog_get_fd_by_id() in a follow up patch]

When processing the BPF_BTF_GET_FD_BY_ID cmd,
refcount_inc_not_zero() is needed because the BTF object
could be already in the rcu dead row.  btf_get() is
removed since its usage is currently limited to btf.c
alone.  refcount_inc() is used directly instead.
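
The "increment unless zero" rule can be sketched in user space with C11 atomics, as below. This is an illustration under assumptions, not the kernel's refcount_t implementation: struct obj, obj_get_not_zero() and obj_put() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refcnt;
};

/*
 * Take a reference only if the object is still live.  A lookup that
 * races with the final put sees refcnt == 0 and must treat the object
 * as already gone.
 */
static bool obj_get_not_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	unsigned int prev = atomic_fetch_sub(&o->refcnt, 1);

	assert(prev != 0);	/* underflow would be a caller bug */
	return prev == 1;
}
```

A lookup by ID would call the first helper on whatever the table returned and report "not found" when it fails.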

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
---
 include/linux/btf.h  |   2 +
 include/uapi/linux/bpf.h |   5 +++
 kernel/bpf/btf.c | 108 ++-
 kernel/bpf/syscall.c |  24 ++-
 4 files changed, 128 insertions(+), 11 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index a966dc6d61ee..e076c4697049 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -44,5 +44,7 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
u32 *ret_size);
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
   struct seq_file *m);
+int btf_get_fd_by_id(u32 id);
+u32 btf_id(const struct btf *btf);
 
 #endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 93d5a4eeec2a..6106f23a9a8a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -96,6 +96,7 @@ enum bpf_cmd {
BPF_PROG_QUERY,
BPF_RAW_TRACEPOINT_OPEN,
BPF_BTF_LOAD,
+   BPF_BTF_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -344,6 +345,7 @@ union bpf_attr {
__u32   start_id;
__u32   prog_id;
__u32   map_id;
+   __u32   btf_id;
};
__u32   next_id;
__u32   open_flags;
@@ -2130,6 +2132,9 @@ struct bpf_map_info {
__u32 ifindex;
__u64 netns_dev;
__u64 netns_ino;
+   __u32 btf_id;
+   __u32 btf_key_id;
+   __u32 btf_value_id;
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index fa0dce0452e7..40950b6bf395 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -179,6 +180,9 @@
 i < btf_type_vlen(struct_type);\
 i++, member++)
 
+static DEFINE_IDR(btf_idr);
+static DEFINE_SPINLOCK(btf_idr_lock);
+
 struct btf {
union {
struct btf_header *hdr;
@@ -193,6 +197,8 @@ struct btf {
u32 types_size;
u32 data_size;
refcount_t refcnt;
+   u32 id;
+   struct rcu_head rcu;
 };
 
 enum verifier_phase {
@@ -598,6 +604,42 @@ static int btf_add_type(struct btf_verifier_env *env, 
struct btf_type *t)
return 0;
 }
 
+static int btf_alloc_id(struct btf *btf)
+{
+   int id;
+
+   idr_preload(GFP_KERNEL);
+   spin_lock_bh(&btf_idr_lock);
+   id = idr_alloc_cyclic(&btf_idr, btf, 1, INT_MAX, GFP_ATOMIC);
+   if (id > 0)
+   btf->id = id;
+   spin_unlock_bh(&btf_idr_lock);
+   idr_preload_end();
+
+   if (WARN_ON_ONCE(!id))
+   return -ENOSPC;
+
+   return id > 0 ? 0 : id;
+}
+
+static void btf_free_id(struct btf *btf)
+{
+   unsigned long flags;
+
+   /*
+* In map-in-map, calling map_delete_elem() on outer
+* map will call bpf_map_put on the inner map.
+* It will then eventually call btf_free_id()
+* on the inner map.  Some of the map_delete_elem()
+* implementation may have irq disabled, so
+* we need to use the _irqsave() version instead
+* of the _bh() version.
+*/
+   spin_lock_irqsave(&btf_idr_lock, flags);
+   idr_remove(&btf_idr, btf->id);
+   spin_unlock_irqrestore(&btf_idr_lock, flags);
+}
+
 static void btf_free(struct btf *btf)
 {
kvfree(btf->types);
@@ -607,15 +649,19 @@ static void btf_free(struct btf *btf)
kfree(btf);
 }
 
-static void btf_get(struct btf *btf)
+static void btf_free_rcu(struct rcu_head *rcu)
 {
-   refcount_inc(&btf->refcnt);

[PATCH bpf-next 1/6] bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y

2018-05-04 Thread Martin KaFai Lau

If CONFIG_REFCOUNT_FULL=y, refcount_inc() warns when the refcount is 0.
When creating a new btf, the initial btf->refcnt is 0, which
triggered the following:

[   34.855452] refcount_t: increment on 0; use-after-free.
[   34.856252] WARNING: CPU: 6 PID: 1857 at lib/refcount.c:153 
refcount_inc+0x26/0x30

[   34.868809] Call Trace:
[   34.869168]  btf_new_fd+0x1af6/0x24d0
[   34.869645]  ? btf_type_seq_show+0x200/0x200
[   34.870212]  ? lock_acquire+0x3b0/0x3b0
[   34.870726]  ? security_capable+0x54/0x90
[   34.871247]  __x64_sys_bpf+0x1b2/0x310
[   34.871761]  ? __ia32_sys_bpf+0x310/0x310
[   34.872285]  ? bad_area_access_error+0x310/0x310
[   34.872894]  do_syscall_64+0x95/0x3f0

This patch uses refcount_set() instead.
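
A single-threaded sketch of why the initial reference has to be set rather than incremented is given below. It only mimics the checked behaviour described here; struct ref, ref_set() and ref_inc_checked() are hypothetical names and no atomics are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct ref {
	unsigned int count;
};

/* Initialise the counter of a freshly created object. */
static void ref_set(struct ref *r, unsigned int n)
{
	assert(n > 0);	/* a live object holds at least one reference */
	r->count = n;
}

/*
 * Checked increment: a count of 0 means "already freed", so an
 * increment from 0 is reported instead of silently resurrecting the
 * object.  Returns false when the warning fired.
 */
static bool ref_inc_checked(struct ref *r)
{
	if (r->count == 0) {
		fprintf(stderr, "ref: increment on 0; use-after-free.\n");
		return false;
	}
	r->count++;
	return true;
}
```

A zero-initialised object therefore trips the check on its very first "get", while one initialised with ref_set(&r, 1) does not.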

Reported-by: Yonghong Song 
Tested-by: Yonghong Song 
Signed-off-by: Martin KaFai Lau 
---
 kernel/bpf/btf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 22e1046a1a86..fa0dce0452e7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -1977,7 +1977,7 @@ static struct btf *btf_parse(void __user *btf_data, u32 
btf_data_size,
 
if (!err) {
btf_verifier_env_free(env);
-   btf_get(btf);
+   refcount_set(&btf->refcnt, 1);
return btf;
}
 
-- 
2.9.5

[PATCH bpf-next 0/6] Introduce BTF ID

2018-05-04 Thread Martin KaFai Lau

This series introduces BTF ID which is exposed through
the new BPF_BTF_GET_FD_BY_ID cmd, new "struct bpf_btf_info"
and new members in the "struct bpf_map_info".

Please see individual patch for details.

Martin KaFai Lau (6):
  bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y
  bpf: btf: Introduce BTF ID
  bpf: btf: Add struct bpf_btf_info
  bpf: btf: Some test_btf clean up
  bpf: btf: Update tools/include/uapi/linux/btf.h with BTF ID
  bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID

 include/linux/btf.h|   2 +
 include/uapi/linux/bpf.h   |  11 +
 kernel/bpf/btf.c   | 136 --
 kernel/bpf/syscall.c   |  41 ++-
 tools/include/uapi/linux/bpf.h |  11 +
 tools/lib/bpf/bpf.c|  10 +
 tools/lib/bpf/bpf.h|   1 +
 tools/testing/selftests/bpf/test_btf.c | 478 +
 8 files changed, 563 insertions(+), 127 deletions(-)

-- 
2.9.5

[PATCH bpf-next 4/6] bpf: btf: Some test_btf clean up

2018-05-04 Thread Martin KaFai Lau

This patch adds a CHECK() macro for condition checking
and error reporting, similar to the one in test_progs.c.

It also counts the number of tests passed/skipped/failed and
prints them at the end of the test run.
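
The pattern can be sketched in portable C as follows. This is a simplified illustration rather than the macro from the patch: it takes a fixed message instead of a printf-style format, avoids the GNU statement-expression extension, and CHECK_MSG() and check_at() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

static unsigned int pass_cnt;
static unsigned int error_cnt;

/* Report a failed condition with its location; returns 1 on failure. */
static int check_at(int failed, const char *func, int line, const char *msg)
{
	assert(func && msg);
	if (failed)
		fprintf(stderr, "%s:%d:FAIL %s", func, line, msg);
	return failed != 0;
}

/* 'cond' describes the error case, as in the CHECK() calls in the diff. */
#define CHECK_MSG(cond, msg) check_at(!!(cond), __func__, __LINE__, (msg))

/* Tally one test result and terminate its output line. */
static int count_result(int err)
{
	if (err)
		error_cnt++;
	else
		pass_cnt++;

	fprintf(stderr, "\n");
	return err;
}
```

A test body can then write "if (CHECK_MSG(fd < 0, "cannot open")) return -1;" and the runner wraps each test in count_result().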

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_btf.c | 201 -
 1 file changed, 99 insertions(+), 102 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c 
b/tools/testing/selftests/bpf/test_btf.c
index 7b39b1f712a1..b7880a20fad1 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -20,6 +20,30 @@
 
 #include "bpf_rlimit.h"
 
+static uint32_t pass_cnt;
+static uint32_t error_cnt;
+static uint32_t skip_cnt;
+
+#define CHECK(condition, format...) ({ \
+   int __ret = !!(condition);  \
+   if (__ret) {\
+   fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__); \
+   fprintf(stderr, format);\
+   }   \
+   __ret;  \
+})
+
+static int count_result(int err)
+{
+   if (err)
+   error_cnt++;
+   else
+   pass_cnt++;
+
+   fprintf(stderr, "\n");
+   return err;
+}
+
 #define min(a, b) ((a) < (b) ? (a) : (b))
 #define __printf(a, b) __attribute__((format(printf, a, b)))
 
@@ -894,17 +918,13 @@ static void *btf_raw_create(const struct btf_header *hdr,
void *raw_btf;
 
type_sec_size = get_type_sec_size(raw_types);
-   if (type_sec_size < 0) {
-   fprintf(stderr, "Cannot get nr_raw_types\n");
+   if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
return NULL;
-   }
 
size_needed = sizeof(*hdr) + type_sec_size + str_sec_size;
raw_btf = malloc(size_needed);
-   if (!raw_btf) {
-   fprintf(stderr, "Cannot allocate memory for raw_btf\n");
+   if (CHECK(!raw_btf, "Cannot allocate memory for raw_btf"))
return NULL;
-   }
 
/* Copy header */
memcpy(raw_btf, hdr, sizeof(*hdr));
@@ -915,8 +935,7 @@ static void *btf_raw_create(const struct btf_header *hdr,
for (i = 0; i < type_sec_size / sizeof(raw_types[0]); i++) {
if (raw_types[i] == NAME_TBD) {
next_str = get_next_str(next_str, end_str);
-   if (!next_str) {
-   fprintf(stderr, "Error in getting next_str\n");
+   if (CHECK(!next_str, "Error in getting next_str")) {
free(raw_btf);
return NULL;
}
@@ -973,9 +992,8 @@ static int do_test_raw(unsigned int test_num)
free(raw_btf);
 
err = ((btf_fd == -1) != test->btf_load_err);
-   if (err)
-   fprintf(stderr, "btf_load_err:%d btf_fd:%d\n",
-   test->btf_load_err, btf_fd);
+   CHECK(err, "btf_fd:%d test->btf_load_err:%u",
+ btf_fd, test->btf_load_err);
 
if (err || btf_fd == -1)
goto done;
@@ -992,16 +1010,15 @@ static int do_test_raw(unsigned int test_num)
map_fd = bpf_create_map_xattr(_attr);
 
err = ((map_fd == -1) != test->map_create_err);
-   if (err)
-   fprintf(stderr, "map_create_err:%d map_fd:%d\n",
-   test->map_create_err, map_fd);
+   CHECK(err, "map_fd:%d test->map_create_err:%u",
+ map_fd, test->map_create_err);
 
 done:
if (!err)
-   fprintf(stderr, "OK\n");
+   fprintf(stderr, "OK");
 
if (*btf_log_buf && (err || args.always_log))
-   fprintf(stderr, "%s\n", btf_log_buf);
+   fprintf(stderr, "\n%s", btf_log_buf);
 
if (btf_fd != -1)
close(btf_fd);
@@ -1017,10 +1034,10 @@ static int test_raw(void)
int err = 0;
 
if (args.raw_test_num)
-   return do_test_raw(args.raw_test_num);
+   return count_result(do_test_raw(args.raw_test_num));
 
for (i = 1; i <= ARRAY_SIZE(raw_tests); i++)
-   err |= do_test_raw(i);
+   err |= count_result(do_test_raw(i));
 
return err;
 }
@@ -1080,8 +1097,7 @@ static int do_test_get_info(unsigned int test_num)
*btf_log_buf = '\0';
 
user_btf = malloc(raw_btf_size);
-   if (!user_btf) {
-   fprintf(stderr, "Cannot allocate memory for user_btf\n");
+   if (CHECK(!user_btf, "!user_btf")) {
err = -1;
goto done;
}
@@ -1089,9 +1105,7 @@ static int do_test_get_info(unsigned int test_num)
btf_fd = bpf_load_btf(raw_btf, raw_btf_size,

[PATCH bpf-next 5/6] bpf: btf: Update tools/include/uapi/linux/btf.h with BTF ID

2018-05-04 Thread Martin KaFai Lau

This patch syncs tools/include/uapi/linux/btf.h with
the newly introduced BTF ID support.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
---
 tools/include/uapi/linux/bpf.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 83a95ae388dd..fff51c187d1e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -96,6 +96,7 @@ enum bpf_cmd {
BPF_PROG_QUERY,
BPF_RAW_TRACEPOINT_OPEN,
BPF_BTF_LOAD,
+   BPF_BTF_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -343,6 +344,7 @@ union bpf_attr {
__u32   start_id;
__u32   prog_id;
__u32   map_id;
+   __u32   btf_id;
};
__u32   next_id;
__u32   open_flags;
@@ -2129,6 +2131,15 @@ struct bpf_map_info {
__u32 ifindex;
__u64 netns_dev;
__u64 netns_ino;
+   __u32 btf_id;
+   __u32 btf_key_id;
+   __u32 btf_value_id;
+} __attribute__((aligned(8)));
+
+struct bpf_btf_info {
+   __aligned_u64 btf;
+   __u32 btf_size;
+   __u32 id;
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
-- 
2.9.5

[PATCH bpf-next 6/6] bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID

2018-05-04 Thread Martin KaFai Lau

This patch adds tests for BPF_BTF_GET_FD_BY_ID and the new
btf_id/btf_key_id/btf_value_id members of "struct bpf_map_info".

It also modifies the existing BPF_OBJ_GET_INFO_BY_FD test
to reflect the new "struct bpf_btf_info".

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.c|  10 ++
 tools/lib/bpf/bpf.h|   1 +
 tools/testing/selftests/bpf/test_btf.c | 289 +++--
 3 files changed, 287 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 76b36cc16e7f..a3a8fb2ac697 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -458,6 +458,16 @@ int bpf_map_get_fd_by_id(__u32 id)
return sys_bpf(BPF_MAP_GET_FD_BY_ID, , sizeof(attr));
 }
 
+int bpf_btf_get_fd_by_id(__u32 id)
+{
+   union bpf_attr attr;
+
+   bzero(, sizeof(attr));
+   attr.btf_id = id;
+
+   return sys_bpf(BPF_BTF_GET_FD_BY_ID, , sizeof(attr));
+}
+
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 {
union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 553b11ad52b3..fb3a146d92ff 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -98,6 +98,7 @@ int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_prog_get_fd_by_id(__u32 id);
 int bpf_map_get_fd_by_id(__u32 id);
+int bpf_btf_get_fd_by_id(__u32 id);
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
diff --git a/tools/testing/selftests/bpf/test_btf.c 
b/tools/testing/selftests/bpf/test_btf.c
index b7880a20fad1..c8bceae7ec02 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -1047,9 +1047,13 @@ struct btf_get_info_test {
const char *str_sec;
__u32 raw_types[MAX_NR_RAW_TYPES];
__u32 str_sec_size;
-   int info_size_delta;
+   int btf_size_delta;
+   int (*special_test)(unsigned int test_num);
 };
 
+static int test_big_btf_info(unsigned int test_num);
+static int test_btf_id(unsigned int test_num);
+
 const struct btf_get_info_test get_info_tests[] = {
 {
.descr = "== raw_btf_size+1",
@@ -1060,7 +1064,7 @@ const struct btf_get_info_test get_info_tests[] = {
},
.str_sec = "",
.str_sec_size = sizeof(""),
-   .info_size_delta = 1,
+   .btf_size_delta = 1,
 },
 {
.descr = "== raw_btf_size-3",
@@ -1071,20 +1075,274 @@ const struct btf_get_info_test get_info_tests[] = {
},
.str_sec = "",
.str_sec_size = sizeof(""),
-   .info_size_delta = -3,
+   .btf_size_delta = -3,
+},
+{
+   .descr = "Large bpf_btf_info",
+   .raw_types = {
+   /* int */   /* [1] */
+   BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+   BTF_END_RAW,
+   },
+   .str_sec = "",
+   .str_sec_size = sizeof(""),
+   .special_test = test_big_btf_info,
+},
+{
+   .descr = "BTF ID",
+   .raw_types = {
+   /* int */   /* [1] */
+   BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+   /* unsigned int */  /* [2] */
+   BTF_TYPE_INT_ENC(0, 0, 0, 32, 4),
+   BTF_END_RAW,
+   },
+   .str_sec = "",
+   .str_sec_size = sizeof(""),
+   .special_test = test_btf_id,
 },
 };
 
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+   return (__u64)(unsigned long)ptr;
+}
+
+static int test_big_btf_info(unsigned int test_num)
+{
+   const struct btf_get_info_test *test = _info_tests[test_num - 1];
+   uint8_t *raw_btf = NULL, *user_btf = NULL;
+   unsigned int raw_btf_size;
+   struct {
+   struct bpf_btf_info info;
+   uint64_t garbage;
+   } info_garbage;
+   struct bpf_btf_info *info;
+   int btf_fd = -1, err;
+   uint32_t info_len;
+
+   raw_btf = btf_raw_create(_tmpl,
+test->raw_types,
+test->str_sec,
+test->str_sec_size,
+_btf_size);
+
+   if (!raw_btf)
+   return -1;
+
+   *btf_log_buf = '\0';
+
+   user_btf = malloc(raw_btf_size);
+   if (CHECK(!user_btf, "!user_btf")) {
+   err = -1;
+   goto done;
+   }
+
+   btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+   if (CHECK(btf_fd == -1, "errno:%d", errno)) {
+   err = -1;
+   goto done;
+   }
+
+   /*
+* GET_INFO should error out if the

[PATCH bpf-next 3/6] bpf: btf: Add struct bpf_btf_info

2018-05-04 Thread Martin KaFai Lau

During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the buffer that the current
bpf_attr's info.info points to is filled directly with the BTF binary
data.  That layout is not extensible, and we now want to add the BTF ID.

This patch adds "struct bpf_btf_info", which has the BTF ID as
one of its members.  The BTF binary data itself is exposed through
the "btf" and "btf_size" members.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  6 ++
 kernel/bpf/btf.c | 26 +-
 kernel/bpf/syscall.c | 17 -
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6106f23a9a8a..d615c777b573 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2137,6 +2137,12 @@ struct bpf_map_info {
__u32 btf_value_id;
 } __attribute__((aligned(8)));
 
+struct bpf_btf_info {
+   __aligned_u64 btf;
+   __u32 btf_size;
+   __u32 id;
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 40950b6bf395..ded10ab47b8a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -2114,12 +2114,28 @@ int btf_get_info_by_fd(const struct btf *btf,
   const union bpf_attr *attr,
   union bpf_attr __user *uattr)
 {
-   void __user *udata = u64_to_user_ptr(attr->info.info);
-   u32 copy_len = min_t(u32, btf->data_size,
-attr->info.info_len);
+   struct bpf_btf_info __user *uinfo;
+   struct bpf_btf_info info = {};
+   u32 info_copy, btf_copy;
+   void __user *ubtf;
+   u32 uinfo_len;
 
-   if (copy_to_user(udata, btf->data, copy_len) ||
-   put_user(btf->data_size, >info.info_len))
+   uinfo = u64_to_user_ptr(attr->info.info);
+   uinfo_len = attr->info.info_len;
+
+   info_copy = min_t(u32, uinfo_len, sizeof(info));
+   if (copy_from_user(, uinfo, info_copy))
+   return -EFAULT;
+
+   info.id = btf->id;
+   ubtf = u64_to_user_ptr(info.btf);
+   btf_copy = min_t(u32, btf->data_size, info.btf_size);
+   if (copy_to_user(ubtf, btf->data, btf_copy))
+   return -EFAULT;
+   info.btf_size = btf->data_size;
+
+   if (copy_to_user(uinfo, , info_copy) ||
+   put_user(info_copy, >info.info_len))
return -EFAULT;
 
return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8b0a45d65454..d2895e3e5cbf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2019,6 +2019,21 @@ static int bpf_map_get_info_by_fd(struct bpf_map *map,
return 0;
 }
 
+static int bpf_btf_get_info_by_fd(struct btf *btf,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+   struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+   u32 info_len = attr->info.info_len;
+   int err;
+
+   err = check_uarg_tail_zero(uinfo, sizeof(*uinfo), info_len);
+   if (err)
+   return err;
+
+   return btf_get_info_by_fd(btf, attr, uattr);
+}
+
 #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
 
 static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
@@ -2042,7 +2057,7 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr 
*attr,
err = bpf_map_get_info_by_fd(f.file->private_data, attr,
 uattr);
else if (f.file->f_op == _fops)
-   err = btf_get_info_by_fd(f.file->private_data, attr, uattr);
+   err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
else
err = -EINVAL;
 
-- 
2.9.5

Re: [PATCH bpf-next 00/10] bpf: support offload of bpf_event_output()

2018-05-04 Thread Daniel Borkmann

On 05/04/2018 03:37 AM, Jakub Kicinski wrote:
> Hi!
> 
> This series centres on NFP offload of bpf_event_output().  The
> first patch allows perf event arrays to be used by offloaded
> programs.  Next patch makes the nfp driver keep track of such
> arrays to be able to filter FW events referring to maps.
> Perf event arrays are not device bound.  Having driver
> reimplement and manage the perf array seems brittle and unnecessary.
> 
> Patch 4 moves slightly the verifier step which replaces map fds
> with map pointers.  This is useful for nfp JIT since we can then
> easily replace host pointers with NFP table ids (patch 6).  This
> allows us to lift the limitation on map helpers having to be used
> with the same map pointer on all paths.  Second use of replacing
> fds with real host map pointers is that we can use the host map
> pointer as a key for FW events in perf event array offload.
> 
> Patch 5 adds perf event output offload support for the NFP.
> 
> There are some differences between bpf_event_output() offloaded
> and non-offloaded version.  The FW messages which carry events
> may get dropped and reordered relatively easily.  The return codes
> from the helper are also not guaranteed to match the host.  Users
> are warned about some of those discrepancies with a one time
> warning message to kernel logs.
> 
> bpftool gains an ability to dump perf ring events in a very simple
> format.  This was very useful for testing and simple debug, maybe
> it will be useful to others?
> 
> Last patch is a trivial comment fix.

Nice approach, applied to bpf-next, thanks Jakub!

Re: [PATCH] cxgb4vf: fix t4vf_eth_xmit()'s return type

2018-05-04 Thread Casey Leedom

| From: Luc Van Oostenryck 
| Sent: Tuesday, April 24, 2018 6:19:02 AM
| 
| The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
| which is a typedef for an enum type, but the implementation in this
| driver returns an 'int'.
| 
| Fix this by returning 'netdev_tx_t' in this driver too.

Looks good to me.

Casey

Re: [PATCH net-next] net: core: rework skb_probe_transport_header()

2018-05-04 Thread kbuild test robot

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-core-rework-skb_probe_transport_header/20180504-041345
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> include/linux/skbuff.h:2360:32: sparse: Using plain integer as NULL pointer
   drivers/net/tun.c:2088:40: sparse: expression using sizeof(void)
   drivers/net/tun.c:2221:15: sparse: expression using sizeof(void)
   drivers/net/tun.c:2221:15: sparse: expression using sizeof(void)
   drivers/net/tun.c:2846:36: sparse: incorrect type in argument 2 (different 
address spaces) @@expected struct tun_prog [noderef] **prog_p @@
got noderef] **prog_p @@
   drivers/net/tun.c:2846:36:expected struct tun_prog [noderef] 
**prog_p
   drivers/net/tun.c:2846:36:got struct tun_prog **prog_p
   drivers/net/tun.c:3142:42: sparse: incorrect type in argument 2 (different 
address spaces) @@expected struct tun_prog **prog_p @@got struct 
tun_prog [struct tun_prog **prog_p @@
   drivers/net/tun.c:3142:42:expected struct tun_prog **prog_p
   drivers/net/tun.c:3142:42:got struct tun_prog [noderef] 
**
   drivers/net/tun.c:3146:42: sparse: incorrect type in argument 2 (different 
address spaces) @@expected struct tun_prog **prog_p @@got struct 
tun_prog [struct tun_prog **prog_p @@
   drivers/net/tun.c:3146:42:expected struct tun_prog **prog_p
   drivers/net/tun.c:3146:42:got struct tun_prog [noderef] 
**
--
>> include/linux/skbuff.h:2360:32: sparse: Using plain integer as NULL pointer
   drivers/net/tap.c:879:15: sparse: expression using sizeof(void)
   drivers/net/tap.c:879:15: sparse: expression using sizeof(void)
--
   drivers/net/xen-netback/netback.c:175:21: sparse: expression using 
sizeof(void)
   drivers/net/xen-netback/netback.c:182:35: sparse: expression using 
sizeof(void)
   drivers/net/xen-netback/netback.c:182:35: sparse: expression using 
sizeof(void)
>> include/linux/skbuff.h:2360:32: sparse: Using plain integer as NULL pointer
   drivers/net/xen-netback/netback.c:1632:37: sparse: expression using 
sizeof(void)

vim +2360 include/linux/skbuff.h

  2349  
  2350  static inline void skb_probe_transport_header(struct sk_buff *skb,
  2351const int offset_hint)
  2352  {
  2353  struct flow_keys_basic keys;
  2354  
  2355  if (skb_transport_header_was_set(skb))
  2356  return;
  2357  
  2358  memset(, 0, sizeof(keys));
  2359  if (__skb_flow_dissect(skb, _keys_buf_dissector, ,
> 2360 0, 0, 0, 0, 0))
  2361  skb_set_transport_header(skb, keys.control.thoff);
  2362  else
  2363  skb_set_transport_header(skb, offset_hint);
  2364  }
  2365  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader

2018-05-04 Thread Alexei Starovoitov

On Thu, May 03, 2018 at 06:37:16PM -0700, Jakub Kicinski wrote:
> Users of BPF sooner or later discover perf_event_output() helpers
> and BPF_MAP_TYPE_PERF_EVENT_ARRAY.  Dumping this array type is
> not possible, however, we can add simple reading of perf events.
> Create a new event_pipe subcommand for maps, this sub command
> will only work with BPF_MAP_TYPE_PERF_EVENT_ARRAY maps.
> 
> Parts of the code from samples/bpf/trace_output_user.c.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 
> ---
>  .../bpf/bpftool/Documentation/bpftool-map.rst |  29 +-
>  tools/bpf/bpftool/Documentation/bpftool.rst   |   2 +-
>  tools/bpf/bpftool/Makefile|   7 +-
>  tools/bpf/bpftool/bash-completion/bpftool |  36 +-
>  tools/bpf/bpftool/common.c|  19 +
>  tools/bpf/bpftool/main.h  |   4 +
>  tools/bpf/bpftool/map.c   |  19 +-
>  tools/bpf/bpftool/map_perf_ring.c | 347 ++
>  8 files changed, 444 insertions(+), 19 deletions(-)
>  create mode 100644 tools/bpf/bpftool/map_perf_ring.c
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst 
> b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index c3eef8c972cd..a6258bc8ec4f 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -22,12 +22,13 @@ MAP COMMANDS
>  =
>  
>  |**bpftool** **map { show | list }**   [*MAP*]
> -|**bpftool** **map dump***MAP*
> -|**bpftool** **map update**  *MAP*  **key** *DATA*   **value** *VALUE* 
> [*UPDATE_FLAGS*]
> -|**bpftool** **map lookup**  *MAP*  **key** *DATA*
> -|**bpftool** **map getnext** *MAP* [**key** *DATA*]
> -|**bpftool** **map delete**  *MAP*  **key** *DATA*
> -|**bpftool** **map pin** *MAP*  *FILE*
> +|**bpftool** **map dump**   *MAP*
> +|**bpftool** **map update** *MAP*  **key** *DATA*   **value** 
> *VALUE* [*UPDATE_FLAGS*]
> +|**bpftool** **map lookup** *MAP*  **key** *DATA*
> +|**bpftool** **map getnext***MAP* [**key** *DATA*]
> +|**bpftool** **map delete** *MAP*  **key** *DATA*
> +|**bpftool** **map pin***MAP*  *FILE*
> +|**bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
>  |**bpftool** **map help**
>  |
>  |*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> @@ -76,6 +77,22 @@ DESCRIPTION
>  
> Note: *FILE* must be located in *bpffs* mount.
>  
> + **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
> +   Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
> +
> +   Install perf rings into a perf event array map and dump
> +   output of any bpf_perf_event_output() call in the kernel.
> +   By default read the number of CPUs on the system and
> +   install perf ring for each CPU in the corresponding index
> +   in the array.
> +
> +   If **cpu** and **index** are specified, install perf ring
> +   for given **cpu** at **index** in the array (single ring).
> +
> +   Note that installing a perf ring into an array will silently
> +   replace any existing ring.  Any other application will stop
> +   receiving events if it installed its rings earlier.
> +
>   **bpftool map help**
> Print short help message.
>  
> diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst 
> b/tools/bpf/bpftool/Documentation/bpftool.rst
> index 20689a321ffe..564cb0d9692b 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool.rst
> @@ -23,7 +23,7 @@ SYNOPSIS
>  
>   *MAP-COMMANDS* :=
>   { **show** | **list** | **dump** | **update** | **lookup** | 
> **getnext** | **delete**
> - | **pin** | **help** }
> + | **pin** | **event_pipe** | **help** }
>  
>   *PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump 
> xlated** | **pin**
>   | **load** | **help** }
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 4e69782c4a79..892dbf095bff 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -39,7 +39,12 @@ CC = gcc
>  
>  CFLAGS += -O2
>  CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow 
> -Wno-missing-field-initializers
> -CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ 
> -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include 
> -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> +CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ \
> + -I$(srctree)/kernel/bpf/ \
> + -I$(srctree)/tools/include \
> + -I$(srctree)/tools/include/uapi \
> + -I$(srctree)/tools/lib/bpf \
> + -I$(srctree)/tools/perf
>  CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
>  LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>  
> diff

[PATCH] net: disable UDP punt on sockets in RCV_SHUTDOWN

2018-05-04 Thread Chintan Shah

Consider a UDP application that opens multiple sockets with the same local
address/port combination (using the SO_REUSEPORT/SO_REUSEADDR socket
options) and connects one of these local sockets to a remote socket.
If the connected socket then issues shutdown (SHUT_RD), packets sent by
the remote client it connected to are still queued to this socket and
are not delivered to the other socket, which is in the normal state.

The UDP socket lookup does not take the socket's state (whether or not
it has issued a shutdown on read) into account.  When the application
calls shutdown (SHUT_RD), only the UDP socket's state is changed
(RCV_SHUTDOWN is set in sk_shutdown).

The UDP socket lookup is performed with the help of the compute_score
function.  The function checks the socket's attributes against the
incoming packet's headers and returns a score based on match/mismatch.
Check the socket's state (sk->sk_shutdown) in the same compute_score
function and return -1 for a socket in RCV_SHUTDOWN, so that the lookup
never selects it.

Signed-off-by: Chintan Shah 
CC: xe-linux-exter...@cisco.com
---
 net/ipv4/udp.c | 6 ++
 net/ipv6/udp.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0dfcd73..a5fe6d7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -402,6 +402,9 @@ static inline int compute_score(struct sock *sk, struct net 
*net,
 #endif
 #endif
 
+   if (sk->sk_shutdown & RCV_SHUTDOWN)
+   return -1;
+
if (!net_eq(sock_net(sk), net) ||
udp_sk(sk)->udp_port_hash != hnum ||
ipv6_only_sock(sk))
@@ -483,6 +486,9 @@ static inline int compute_score2(struct sock *sk, struct 
net *net,
 #endif
 #endif
 
+   if (sk->sk_shutdown & RCV_SHUTDOWN)
+   return -1;
+
if (!net_eq(sock_net(sk), net) ||
ipv6_only_sock(sk))
return -1;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d956cbb..2254b07 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -170,6 +170,9 @@ static inline int compute_score(struct sock *sk, struct net 
*net,
 #endif
 #endif
 
+   if (sk->sk_shutdown & RCV_SHUTDOWN)
+   return -1;
+
if (!net_eq(sock_net(sk), net) ||
udp_sk(sk)->udp_port_hash != hnum ||
sk->sk_family != PF_INET6)
@@ -251,6 +254,9 @@ static inline int compute_score2(struct sock *sk, struct 
net *net,
 #endif
 #endif
 
+   if (sk->sk_shutdown & RCV_SHUTDOWN)
+   return -1;
+
if (!net_eq(sock_net(sk), net) ||
udp_sk(sk)->udp_port_hash != hnum ||
sk->sk_family != PF_INET6)
-- 
2.5.0

[PATCH ghak81 RFC V1 0/5] audit: group task params

2018-05-04 Thread Richard Guy Briggs

Group the audit parameters for each task into one structure.
In particular, remove the loginuid and sessionid values and the audit
context pointer from the task structure, replacing them with an audit
task information structure to contain them.  Use access functions to
access audit values.

Note:  Use static allocation of the audit task information structure
initially.  Dynamic allocation was considered and attempted, but isn't
ready yet.  Static allocation has the limitation that future audit task
information structure changes would cause a visible change to the rest
of the kernel, whereas dynamic allocation would mostly hide any future
changes.

The first four access normalization patches could stand alone.

Passes audit-testsuite.

Richard Guy Briggs (5):
  audit: normalize loginuid read access
  audit: convert sessionid unset to a macro
  audit: use inline function to get audit context
  audit: use inline function to set audit context
  audit: collect audit task parameters

 MAINTAINERS  |  2 +-
 include/linux/audit.h| 30 ++---
 include/linux/audit_task.h   | 31 ++
 include/linux/sched.h|  6 +--
 include/net/xfrm.h   |  4 +-
 include/uapi/linux/audit.h   |  1 +
 init/init_task.c |  8 +++-
 kernel/audit.c   |  4 +-
 kernel/audit_watch.c |  2 +-
 kernel/auditsc.c | 82 ++--
 kernel/fork.c|  2 +-
 net/bridge/netfilter/ebtables.c  |  2 +-
 net/core/dev.c   |  2 +-
 net/netfilter/x_tables.c |  2 +-
 net/netlabel/netlabel_user.c |  2 +-
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/integrity_audit.c |  2 +-
 security/lsm_audit.c |  2 +-
 security/selinux/hooks.c |  4 +-
 security/selinux/selinuxfs.c |  6 +--
 security/selinux/ss/services.c   | 12 +++---
 21 files changed, 129 insertions(+), 79 deletions(-)
 create mode 100644 include/linux/audit_task.h

-- 
1.8.3.1

[PATCH ghak81 RFC V1 1/5] audit: normalize loginuid read access

2018-05-04 Thread Richard Guy Briggs

Recognizing that the loginuid is an internal audit value, use an access
function to retrieve the audit loginuid value for the task rather than
reaching directly into the task struct to get it.

Signed-off-by: Richard Guy Briggs 
---
 kernel/auditsc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 479c031..f3817d0 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct *tsk,
case AUDIT_COMPARE_EGID_TO_OBJ_GID:
return audit_compare_gid(cred->egid, name, f, ctx);
case AUDIT_COMPARE_AUID_TO_OBJ_UID:
-   return audit_compare_uid(tsk->loginuid, name, f, ctx);
+   return audit_compare_uid(audit_get_loginuid(tsk), name, f, ctx);
case AUDIT_COMPARE_SUID_TO_OBJ_UID:
return audit_compare_uid(cred->suid, name, f, ctx);
case AUDIT_COMPARE_SGID_TO_OBJ_GID:
@@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct *tsk,
return audit_compare_gid(cred->fsgid, name, f, ctx);
/* uid comparisons */
case AUDIT_COMPARE_UID_TO_AUID:
-   return audit_uid_comparator(cred->uid, f->op, tsk->loginuid);
+   return audit_uid_comparator(cred->uid, f->op, 
audit_get_loginuid(tsk));
case AUDIT_COMPARE_UID_TO_EUID:
return audit_uid_comparator(cred->uid, f->op, cred->euid);
case AUDIT_COMPARE_UID_TO_SUID:
@@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct *tsk,
return audit_uid_comparator(cred->uid, f->op, cred->fsuid);
/* auid comparisons */
case AUDIT_COMPARE_AUID_TO_EUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->euid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
cred->euid);
case AUDIT_COMPARE_AUID_TO_SUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->suid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
cred->suid);
case AUDIT_COMPARE_AUID_TO_FSUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->fsuid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
cred->fsuid);
/* euid comparisons */
case AUDIT_COMPARE_EUID_TO_SUID:
return audit_uid_comparator(cred->euid, f->op, cred->suid);
@@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk,
result = match_tree_refs(ctx, rule->tree);
break;
case AUDIT_LOGINUID:
-   result = audit_uid_comparator(tsk->loginuid, f->op, 
f->uid);
+   result = audit_uid_comparator(audit_get_loginuid(tsk), 
f->op, f->uid);
break;
case AUDIT_LOGINUID_SET:
result = audit_comparator(audit_loginuid_set(tsk), 
f->op, f->val);
@@ -2287,8 +2287,8 @@ int audit_signal_info(int sig, struct task_struct *t)
(sig == SIGTERM || sig == SIGHUP ||
 sig == SIGUSR1 || sig == SIGUSR2)) {
audit_sig_pid = task_tgid_nr(tsk);
-   if (uid_valid(tsk->loginuid))
-   audit_sig_uid = tsk->loginuid;
+   if (uid_valid(audit_get_loginuid(tsk)))
+   audit_sig_uid = audit_get_loginuid(tsk);
else
audit_sig_uid = uid;
security_task_getsecid(tsk, _sig_sid);
-- 
1.8.3.1

[PATCH ghak81 RFC V1 2/5] audit: convert sessionid unset to a macro

2018-05-04 Thread Richard Guy Briggs

Use a macro, "AUDIT_SID_UNSET", to replace each instance of
initialization and comparison to an audit session ID.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h  | 2 +-
 include/net/xfrm.h | 2 +-
 include/uapi/linux/audit.h | 1 +
 init/init_task.c   | 2 +-
 kernel/auditsc.c   | 4 ++--
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 75d5b03..5f86f7c 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -513,7 +513,7 @@ static inline kuid_t audit_get_loginuid(struct task_struct 
*tsk)
 }
 static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 {
-   return -1;
+   return AUDIT_SID_UNSET;
 }
 static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
 { }
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index a872379..fcce8ee 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -751,7 +751,7 @@ static inline void xfrm_audit_helper_usrinfo(bool 
task_valid,
audit_get_loginuid(current) :
INVALID_UID);
const unsigned int ses = task_valid ? audit_get_sessionid(current) :
-   (unsigned int) -1;
+   AUDIT_SID_UNSET;
 
audit_log_format(audit_buf, " auid=%u ses=%u", auid, ses);
audit_log_task_context(audit_buf);
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 4e61a9e..04f9bd2 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -465,6 +465,7 @@ struct audit_tty_status {
 };
 
 #define AUDIT_UID_UNSET (unsigned int)-1
+#define AUDIT_SID_UNSET ((unsigned int)-1)
 
 /* audit_rule_data supports filter rules with both integer and string
  * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/init/init_task.c b/init/init_task.c
index 3ac6e75..c788f91 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -119,7 +119,7 @@ struct task_struct init_task
.thread_node= LIST_HEAD_INIT(init_signals.thread_head),
 #ifdef CONFIG_AUDITSYSCALL
.loginuid   = INVALID_UID,
-   .sessionid  = (unsigned int)-1,
+   .sessionid  = AUDIT_SID_UNSET,
 #endif
 #ifdef CONFIG_PERF_EVENTS
.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f3817d0..6e3ceb9 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2050,7 +2050,7 @@ static void audit_log_set_loginuid(kuid_t koldloginuid, 
kuid_t kloginuid,
 int audit_set_loginuid(kuid_t loginuid)
 {
struct task_struct *task = current;
-   unsigned int oldsessionid, sessionid = (unsigned int)-1;
+   unsigned int oldsessionid, sessionid = AUDIT_SID_UNSET;
kuid_t oldloginuid;
int rc;
 
@@ -2064,7 +2064,7 @@ int audit_set_loginuid(kuid_t loginuid)
/* are we setting or clearing? */
if (uid_valid(loginuid)) {
sessionid = (unsigned int)atomic_inc_return(_id);
-   if (unlikely(sessionid == (unsigned int)-1))
+   if (unlikely(sessionid == AUDIT_SID_UNSET))
sessionid = (unsigned 
int)atomic_inc_return(_id);
}
 
-- 
1.8.3.1

[PATCH ghak81 RFC V1 4/5] audit: use inline function to set audit context

2018-05-04 Thread Richard Guy Briggs

Recognizing that the audit context is an internal audit value, use an
access function to set the audit context pointer for the task
rather than reaching directly into the task struct to set it.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h | 8 
 kernel/auditsc.c  | 6 +++---
 kernel/fork.c | 2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 93e4c61..dba0d45 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -235,6 +235,10 @@ extern void __audit_inode_child(struct inode *parent,
 extern void __audit_seccomp(unsigned long syscall, long signr, int code);
 extern void __audit_ptrace(struct task_struct *t);
 
+static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
+{
+   task->audit_context = ctx;
+}
 static inline struct audit_context *audit_context(struct task_struct *task)
 {
return task->audit_context;
@@ -472,6 +476,10 @@ static inline bool audit_dummy_context(void)
 {
return true;
 }
+static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
+{
+   task->audit_context = ctx;
+}
 static inline struct audit_context *audit_context(struct task_struct *task)
 {
return NULL;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a4bbdcc..f294e4a 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -865,7 +865,7 @@ static inline struct audit_context *audit_take_context(struct task_struct *tsk,
audit_filter_inodes(tsk, context);
}
 
-   tsk->audit_context = NULL;
+   audit_set_context(tsk, NULL);
return context;
 }
 
@@ -952,7 +952,7 @@ int audit_alloc(struct task_struct *tsk)
}
context->filterkey = key;
 
-   tsk->audit_context  = context;
+   audit_set_context(tsk, context);
set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
 }
@@ -1590,7 +1590,7 @@ void __audit_syscall_exit(int success, long return_code)
kfree(context->filterkey);
context->filterkey = NULL;
}
-   tsk->audit_context = context;
+   audit_set_context(tsk, context);
 }
 
 static inline void handle_one(const struct inode *inode)
diff --git a/kernel/fork.c b/kernel/fork.c
index 242c8c9..cd18448 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1713,7 +1713,7 @@ static __latent_entropy struct task_struct *copy_process(
p->start_time = ktime_get_ns();
p->real_start_time = ktime_get_boot_ns();
p->io_context = NULL;
-   p->audit_context = NULL;
+   audit_set_context(p, NULL);
cgroup_fork(p);
 #ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
-- 
1.8.3.1
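
An editorial sketch of why the accessor pair above is useful: once every caller goes through a getter and a setter, the field can later move into a nested structure without touching the callers. This is a user-space model, not kernel code; the `*_model` names are hypothetical stand-ins for task_struct, audit_context and the accessors in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct audit_context. */
struct audit_context_model {
	int dummy;
};

/* Hypothetical stand-in for a per-task audit sub-structure. */
struct audit_task_info_model {
	unsigned int sessionid;
	struct audit_context_model *ctx;
};

/* Hypothetical stand-in for struct task_struct. */
struct task_model {
	struct audit_task_info_model audit;
};

/* Setter: the only place that knows where the pointer is stored. */
static inline void model_set_context(struct task_model *task,
				     struct audit_context_model *ctx)
{
	assert(task != NULL);
	task->audit.ctx = ctx;
}

/* Getter: callers never dereference the task layout directly. */
static inline struct audit_context_model *model_context(struct task_model *task)
{
	assert(task != NULL);
	return task->audit.ctx;
}
```

If the storage location changes again, only the two accessor bodies need to be edited.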

[PATCH ghak81 RFC V1 3/5] audit: use inline function to get audit context

2018-05-04 Thread Richard Guy Briggs

Recognizing that the audit context is an internal audit value, use an
access function to retrieve the audit context pointer for the task
rather than reaching directly into the task struct to get it.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h| 16 ---
 include/net/xfrm.h   |  2 +-
 kernel/audit.c   |  4 +--
 kernel/audit_watch.c |  2 +-
 kernel/auditsc.c | 52 ++--
 net/bridge/netfilter/ebtables.c  |  2 +-
 net/core/dev.c   |  2 +-
 net/netfilter/x_tables.c |  2 +-
 net/netlabel/netlabel_user.c |  2 +-
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/integrity_audit.c |  2 +-
 security/lsm_audit.c |  2 +-
 security/selinux/hooks.c |  4 +--
 security/selinux/selinuxfs.c |  6 ++---
 security/selinux/ss/services.c   | 12 -
 15 files changed, 60 insertions(+), 52 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 5f86f7c..93e4c61 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -235,26 +235,30 @@ extern void __audit_inode_child(struct inode *parent,
 extern void __audit_seccomp(unsigned long syscall, long signr, int code);
 extern void __audit_ptrace(struct task_struct *t);
 
+static inline struct audit_context *audit_context(struct task_struct *task)
+{
+   return task->audit_context;
+}
 static inline bool audit_dummy_context(void)
 {
-   void *p = current->audit_context;
+   void *p = audit_context(current);
return !p || *(int *)p;
 }
 static inline void audit_free(struct task_struct *task)
 {
-   if (unlikely(task->audit_context))
+   if (unlikely(audit_context(task)))
__audit_free(task);
 }
 static inline void audit_syscall_entry(int major, unsigned long a0,
   unsigned long a1, unsigned long a2,
   unsigned long a3)
 {
-   if (unlikely(current->audit_context))
+   if (unlikely(audit_context(current)))
__audit_syscall_entry(major, a0, a1, a2, a3);
 }
 static inline void audit_syscall_exit(void *pt_regs)
 {
-   if (unlikely(current->audit_context)) {
+   if (unlikely(audit_context(current))) {
int success = is_syscall_success(pt_regs);
long return_code = regs_return_value(pt_regs);
 
@@ -468,6 +472,10 @@ static inline bool audit_dummy_context(void)
 {
return true;
 }
+static inline struct audit_context *audit_context(struct task_struct *task)
+{
+   return NULL;
+}
 static inline struct filename *audit_reusename(const __user char *name)
 {
return NULL;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fcce8ee..2788332 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -736,7 +736,7 @@ static inline struct audit_buffer *xfrm_audit_start(const char *op)
 
if (audit_enabled == 0)
return NULL;
-   audit_buf = audit_log_start(current->audit_context, GFP_ATOMIC,
+   audit_buf = audit_log_start(audit_context(current), GFP_ATOMIC,
AUDIT_MAC_IPSEC_EVENT);
if (audit_buf == NULL)
return NULL;
diff --git a/kernel/audit.c b/kernel/audit.c
index e9f9a90..9a03603 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1099,7 +1099,7 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
 
if (audit_enabled == AUDIT_OFF)
return;
-   ab = audit_log_start(current->audit_context,
+   ab = audit_log_start(audit_context(current),
 GFP_KERNEL, AUDIT_FEATURE_CHANGE);
if (!ab)
return;
@@ -2317,7 +2317,7 @@ void audit_log_link_denied(const char *operation)
return;
 
/* Generate AUDIT_ANOM_LINK with subject, operation, outcome. */
-   ab = audit_log_start(current->audit_context, GFP_KERNEL,
+   ab = audit_log_start(audit_context(current), GFP_KERNEL,
 AUDIT_ANOM_LINK);
if (!ab)
return;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 9eb8b35..8b596c4 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -274,7 +274,7 @@ static void audit_update_watch(struct audit_parent *parent,
/* If the update involves invalidating rules, do the inode-based
 * filtering now, so we don't omit records. */
if (invalidating && !audit_dummy_context())
-   audit_filter_inodes(current, current->audit_context);
+   audit_filter_inodes(current, audit_context(current));
 
/* updating ino will likely change which audit_hash_list we
 * are on so we need a new watch for the new list */
diff --git

[PATCH ghak81 RFC V1 5/5] audit: collect audit task parameters

2018-05-04 Thread Richard Guy Briggs

The audit-related parameters in struct task_struct should ideally be
collected together and accessed through a standard audit API.

Collect the existing loginuid, sessionid and audit_context together in a
new struct audit_task_info pointer called "audit" in struct task_struct.

Use kmem_cache to manage this pool of memory.
Un-inline audit_free() to be able to always recover that memory.

See: https://github.com/linux-audit/audit-kernel/issues/81

Signed-off-by: Richard Guy Briggs 
---
 MAINTAINERS|  2 +-
 include/linux/audit.h  |  8 
 include/linux/audit_task.h | 31 +++
 include/linux/sched.h  |  6 ++
 init/init_task.c   |  8 ++--
 kernel/auditsc.c   |  4 ++--
 6 files changed, 46 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/audit_task.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0a1410d..8c7992d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2510,7 +2510,7 @@ L:linux-au...@redhat.com (moderated for non-subscribers)
 W: https://github.com/linux-audit
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
 S: Supported
-F: include/linux/audit.h
+F: include/linux/audit*.h
 F: include/uapi/linux/audit.h
 F: kernel/audit*
 
diff --git a/include/linux/audit.h b/include/linux/audit.h
index dba0d45..1324969 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -237,11 +237,11 @@ extern void __audit_inode_child(struct inode *parent,
 
 static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
 {
-   task->audit_context = ctx;
+   task->audit.ctx = ctx;
 }
 static inline struct audit_context *audit_context(struct task_struct *task)
 {
-   return task->audit_context;
+   return task->audit.ctx;
 }
 static inline bool audit_dummy_context(void)
 {
@@ -330,12 +330,12 @@ extern int auditsc_get_stamp(struct audit_context *ctx,
 
 static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
 {
-   return tsk->loginuid;
+   return tsk->audit.loginuid;
 }
 
 static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 {
-   return tsk->sessionid;
+   return tsk->audit.sessionid;
 }
 
 extern void __audit_ipc_obj(struct kern_ipc_perm *ipcp);
diff --git a/include/linux/audit_task.h b/include/linux/audit_task.h
new file mode 100644
index 000..d4b3a20
--- /dev/null
+++ b/include/linux/audit_task.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* audit_task.h -- definition of audit_task_info structure
+ *
+ * Copyright 2018 Red Hat Inc., Raleigh, North Carolina.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Written by Richard Guy Briggs 
+ *
+ */
+
+#ifndef _LINUX_AUDIT_TASK_H_
+#define _LINUX_AUDIT_TASK_H_
+
+struct audit_context;
+struct audit_task_info {
+   kuid_t  loginuid;
+   unsigned intsessionid;
+   struct audit_context*ctx;
+};
+
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3d697f..b58eca0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,9 +27,9 @@
 #include 
 #include 
 #include 
+#include 
 
 /* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
 struct backing_dev_info;
 struct bio_list;
 struct blk_plug;
@@ -832,10 +832,8 @@ struct task_struct {
 
struct callback_head*task_works;
 
-   struct audit_context*audit_context;
 #ifdef CONFIG_AUDITSYSCALL
-   kuid_t  loginuid;
-   unsigned intsessionid;
+   struct audit_task_info  audit;
 #endif
struct seccomp  seccomp;
 
diff --git a/init/init_task.c b/init/init_task.c
index c788f91..d33260d 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -118,8 +119,11 @@ struct task_struct init_task
.thread_group   = LIST_HEAD_INIT(init_task.thread_group),
.thread_node= LIST_HEAD_INIT(init_signals.thread_head),
 #ifdef CONFIG_AUDITSYSCALL
-   .loginuid   = INVALID_UID,
-   .sessionid  = AUDIT_SID_UNSET,
+   .audit  = {
+   .loginuid   = INVALID_UID,
+   .sessionid  = AUDIT_SID_UNSET,
+   .ctx= NULL,
+   },
 #endif
 #ifdef CONFIG_PERF_EVENTS

[PATCH net-next] net/ipv6: rename rt6_next to fib6_next

2018-05-04 Thread David Ahern

This slipped through the cracks in the followup set to the fib6_info flip.
Rename rt6_next to fib6_next.

Signed-off-by: David Ahern 
---
 include/net/ip6_fib.h |  6 +++---
 net/ipv6/ip6_fib.c| 26 +-
 net/ipv6/route.c  | 12 ++--
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1af450d4e923..a3ec08d05756 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -135,7 +135,7 @@ struct fib6_nh {
 
 struct fib6_info {
struct fib6_table   *fib6_table;
-   struct fib6_info __rcu  *rt6_next;
+   struct fib6_info __rcu  *fib6_next;
struct fib6_node __rcu  *fib6_node;
 
/* Multipath routes:
@@ -192,11 +192,11 @@ struct rt6_info {
 
 #define for_each_fib6_node_rt_rcu(fn)  \
for (rt = rcu_dereference((fn)->leaf); rt;  \
-rt = rcu_dereference(rt->rt6_next))
+rt = rcu_dereference(rt->fib6_next))
 
 #define for_each_fib6_walker_rt(w) \
for (rt = (w)->leaf; rt;\
-rt = rcu_dereference_protected(rt->rt6_next, 1))
+rt = rcu_dereference_protected(rt->fib6_next, 1))
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
 {
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 6421c893466e..f0a4262a4789 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -945,7 +945,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
ins = >leaf;
 
for (iter = leaf; iter;
-iter = rcu_dereference_protected(iter->rt6_next,
+iter = rcu_dereference_protected(iter->fib6_next,
lockdep_is_held(>fib6_table->tb6_lock))) {
/*
 *  Search for duplicates
@@ -1002,7 +1002,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
break;
 
 next_iter:
-   ins = >rt6_next;
+   ins = >fib6_next;
}
 
if (fallback_ins && !found) {
@@ -1031,7 +1031,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
  >fib6_siblings);
break;
}
-   sibling = rcu_dereference_protected(sibling->rt6_next,
+   sibling = rcu_dereference_protected(sibling->fib6_next,
lockdep_is_held(>fib6_table->tb6_lock));
}
/* For each sibling in the list, increment the counter of
@@ -1065,7 +1065,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
if (err)
return err;
 
-   rcu_assign_pointer(rt->rt6_next, iter);
+   rcu_assign_pointer(rt->fib6_next, iter);
atomic_inc(>fib6_ref);
rcu_assign_pointer(rt->fib6_node, fn);
rcu_assign_pointer(*ins, rt);
@@ -1096,7 +1096,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 
atomic_inc(>fib6_ref);
rcu_assign_pointer(rt->fib6_node, fn);
-   rt->rt6_next = iter->rt6_next;
+   rt->fib6_next = iter->fib6_next;
rcu_assign_pointer(*ins, rt);
if (!info->skip_notify)
inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
@@ -1113,14 +1113,14 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 
if (nsiblings) {
/* Replacing an ECMP route, remove all siblings */
-   ins = >rt6_next;
+   ins = >fib6_next;
iter = rcu_dereference_protected(*ins,
lockdep_is_held(>fib6_table->tb6_lock));
while (iter) {
if (iter->fib6_metric > rt->fib6_metric)
break;
if (rt6_qualify_for_ecmp(iter)) {
-   *ins = iter->rt6_next;
+   *ins = iter->fib6_next;
iter->fib6_node = NULL;
fib6_purge_rt(iter, fn, info->nl_net);
if (rcu_access_pointer(fn->rr_ptr) == iter)
@@ -1129,7 +1129,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
nsiblings--;

info->nl_net->ipv6.rt6_stats->fib_rt_entries--;
} else {
-   ins =

Re: [net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation

2018-05-04 Thread Eric Dumazet



On 05/04/2018 11:30 AM, Alexander Duyck wrote:
> From: Alexander Duyck 
> 
> This patch is meant to allow us to avoid having to recompute the checksum
> from scratch and have it passed as a parameter.
> 
> Instead of taking that approach we can take advantage of the fact that the
> length that was used to compute the existing checksum is included in the
> UDP header. If we cancel that out by adding the value XOR with 0x we
> can then just add the new length in and fold that into the new result.
> 

>  
> + uh = udp_hdr(segs);
> +
> + /* compute checksum adjustment based on old length versus new */
> + newlen = htons(sizeof(*uh) + mss);
> + check = ~csum_fold((__force __wsum)((__force u32)uh->check +
> + ((__force u32)uh->len ^ 0x) +
> + (__force u32)newlen));
> +


Can't this use csum_sub() instead of this ^ 0x trick ?
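
An editorial sketch of the arithmetic under discussion, as a user-space model rather than the kernel helpers. In one's-complement arithmetic, subtracting a 16-bit word is the same as adding its bitwise complement, so a sum can be adjusted for a changed length field without re-reading the packet. The constant in the archived patch text is truncated; the sketch assumes a full 16-bit complement. All function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xFFFFu) + (sum >> 16);
	sum = (sum & 0xFFFFu) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of n 16-bit words, computed from scratch. */
static uint16_t ones_sum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	assert(words != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum = fold16(sum + words[i]);
	return (uint16_t)sum;
}

/*
 * Adjust an existing sum after one word changed from old_word to
 * new_word: add the complement of the old value (a one's-complement
 * subtraction), add the new value, and fold.
 *
 * Caveat: one's complement has two encodings of zero (0x0000 and
 * 0xFFFF). This helper and ones_sum() agree bit-for-bit unless the
 * updated data is all zero words.
 */
static uint16_t ones_sum_replace(uint16_t sum, uint16_t old_word,
				 uint16_t new_word)
{
	return fold16((uint32_t)sum + (uint16_t)~old_word + new_word);
}
```

Whether the adjustment is spelled as an explicit complement or through a subtract helper, the arithmetic is the same. As far as I recall, the kernel's csum_sub() is itself a csum_add() of the complemented addend, but treat that as an assumption and check include/net/checksum.h.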

Re: [net-next PATCH v2 3/8] udp: Do not pass MSS as parameter to GSO segmentation

2018-05-04 Thread Eric Dumazet



On 05/04/2018 11:29 AM, Alexander Duyck wrote:
> From: Alexander Duyck 
> 
> There is no point in passing MSS as a parameter for the GSO
> segmentation call as it is already available via the shared info for the
> skb itself.
> 
> Signed-off-by: Alexander Duyck 
> ---

Reviewed-by: Eric Dumazet

Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Peter Zijlstra

On Fri, May 04, 2018 at 09:07:35PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-04 20:51:32 [+0200], Peter Zijlstra wrote:

> > softirqs disabled, ack that is exactly what it checks.
> > 
> > But afaict the assertion you introduced tests that we are _in_ softirq
> > context, which is not the same.
> 
> indeed, now it clicked. Given what I wrote in the cover letter would you
> be in favour of (a proper) lockdep_assert_BH_disabled() or the cheaper
> local_bh_enable() (assuming the network folks don't mind the cheaper
> version)?

Depends a bit on what the code wants I suppose. If the code is in fact
fine with the stronger in-softirq assertion, that'd be best. Otherwise I
don't object to a lockdep_assert_bh_disabled() to accompany the
lockdep_assert_irq_disabled() we already have either.
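
An editorial sketch of the distinction Peter draws, as a simplified user-space model. It assumes the softirq part of preempt_count occupies bits 8-15, that serving a softirq adds one SOFTIRQ_OFFSET, and that local_bh_disable() adds twice that; the `model_*` names are hypothetical and the real definitions live in include/linux/preempt.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: softirq count in bits 8..15 of the counter. */
#define MODEL_SOFTIRQ_SHIFT		8
#define MODEL_SOFTIRQ_OFFSET		(1u << MODEL_SOFTIRQ_SHIFT)
#define MODEL_SOFTIRQ_MASK		(0xffu << MODEL_SOFTIRQ_SHIFT)
#define MODEL_SOFTIRQ_DISABLE_OFFSET	(2u * MODEL_SOFTIRQ_OFFSET)

/* True when bottom halves are disabled OR a softirq is being served. */
static bool model_bh_disabled(unsigned int count)
{
	return (count & MODEL_SOFTIRQ_MASK) != 0;
}

/* True only while a softirq handler is actually running. */
static bool model_serving_softirq(unsigned int count)
{
	return (count & MODEL_SOFTIRQ_OFFSET) != 0;
}

/* Model of local_bh_disable(): bumps the count by an even step. */
static unsigned int model_local_bh_disable(unsigned int count)
{
	assert((count & MODEL_SOFTIRQ_MASK) <
	       (MODEL_SOFTIRQ_MASK - MODEL_SOFTIRQ_OFFSET));
	return count + MODEL_SOFTIRQ_DISABLE_OFFSET;
}

/* Model of entering softirq processing: bumps the count by one step. */
static unsigned int model_softirq_enter(unsigned int count)
{
	assert((count & MODEL_SOFTIRQ_MASK) < MODEL_SOFTIRQ_MASK);
	return count + MODEL_SOFTIRQ_OFFSET;
}
```

In this model a task that merely called local_bh_disable() passes the "BH disabled" check but fails the "serving softirq" check, which is why the two assertions are not interchangeable.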

Re: [net-next PATCH v2 2/8] udp: Verify that pulling UDP header in GSO segmentation doesn't fail

2018-05-04 Thread Eric Dumazet



On 05/04/2018 11:29 AM, Alexander Duyck wrote:
> From: Alexander Duyck 
> 
> We should verify that we can pull the UDP header before we attempt to do
> so. Otherwise if this fails we have no way of knowing and GSO will not work
> correctly.
> 
> Signed-off-by: Alexander Duyck 
> ---

Reviewed-by: Eric Dumazet

Re: [PATCH v5 0/6] firmware_loader: cleanups for v4.18

2018-05-04 Thread Luis R. Rodriguez

On Fri, May 04, 2018 at 09:17:08PM +0200, Krzysztof Halasa wrote:
> "Luis R. Rodriguez"  writes:
> 
> >   * CONFIG_WANXL --> CONFIG_WANXL_BUILD_FIRMWARE
> >   * CONFIG_SCSI_AIC79XX --> CONFIG_AIC79XX_BUILD_FIRMWARE
> >
> > To this day both of these drivers are building driver *firmwares* when
> > the option CONFIG_PREVENT_FIRMWARE_BUILD is disabled, and they don't
> > even make use of the firmware API at all.
> 
> Don't know for sure about Adaptec, but wanXL firmware fits every
> definition of "stable". BTW it's a 1997 or so early PCI card, built
> around the PLX PCI9060 bus mastering PCI bridge and Motorola 68360
> (the original QUICC) processor. Maximum bit rate of 2 Mb/s on each sync
> serial port.

So we can nuke CONFIG_WANXL_BUILD_FIRMWARE now?

> It's more about delivering the .S source for the firmware, I guess.
> Nobody is expected to build it. The fw is about 2.5 KB and is directly
> linked with the driver.

:P Future work I guess would be to just use the firmware API and stuff
it into linux-firmware?

  Luis

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-04 Thread Luis R. Rodriguez

What a mighty short list of reviewers. Adding some more. My review below.
I'd appreciate a Cc on future versions of these patches.

On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> Introduce helper:
> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> struct umh_info {
>struct file *pipe_to_umh;
>struct file *pipe_from_umh;
>pid_t pid;
> };
> 
> that GPLed kernel modules (signed or unsigned) can use it to execute part
> of its own data as swappable user mode process.
> 
> The kernel will do:
> - mount "tmpfs"

Actually it's a *shared* vfsmount tmpfs for all umh blobs.

> - allocate a unique file in tmpfs
> - populate that file with [data, data + len] bytes
> - user-mode-helper code will do_execve that file and, before the process
>   starts, the kernel will create two unix pipes for bidirectional
>   communication between kernel module and umh
> - close tmpfs file, effectively deleting it
> - the fork_usermode_blob will return zero on success and populate
>   'struct umh_info' with two unix pipes and the pid of the user process

But since it's using UMH_WAIT_EXEC, all we can guarantee currently is that the
inception point was intended, well thought out, and will run; the return
value in no way reflects whether the execution itself succeeded. More below.

> As the first step in the development of the bpfilter project
> the fork_usermode_blob() helper is introduced to allow user mode code
> to be invoked from a kernel module. The idea is that user mode code plus
> normal kernel module code are built as part of the kernel build
> and installed as traditional kernel module into distro specified location,
> such that from a distribution point of view, there is
> no difference between regular kernel modules and kernel modules + umh code.
> Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> by a kernel module doesn't make it any special from kernel and user space
> tooling point of view.
> 
> Such approach enables kernel to delegate functionality traditionally done
> by the kernel modules into the user space processes (either root or !root) and
> reduces security attack surface of the new code. The buggy umh code would crash
> the user process, but not the kernel. Another advantage is that umh code
> of the kernel module can be debugged and tested out of user space
> (e.g. opening the possibility to run clang sanitizers, fuzzers or
> user space test suites on the umh code).
> In case of the bpfilter project such architecture allows complex control plane
> to be done in the user space while bpf based data plane stays in the kernel.
> 
> Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> the kernel module that uses them (like bpfilter) needs to manage life
> time of umh on its own via two unix pipes and the pid of umh.
> 
> The exit code of such kernel module should kill the umh it started,
> so that rmmod of the kernel module will cleanup the corresponding umh.
> Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> 
> Signed-off-by: Alexei Starovoitov 
> ---
>  fs/exec.c   |  38 ---
>  include/linux/binfmts.h |   1 +
>  include/linux/umh.h |  12 
>  kernel/umh.c| 176 +++-
>  4 files changed, 215 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 183059c427b9..30a36c2a39bf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
>  /*
>   * sys_execve() executes a new program.
>   */
> -static int do_execveat_common(int fd, struct filename *filename,
> -   struct user_arg_ptr argv,
> -   struct user_arg_ptr envp,
> -   int flags)
> +static int __do_execve_file(int fd, struct filename *filename,
> + struct user_arg_ptr argv,
> + struct user_arg_ptr envp,
> + int flags, struct file *file)
>  {
>   char *pathbuf = NULL;
>   struct linux_binprm *bprm;
> - struct file *file;
>   struct files_struct *displaced;
>   int retval;

Keeping in mind a fuzzer...

Note, right below this, and not shown here in the hunk, is:

	if (IS_ERR(filename))
		return PTR_ERR(filename);
>  
> @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
>   check_unsafe_exec(bprm);
>   current->in_execve = 1;
>  
> - file = do_open_execat(fd, filename, flags);
> + if (!file)
> + file = do_open_execat(fd, filename, flags);


Here we now seem to allow !file and open the file with the passed fd as in
the old days. This is an expected change.

>   retval = PTR_ERR(file);
>   if (IS_ERR(file))
>   goto

Re: [PATCH v2 0/4] Introduce LSM-hook for socketpair(2)

2018-05-04 Thread James Morris

On Fri, 4 May 2018, David Herrmann wrote:

> Hi
> 
> This is v2 of the socketpair(2) LSM hook introduction.

Thanks, all applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
next-general


-- 
James Morris

Re: [PATCH 04/15] media: pxa_camera: remove the dmaengine compat need

2018-05-04 Thread Mauro Carvalho Chehab

Em Sun, 22 Apr 2018 13:06:12 +0200
Hans Verkuil  escreveu:

> On 04/02/2018 04:26 PM, Robert Jarzmik wrote:
> > From: Robert Jarzmik 
> > 
> > As the pxa architecture switched towards the dmaengine slave map, the
> > old compatibility mechanism to acquire the dma requestor line number and
> > priority are not needed anymore.
> > 
> > This patch simplifies the dma resource acquisition, using the more
> > generic function dma_request_slave_channel().
> > 
> > Signed-off-by: Robert Jarzmik   
> 
> Acked-by: Hans Verkuil 

I'm assuming that you'll be applying it together with other PXA patches.
So:

Acked-by: Mauro Carvalho Chehab 

Regards,
Mauro
> 
> Regards,
> 
>   Hans
> 
> > ---
> >  drivers/media/platform/pxa_camera.c | 22 +++---
> >  1 file changed, 3 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/media/platform/pxa_camera.c b/drivers/media/platform/pxa_camera.c
> > index c71a00736541..4c82d1880753 100644
> > --- a/drivers/media/platform/pxa_camera.c
> > +++ b/drivers/media/platform/pxa_camera.c
> > @@ -2357,8 +2357,6 @@ static int pxa_camera_probe(struct platform_device *pdev)
> > .src_maxburst = 8,
> > .direction = DMA_DEV_TO_MEM,
> > };
> > -   dma_cap_mask_t mask;
> > -   struct pxad_param params;
> > char clk_name[V4L2_CLK_NAME_SIZE];
> > int irq;
> > int err = 0, i;
> > @@ -2432,34 +2430,20 @@ static int pxa_camera_probe(struct platform_device *pdev)
> > pcdev->base = base;
> >  
> > /* request dma */
> > -   dma_cap_zero(mask);
> > -   dma_cap_set(DMA_SLAVE, mask);
> > -   dma_cap_set(DMA_PRIVATE, mask);
> > -
> > -   params.prio = 0;
> > -   params.drcmr = 68;
> > -   pcdev->dma_chans[0] =
> > -   dma_request_slave_channel_compat(mask, pxad_filter_fn,
> > -, >dev, "CI_Y");
> > +   pcdev->dma_chans[0] = dma_request_slave_channel(>dev, "CI_Y");
> > if (!pcdev->dma_chans[0]) {
> > dev_err(>dev, "Can't request DMA for Y\n");
> > return -ENODEV;
> > }
> >  
> > -   params.drcmr = 69;
> > -   pcdev->dma_chans[1] =
> > -   dma_request_slave_channel_compat(mask, pxad_filter_fn,
> > -, >dev, "CI_U");
> > +   pcdev->dma_chans[1] = dma_request_slave_channel(>dev, "CI_U");
> > if (!pcdev->dma_chans[1]) {
> > dev_err(>dev, "Can't request DMA for Y\n");
> > err = -ENODEV;
> > goto exit_free_dma_y;
> > }
> >  
> > -   params.drcmr = 70;
> > -   pcdev->dma_chans[2] =
> > -   dma_request_slave_channel_compat(mask, pxad_filter_fn,
> > -, >dev, "CI_V");
> > +   pcdev->dma_chans[2] = dma_request_slave_channel(>dev, "CI_V");
> > if (!pcdev->dma_chans[2]) {
> > dev_err(>dev, "Can't request DMA for V\n");
> > err = -ENODEV;
> >   
> 



Thanks,
Mauro

Re: [PATCH v5 0/6] firmware_loader: cleanups for v4.18

2018-05-04 Thread Krzysztof Halasa

"Luis R. Rodriguez"  writes:

>   * CONFIG_WANXL --> CONFIG_WANXL_BUILD_FIRMWARE
>   * CONFIG_SCSI_AIC79XX --> CONFIG_AIC79XX_BUILD_FIRMWARE
>
> To this day both of these drivers are building driver *firmwares* when
> the option CONFIG_PREVENT_FIRMWARE_BUILD is disabled, and they don't
> even make use of the firmware API at all.

Don't know for sure about Adaptec, but wanXL firmware fits every
definition of "stable". BTW it's a 1997 or so early PCI card, built
around the PLX PCI9060 bus mastering PCI bridge and Motorola 68360
(the original QUICC) processor. Maximum bit rate of 2 Mb/s on each sync
serial port.

It's more about delivering the .S source for the firmware, I guess.
Nobody is expected to build it. The fw is about 2.5 KB and is directly
linked with the driver.
-- 
Krzysztof Halasa

Re: [PATCH v2 bpf-next 0/3] bpf: cleanups on managing subprog information

2018-05-04 Thread Daniel Borkmann

On 05/02/2018 10:17 PM, Jiong Wang wrote:
> This patch set cleans up some code logic related to managing subprog
> information.
> 
> Parts of the set are inspired by Edwin's code in his RFC:
> 
>   "bpf/verifier: subprog/func_call simplifications"
> 
> but with clearer separation so it could be easier to review.
> 
>   - Patch 1 unifies main prog and subprogs. All of them are registered in
> env->subprog_starts.
> 
>   - After patch 1, it is clear that subprog_starts and subprog_stack_depth
> could be merged as both of them now have main and subprog unified.
> Patch 2 therefore does the merge; all subprog information is centred
> in bpf_subprog_info.
> 
>   - Patch 3 goes further to introduce a new fake "exit" subprog which
> serves as an ending marker to the subprog list. We could then turn the
> following code snippets across verifier:
> 
>if (env->subprog_cnt == cur_subprog + 1)
>subprog_end = insn_cnt;
>else
>subprog_end = env->subprog_info[cur_subprog + 1].start;
> 
> into:
>subprog_end = env->subprog_info[cur_subprog + 1].start;
> 
> There is no functional change by this patch set.
> No bpf selftest (both non-jit and jit) regression found after this set.
> 
> v2:
>   - fixed adjust_subprog_starts to also update fake "exit" subprog start.
>   - for John's suggestion on renaming subprog to prog, I could work on
> a follow-up patch if it is recognized as worth the change.
> 
> Jiong Wang (3):
>   bpf: unify main prog and subprog
>   bpf: centre subprog information fields
>   bpf: add faked "ending" subprog
> 
>  include/linux/bpf_verifier.h |   9 ++--
>  kernel/bpf/verifier.c| 121 ++-
>  2 files changed, 67 insertions(+), 63 deletions(-)
> 

LGTM, applied to bpf-next, thanks Jiong!
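
An editorial sketch of the sentinel idea from patch 3 in the cover letter: appending a fake trailing entry whose start equals the instruction count removes the special case for the last subprog. This is a stand-alone model, not the verifier code; `struct subprog_model` and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-subprog record. */
struct subprog_model {
	unsigned int start;	/* index of the first instruction */
};

/* Without a sentinel: the last subprog needs a special case. */
static unsigned int end_without_sentinel(const struct subprog_model *info,
					 size_t cnt, size_t cur,
					 unsigned int insn_cnt)
{
	assert(info != NULL && cur < cnt);
	if (cur + 1 == cnt)
		return insn_cnt;
	return info[cur + 1].start;
}

/*
 * With a sentinel: info[] holds cnt real entries plus one fake entry
 * at index cnt whose start is the total instruction count, so the
 * lookup is a plain array read.
 */
static unsigned int end_with_sentinel(const struct subprog_model *info,
				      size_t cnt, size_t cur)
{
	assert(info != NULL && cur < cnt);
	return info[cur + 1].start;
}
```

The cost is that anything which shifts instruction offsets must also update the sentinel entry, which matches the adjust_subprog_starts fix listed in the v2 notes.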

Re: [PATCH net-next v8 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc

2018-05-04 Thread Toke Høiland-Jørgensen

Thank you for the review! A few comments below, I'll fix the rest.

> [...]
> 
> So sch_cake doesn't accept normal tc filters? Is this intentional?
> If so, why?

For two reasons:

- The two-level scheduling used in CAKE (tins / diffserv classes, and
  flow hashing) does not map in an obvious way to the classification
  index of tc filters.

- No one has asked for it. We have done our best to accommodate the
  features people want in a home router qdisc directly in CAKE, and the
  ability to integrate tc filters has never been requested.

>> +static u16 quantum_div[CAKE_QUEUES + 1] = {0};
>> +
>> +#define REC_INV_SQRT_CACHE (16)
>> +static u32 cobalt_rec_inv_sqrt_cache[REC_INV_SQRT_CACHE] = {0};
>> +
>> +/* http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
>> + * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
>> + *
>> + * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
>> + */
>> +
>> +static void cobalt_newton_step(struct cobalt_vars *vars)
>> +{
>> +   u32 invsqrt = vars->rec_inv_sqrt;
>> +   u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
>> +   u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
>> +
>> +   val >>= 2; /* avoid overflow in following multiply */
>> +   val = (val * invsqrt) >> (32 - 2 + 1);
>> +
>> +   vars->rec_inv_sqrt = val;
>> +}
>> +
>> +static void cobalt_invsqrt(struct cobalt_vars *vars)
>> +{
>> +   if (vars->count < REC_INV_SQRT_CACHE)
>> +   vars->rec_inv_sqrt = cobalt_rec_inv_sqrt_cache[vars->count];
>> +   else
>> +   cobalt_newton_step(vars);
>> +}
>
> Looks pretty much duplicated with codel...

Cobalt is derived from CoDel, and so naturally shares some features with
it. However, it is quite different in other respects, so we can't just
use the existing CoDel code for the parts that are similar. We don't
feel quite confident enough in Cobalt (yet) to propose it replace CoDel
everywhere else in the kernel; so we have elected to keep it internal to
CAKE instead.
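
(Editorial sketch: the Newton step quoted above can be exercised in user space. The port below keeps the same Q0.32 arithmetic but takes the state as plain arguments; the function names are hypothetical and the refine loop is only a convenience for experimenting, not something the qdisc does.)

```c
#include <assert.h>
#include <stdint.h>

/*
 * One Newton-Raphson step towards 1/sqrt(count), with invsqrt held
 * as a Q0.32 fixed-point fraction:
 *   new = (invsqrt / 2) * (3 - count * invsqrt^2)
 */
static uint32_t invsqrt_q32_step(uint32_t invsqrt, uint32_t count)
{
	uint32_t invsqrt2 = (uint32_t)(((uint64_t)invsqrt * invsqrt) >> 32);
	uint64_t val = (3ULL << 32) - ((uint64_t)count * invsqrt2);

	val >>= 2;	/* avoid overflow in the following multiply */
	val = (val * invsqrt) >> (32 - 2 + 1);

	return (uint32_t)val;
}

/*
 * Apply several steps for a fixed count. The starting estimate must
 * already be close to the answer; a poor start makes the (3 - ...)
 * term wrap around instead of converging.
 */
static uint32_t invsqrt_q32_refine(uint32_t invsqrt, uint32_t count,
				   unsigned int steps)
{
	assert(count > 0);
	while (steps--)
		invsqrt = invsqrt_q32_step(invsqrt, count);
	return invsqrt;
}
```

For count = 4 the exact answer 0.5 (0x80000000) is a fixed point of the step, and for count = 1 the largest representable value 0xFFFFFFFF maps to itself.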

>> [...]
>>
>> +static int cake_init(struct Qdisc *sch, struct nlattr *opt,
>> +struct netlink_ext_ack *extack)
>> +{
>> +   struct cake_sched_data *q = qdisc_priv(sch);
>> +   int i, j;
>> +
>> +   sch->limit = 10240;
>> +   q->tin_mode = CAKE_DIFFSERV_BESTEFFORT;
>> +   q->flow_mode  = CAKE_FLOW_TRIPLE;
>> +
>> +   q->rate_bps = 0; /* unlimited by default */
>> +
>> +   q->interval = 10; /* 100ms default */
>> +   q->target   =   5000; /* 5ms: codel RFC argues
>> +  * for 5 to 10% of interval
>> +  */
>> +
>> +   q->cur_tin = 0;
>> +   q->cur_flow  = 0;
>> +
>> +   if (opt) {
>> +   int err = cake_change(sch, opt, extack);
>> +
>> +   if (err)
>> +   return err;
>
>
> Not sure if you really want to reallocate q->tins below for this
> case.

I'm not sure what you mean here. If there's an error, we return it and
the qdisc is not created. If there's not, we allocate, and subsequent
changes call cake_change() directly, don't they? Can the init function
ever be called again during the lifetime of the qdisc?

-Toke

Re: [PATCH] net/mlx5: fix spelling mistake: "modfiy" -> "modify"

2018-05-04 Thread Saeed Mahameed

On Thu, 2018-05-03 at 14:44 -0400, David Miller wrote:
> From: Colin King 
> Date: Thu,  3 May 2018 14:35:03 +0100
> 
> > From: Colin Ian King 
> > 
> > Trivial fix to spelling mistake in netdev_warn warning message
> > 
> > Signed-off-by: Colin Ian King 
> 
> Saeed, please send this to me in your next pull request.
> 

Applied to mlx5-next, Thanks Colin and Dave !

Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Sebastian Andrzej Siewior

On 2018-05-04 20:51:32 [+0200], Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 08:45:39PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> > > On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > > > From: Anna-Maria Gleixner 
> > > > 
> > > > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > > > the softirq context for the subsequent netif_receive_skb() call. 
> > > 
> > > That's not in fact what it does though; so while that might indeed be
> > > the intent that's not what it does.
> > 
> > It was introduced in commit d20ef63d3246 ("mac80211: document
> > ieee80211_rx() context requirement"):
> > 
> > mac80211: document ieee80211_rx() context requirement
> > 
> > ieee80211_rx() must be called with softirqs disabled
> 
> softirqs disabled, ack that is exactly what it checks.
> 
> But afaict the assertion you introduced tests that we are _in_ softirq
> context, which is not the same.

indeed, now it clicked. Given what I wrote in the cover letter would you
be in favour of (a proper) lockdep_assert_BH_disabled() or the cheaper
local_bh_enable() (assuming the network folks don't mind the cheaper
version)?

Sebastian

Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Peter Zijlstra

On Fri, May 04, 2018 at 08:45:39PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> > On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > > From: Anna-Maria Gleixner 
> > > 
> > > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > > the softirq context for the subsequent netif_receive_skb() call. 
> > 
> > That's not in fact what it does though; so while that might indeed be
> > the intent that's not what it does.
> 
> It was introduced in commit d20ef63d3246 ("mac80211: document
> ieee80211_rx() context requirement"):
> 
> mac80211: document ieee80211_rx() context requirement
> 
> ieee80211_rx() must be called with softirqs disabled

softirqs disabled, ack that is exactly what it checks.

But afaict the assertion you introduced tests that we are _in_ softirq
context, which is not the same.
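
The distinction drawn above ("softirqs disabled" versus "in softirq context") can be illustrated with a small userspace model of the softirq field of preempt_count. This is only a sketch: the constants follow my reading of include/linux/preempt.h and should be treated as assumptions, and every model_*() name is hypothetical rather than a kernel function.

```c
#include <stdbool.h>

/* Assumed layout of the softirq field inside preempt_count. */
#define SOFTIRQ_SHIFT		8
#define SOFTIRQ_BITS		8
#define SOFTIRQ_OFFSET		(1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK		(((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

/* Model of local_bh_disable(): bumps the count by the "disable" step. */
static unsigned long model_local_bh_disable(unsigned long pc)
{
	return pc + SOFTIRQ_DISABLE_OFFSET;
}

/* Model of entering softirq processing: sets the low "serving" bit. */
static unsigned long model_softirq_enter(unsigned long pc)
{
	return pc + SOFTIRQ_OFFSET;
}

/* True whenever bottom halves cannot run: disabled OR being served. */
static bool model_bh_disabled(unsigned long pc)
{
	return (pc & SOFTIRQ_MASK) != 0;
}

/* True only while a softirq handler is actually running. */
static bool model_serving_softirq(unsigned long pc)
{
	return (pc & SOFTIRQ_OFFSET) != 0;
}
```

In this model, process context that has called local_bh_disable() satisfies model_bh_disabled() but not model_serving_softirq(), which is the case an "in softirq" assertion would reject even though the documented requirement is met.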

Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Sebastian Andrzej Siewior

On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > From: Anna-Maria Gleixner 
> > 
> > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > the softirq context for the subsequent netif_receive_skb() call. 
> 
> That's not in fact what it does though; so while that might indeed be
> the intent that's not what it does.

It was introduced in commit d20ef63d3246 ("mac80211: document
ieee80211_rx() context requirement"):

mac80211: document ieee80211_rx() context requirement

ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.

Sebastian

Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning

2018-05-04 Thread Peter Zijlstra

On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> From: Anna-Maria Gleixner 
> 
> The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> the softirq context for the subsequent netif_receive_skb() call. 

That's not in fact what it does though; so while that might indeed be
the intent that's not what it does.

[net-next PATCH v2 8/8] net: Add NETIF_F_GSO_UDP_L4 to list of GSO offloads with fallback

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

Enable UDP offload as a generic software offload since we can now handle it
for multiple cases including if the checksum isn't present and via
gso_partial in the case of tunnels.

Signed-off-by: Alexander Duyck 
---
 include/linux/netdev_features.h |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index c87c3a3453c1..efbd8b2c0197 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -184,7 +184,8 @@ enum {
 
 /* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE   (NETIF_F_ALL_TSO | \
-NETIF_F_GSO_SCTP)
+NETIF_F_GSO_SCTP| \
+NETIF_F_GSO_UDP_L4)
 
 /*
  * If one device supports one of these features, then enable them

[net-next PATCH v2 7/8] udp: Do not copy destructor if one is not present

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

This patch makes it so that if a destructor is not present we avoid trying
to update the skb socket or any reference counting that would be associated
with the NULL socket and/or descriptor. By doing this we can support
traffic coming from another namespace without any issues.

Signed-off-by: Alexander Duyck 
---

v2: Do not overwrite destructor if not sock_wfree as per Eric
Drop WARN_ON as per Eric

 net/ipv4/udp_offload.c |   21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index fd94bbb369b2..e802f6331589 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -195,6 +195,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
unsigned int sum_truesize = 0;
struct udphdr *uh;
unsigned int mss;
+   bool copy_dtor;
__sum16 check;
__be16 newlen;
 
@@ -208,12 +209,14 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
skb_pull(gso_skb, sizeof(*uh));
 
/* clear destructor to avoid skb_segment assigning it to tail */
-   WARN_ON_ONCE(gso_skb->destructor != sock_wfree);
-   gso_skb->destructor = NULL;
+   copy_dtor = gso_skb->destructor == sock_wfree;
+   if (copy_dtor)
+   gso_skb->destructor = NULL;
 
segs = skb_segment(gso_skb, features);
if (unlikely(IS_ERR_OR_NULL(segs))) {
-   gso_skb->destructor = sock_wfree;
+   if (copy_dtor)
+   gso_skb->destructor = sock_wfree;
return segs;
}
 
@@ -234,9 +237,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
(__force u32)newlen));
 
for (;;) {
-   seg->destructor = sock_wfree;
-   seg->sk = sk;
-   sum_truesize += seg->truesize;
+   if (copy_dtor) {
+   seg->destructor = sock_wfree;
+   seg->sk = sk;
+   sum_truesize += seg->truesize;
+   }
 
if (!seg->next)
break;
@@ -268,7 +273,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->check = gso_make_checksum(seg, ~check);
 
/* update refcount for the packet */
-   refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
+   if (copy_dtor)
+   refcount_add(sum_truesize - gso_skb->truesize,
+&sk->sk_wmem_alloc);
 out:
return segs;
 }

[net-next PATCH v2 6/8] udp: Add support for software checksum and GSO_PARTIAL with GSO offload

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

This patch adds support for a software provided checksum and GSO_PARTIAL
segmentation support. With this we can offload UDP segmentation on devices
that only have partial support for tunnels.

Since we are no longer needing the hardware checksum we can drop the checks
in the segmentation code that were verifying if it was present.

Signed-off-by: Alexander Duyck 
---
 net/ipv4/udp_offload.c |   28 ++--
 net/ipv6/udp_offload.c |   11 +--
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 946d06d2aa0c..fd94bbb369b2 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -217,6 +217,13 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return segs;
}
 
+   /* GSO partial and frag_list segmentation only requires splitting
+* the frame into an MSS multiple and possibly a remainder, both
+* cases return a GSO skb. So update the mss now.
+*/
+   if (skb_is_gso(segs))
+   mss *= skb_shinfo(segs)->gso_segs;
+
seg = segs;
uh = udp_hdr(seg);
 
@@ -237,6 +244,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->len = newlen;
uh->check = check;
 
+   if (seg->ip_summed == CHECKSUM_PARTIAL)
+   gso_reset_checksum(seg, ~check);
+   else
+   uh->check = gso_make_checksum(seg, ~check);
+
seg = seg->next;
uh = udp_hdr(seg);
}
@@ -250,6 +262,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->len = newlen;
uh->check = check;
 
+   if (seg->ip_summed == CHECKSUM_PARTIAL)
+   gso_reset_checksum(seg, ~check);
+   else
+   uh->check = gso_make_checksum(seg, ~check);
+
/* update refcount for the packet */
refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
 out:
@@ -257,15 +274,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 }
 EXPORT_SYMBOL_GPL(__udp_gso_segment);
 
-static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features)
-{
-   if (!can_checksum_protocol(features, htons(ETH_P_IP)))
-   return ERR_PTR(-EIO);
-
-   return __udp_gso_segment(gso_skb, features);
-}
-
 static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 netdev_features_t features)
 {
@@ -289,7 +297,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
goto out;
 
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
-   return __udp4_gso_segment(skb, features);
+   return __udp_gso_segment(skb, features);
 
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 61e34f1d2fa2..03a2ff3fe1e6 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -17,15 +17,6 @@
 #include 
 #include "ip6_offload.h"
 
-static struct sk_buff *__udp6_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features)
-{
-   if (!can_checksum_protocol(features, htons(ETH_P_IPV6)))
-   return ERR_PTR(-EIO);
-
-   return __udp_gso_segment(gso_skb, features);
-}
-
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 netdev_features_t features)
 {
@@ -58,7 +49,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
goto out;
 
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
-   return __udp6_gso_segment(skb, features);
+   return __udp_gso_segment(skb, features);
 
/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
 * do checksum of UDP packets sent as multiple IP fragments.

[net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

This patch is meant to allow us to avoid having to recompute the checksum
from scratch and have it passed as a parameter.

Instead of taking that approach we can take advantage of the fact that the
length that was used to compute the existing checksum is included in the
UDP header. If we cancel that out by adding its value XORed with 0xFFFF, we
can then just add the new length in and fold that into the new result.

I think this may be fixing a checksum bug in the original code as well
since the checksum that was passed included the UDP header in the checksum
computation, but then excluded it for the adjustment on the last frame. I
believe this may have an effect on things in the cases where the two differ
by bits that would result in things crossing the byte boundaries.
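
The length-swap trick described above can be sketched in plain C on 16-bit one's-complement sums. This is an illustration under stated assumptions, not the kernel helper set: fold32(), csum_add16() and csum_adjust_len() are hypothetical names, values are plain host integers rather than __be16/__wsum, and the functions return the unfolded-and-not-complemented sum.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold32(uint32_t sum)
{
	sum = (sum & 0xFFFF) + (sum >> 16);
	sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement addition of two 16-bit words. */
static uint16_t csum_add16(uint16_t a, uint16_t b)
{
	return fold32((uint32_t)a + b);
}

/* Replace old_len by new_len inside an existing sum.
 *
 * Adding (old_len ^ 0xFFFF) is one's-complement subtraction of old_len,
 * so the old length cancels out and the new one is added in.
 */
static uint16_t csum_adjust_len(uint16_t sum, uint16_t old_len,
				uint16_t new_len)
{
	return fold32((uint32_t)sum + (uint16_t)(old_len ^ 0xFFFF) + new_len);
}
```

For example, a sum of 0xC3DD that includes a length of 0x05DC becomes 0xC009 once the length is swapped for 0x0208, which matches summing the remaining words with the new length from scratch.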

Signed-off-by: Alexander Duyck 
---

v2: New break-out patch based on one patch from earlier series

 include/net/udp.h  |3 +--
 net/ipv4/udp_offload.c |   35 +--
 net/ipv6/udp_offload.c |7 +--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 8bd83b044ecd..9289b6425032 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,8 +175,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features,
- __sum16 check);
+ netdev_features_t features);
 
 static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
 {
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f21b63adcbbc..5c4bb8b9e83e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -188,8 +188,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 EXPORT_SYMBOL(skb_udp_tunnel_segment);
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features,
- __sum16 check)
+ netdev_features_t features)
 {
struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
struct sock *sk = gso_skb->sk;
@@ -197,6 +196,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
unsigned int hdrlen;
struct udphdr *uh;
unsigned int mss;
+   __sum16 check;
+   __be16 newlen;
 
mss = skb_shinfo(gso_skb)->gso_size;
if (gso_skb->len <= sizeof(*uh) + mss)
@@ -218,17 +219,28 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return segs;
}
 
+   uh = udp_hdr(segs);
+
+   /* compute checksum adjustment based on old length versus new */
+   newlen = htons(sizeof(*uh) + mss);
+   check = ~csum_fold((__force __wsum)((__force u32)uh->check +
+   ((__force u32)uh->len ^ 0xFFFF) +
+   (__force u32)newlen));
+
for (seg = segs; seg; seg = seg->next) {
uh = udp_hdr(seg);
-   uh->len = htons(seg->len - hdrlen);
-   uh->check = check;
 
/* last packet can be partial gso_size */
-   if (!seg->next)
-   csum_replace2(&uh->check, htons(mss),
- htons(seg->len - hdrlen - sizeof(*uh)));
+   if (!seg->next) {
+   newlen = htons(seg->len - hdrlen);
+   check = ~csum_fold((__force __wsum)((__force u32)uh->check +
+   ((__force u32)uh->len ^ 0xFFFF) +
+   (__force u32)newlen));
+   }
+
+   uh->len = newlen;
+   uh->check = check;
 
-   uh->check = ~uh->check;
seg->destructor = sock_wfree;
seg->sk = sk;
sum_truesize += seg->truesize;
@@ -243,15 +255,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
  netdev_features_t features)
 {
-   const struct iphdr *iph = ip_hdr(gso_skb);
-   unsigned int mss = skb_shinfo(gso_skb)->gso_size;
-
if (!can_checksum_protocol(features, htons(ETH_P_IP)))
return ERR_PTR(-EIO);
 
-   return __udp_gso_segment(gso_skb, features,
-udp_v4_check(sizeof(struct udphdr) + mss,
- iph->saddr, iph->daddr, 0));
+   return __udp_gso_segment(gso_skb, features);
 }
 
 static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
diff --git a/net/ipv6/udp_offload.c

[net-next PATCH v2 5/8] udp: Partially unroll handling of first segment and last segment

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

This patch allows us to take care of unrolling the first segment and the
last segment of the loop for processing the segmented skb. Part of the
motivation for this is that it makes it easier to handle the fact that the
first frame and all of the frames in between should be mostly identical
in terms of header data, and the last frame has differences in the length
and partial checksum.

In addition I am dropping the header length calculation since we don't
really need it for anything but the last frame and it can be easily
obtained by just pulling the data_len and offset of tail from the transport
header.

Signed-off-by: Alexander Duyck 
---

v2: New break-out patch based on one patch from earlier series

 net/ipv4/udp_offload.c |   35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 5c4bb8b9e83e..946d06d2aa0c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -193,7 +193,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
struct sock *sk = gso_skb->sk;
unsigned int sum_truesize = 0;
-   unsigned int hdrlen;
struct udphdr *uh;
unsigned int mss;
__sum16 check;
@@ -206,7 +205,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (!pskb_may_pull(gso_skb, sizeof(*uh)))
goto out;
 
-   hdrlen = gso_skb->data - skb_mac_header(gso_skb);
skb_pull(gso_skb, sizeof(*uh));
 
/* clear destructor to avoid skb_segment assigning it to tail */
@@ -219,7 +217,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return segs;
}
 
-   uh = udp_hdr(segs);
+   seg = segs;
+   uh = udp_hdr(seg);
 
/* compute checksum adjustment based on old length versus new */
newlen = htons(sizeof(*uh) + mss);
@@ -227,25 +226,31 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
((__force u32)uh->len ^ 0xFFFF) +
(__force u32)newlen));
 
-   for (seg = segs; seg; seg = seg->next) {
-   uh = udp_hdr(seg);
+   for (;;) {
+   seg->destructor = sock_wfree;
+   seg->sk = sk;
+   sum_truesize += seg->truesize;
 
-   /* last packet can be partial gso_size */
-   if (!seg->next) {
-   newlen = htons(seg->len - hdrlen);
-   check = ~csum_fold((__force __wsum)((__force u32)uh->check +
-   ((__force u32)uh->len ^ 0xFFFF) +
-   (__force u32)newlen));
-   }
+   if (!seg->next)
+   break;
 
uh->len = newlen;
uh->check = check;
 
-   seg->destructor = sock_wfree;
-   seg->sk = sk;
-   sum_truesize += seg->truesize;
+   seg = seg->next;
+   uh = udp_hdr(seg);
}
 
+   /* last packet can be partial gso_size, account for that in checksum */
+   newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
+  seg->data_len);
+   check = ~csum_fold((__force __wsum)((__force u32)uh->check +
+   ((__force u32)uh->len ^ 0xFFFF)  +
+   (__force u32)newlen));
+   uh->len = newlen;
+   uh->check = check;
+
+   /* update refcount for the packet */
refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
 out:
return segs;

[net-next PATCH v2 3/8] udp: Do not pass MSS as parameter to GSO segmentation

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

There is no point in passing MSS as a parameter for the GSO
segmentation call as it is already available via the shared info for the
skb itself.

Signed-off-by: Alexander Duyck 
---

v2: New break-out patch based on one patch from earlier series

 include/net/udp.h  |2 +-
 net/ipv4/udp_offload.c |6 --
 net/ipv6/udp_offload.c |2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 05990746810e..8bd83b044ecd 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -176,7 +176,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
  netdev_features_t features,
- unsigned int mss, __sum16 check);
+ __sum16 check);
 
 static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
 {
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 8303fff42940..f21b63adcbbc 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -189,14 +189,16 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
  netdev_features_t features,
- unsigned int mss, __sum16 check)
+ __sum16 check)
 {
struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
struct sock *sk = gso_skb->sk;
unsigned int sum_truesize = 0;
unsigned int hdrlen;
struct udphdr *uh;
+   unsigned int mss;
 
+   mss = skb_shinfo(gso_skb)->gso_size;
if (gso_skb->len <= sizeof(*uh) + mss)
goto out;
 
@@ -247,7 +249,7 @@ static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
if (!can_checksum_protocol(features, htons(ETH_P_IP)))
return ERR_PTR(-EIO);
 
-   return __udp_gso_segment(gso_skb, features, mss,
+   return __udp_gso_segment(gso_skb, features,
 udp_v4_check(sizeof(struct udphdr) + mss,
  iph->saddr, iph->daddr, 0));
 }
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index f7b85b1e6b3e..dea03ec09715 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -26,7 +26,7 @@ static struct sk_buff *__udp6_gso_segment(struct sk_buff *gso_skb,
if (!can_checksum_protocol(features, htons(ETH_P_IPV6)))
return ERR_PTR(-EIO);
 
-   return __udp_gso_segment(gso_skb, features, mss,
+   return __udp_gso_segment(gso_skb, features,
 udp_v6_check(sizeof(struct udphdr) + mss,
  >saddr, >daddr, 0));
 }

[net-next PATCH v2 2/8] udp: Verify that pulling UDP header in GSO segmentation doesn't fail

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

We should verify that we can pull the UDP header before we attempt to do
so. Otherwise if this fails we have no way of knowing and GSO will not work
correctly.

Signed-off-by: Alexander Duyck 
---

v2: New break-out patch based on one patch from earlier series

 net/ipv4/udp_offload.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 006257092f06..8303fff42940 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -191,14 +191,17 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
  netdev_features_t features,
  unsigned int mss, __sum16 check)
 {
+   struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
struct sock *sk = gso_skb->sk;
unsigned int sum_truesize = 0;
-   struct sk_buff *segs, *seg;
unsigned int hdrlen;
struct udphdr *uh;
 
if (gso_skb->len <= sizeof(*uh) + mss)
-   return ERR_PTR(-EINVAL);
+   goto out;
+
+   if (!pskb_may_pull(gso_skb, sizeof(*uh)))
+   goto out;
 
hdrlen = gso_skb->data - skb_mac_header(gso_skb);
skb_pull(gso_skb, sizeof(*uh));
@@ -230,7 +233,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
}
 
refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
-
+out:
return segs;
 }
 EXPORT_SYMBOL_GPL(__udp_gso_segment);

[net-next PATCH v2 1/8] udp: Record gso_segs when supporting UDP segmentation offload

2018-05-04 Thread Alexander Duyck

From: Alexander Duyck 

We need to record the number of segments that will be generated when this
frame is segmented. The expectation is that if gso_size is set then
gso_segs is set as well. Without this some drivers such as ixgbe get
confused if they attempt to offload this as they record 0 segments for the
entire packet instead of the correct value.
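
As a rough illustration of the segment count described above, the following userspace sketch rounds the UDP payload length up to a whole number of gso_size chunks. The names udp_gso_segs() and UDP_HDR_LEN are assumptions for this example only. Note that the sketch subtracts the 8-byte UDP header size explicitly; if `uh` in the hunk below is a pointer, `sizeof(uh)` there is the pointer size, which happens to equal 8 only on 64-bit builds.

```c
#include <stddef.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define UDP_HDR_LEN		8	/* size of a UDP header in bytes */

/* Number of segments a UDP datagram of total length len (header
 * included) produces when its payload is cut into gso_size chunks.
 * Returns 0 for inputs that cannot be segmented. Hypothetical helper.
 */
static unsigned int udp_gso_segs(size_t len, unsigned int gso_size)
{
	if (len < UDP_HDR_LEN || gso_size == 0)
		return 0;

	return (unsigned int)DIV_ROUND_UP(len - UDP_HDR_LEN, (size_t)gso_size);
}
```

A payload of exactly three 1472-byte chunks gives 3 segments, and one extra byte of payload gives 4, the last one being a short segment.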

Reviewed-by: Eric Dumazet 
Signed-off-by: Alexander Duyck 
---
 net/ipv4/udp.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index dd3102a37ef9..e07db83b311e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 
skb_shinfo(skb)->gso_size = cork->gso_size;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+   skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
+cork->gso_size);
goto csum_partial;
}

[net-next PATCH v2 0/8] UDP GSO Segmentation clean-ups

2018-05-04 Thread Alexander Duyck

This patch set addresses a number of issues I found while sorting out
enabling UDP GSO Segmentation support for ixgbe/ixgbevf. Specifically there
were a number of issues related to the checksum and such that seemed to
cause either minor irregularities or kernel panics in the case of the
offload request being allowed to traverse between namespaces.

With this set applied I was able to get UDP GSO traffic to pass over
vxlan tunnels in both offloaded modes and non-offloaded modes for ixgbe and
ixgbevf.

I submitted the driver specific patches earlier as an RFC:
https://patchwork.ozlabs.org/project/netdev/list/?series=42477=both=*

v2: Updated patches based on feedback from Eric Dumazet
Split first patch into several patches based on feedback from Eric

---

Alexander Duyck (8):
  udp: Record gso_segs when supporting UDP segmentation offload
  udp: Verify that pulling UDP header in GSO segmentation doesn't fail
  udp: Do not pass MSS as parameter to GSO segmentation
  udp: Do not pass checksum as a parameter to GSO segmentation
  udp: Partially unroll handling of first segment and last segment
  udp: Add support for software checksum and GSO_PARTIAL with GSO offload
  udp: Do not copy destructor if one is not present
  net: Add NETIF_F_GSO_UDP_L4 to list of GSO offloads with fallback


 include/linux/netdev_features.h |3 +
 include/net/udp.h   |3 -
 net/ipv4/udp.c  |2 +
 net/ipv4/udp_offload.c  |  104 ++-
 net/ipv6/udp_offload.c  |   16 --
 5 files changed, 74 insertions(+), 54 deletions(-)

--

[PATCH net] Added support for 802.1ad Q in Q Ethernet tagged frames

2018-05-04 Thread Elad Nachman

The stmmac reception handler calls stmmac_rx_vlan() to strip the VLAN tag
before calling napi_gro_receive().

The function assumes VLAN-tagged frames are always tagged with the 802.1Q
protocol, and assigns ETH_P_8021Q to the skb by hard-coding the parameter in
the call to __vlan_hwaccel_put_tag().

This causes packets not to be passed to the VLAN slave if it was created with
the 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
instead of using the hard-coded value of ETH_P_8021Q.
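
The idea of taking the tag protocol from the frame itself can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the driver code: vlan_pop() and struct vlan_tag are hypothetical, the frame is a raw byte buffer rather than an skb, and the protocol is returned in host byte order whereas the kernel passes a __be16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define VLAN_HLEN	4
#define ETH_P_8021Q	0x8100	/* 802.1Q C-tag */
#define ETH_P_8021AD	0x88A8	/* 802.1ad S-tag (Q-in-Q) */

struct vlan_tag {
	uint16_t proto;	/* TPID read from the frame, host order */
	uint16_t tci;	/* priority, DEI and 12-bit VLAN ID */
};

/* Read the outer VLAN tag and slide the two MAC addresses forward
 * over it. Returns the number of bytes the caller should now skip
 * at the start of the buffer, or -1 if the frame carries no tag.
 */
static int vlan_pop(uint8_t *frame, size_t len, struct vlan_tag *tag)
{
	uint16_t proto;

	if (len < 2 * ETH_ALEN + VLAN_HLEN + 2)
		return -1;

	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	if (proto != ETH_P_8021Q && proto != ETH_P_8021AD)
		return -1;

	/* Keep whatever protocol the frame used instead of assuming 802.1Q. */
	tag->proto = proto;
	tag->tci = (uint16_t)((frame[14] << 8) | frame[15]);

	memmove(frame + VLAN_HLEN, frame, 2 * ETH_ALEN);
	return VLAN_HLEN;
}
```

A caller would then report tag.proto along with the VLAN ID, so an 802.1ad frame is matched against an 802.1ad VLAN device rather than an 802.1Q one.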

Signed-off-by: Elad Nachman 


---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d1..ced2d34 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3293,17 +3293,19 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
 {
-   struct ethhdr *ehdr;
+   struct vlan_ethhdr *veth;
u16 vlanid;
+   __be16 vlan_proto;
 
if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
NETIF_F_HW_VLAN_CTAG_RX &&
!__vlan_get_tag(skb, &vlanid)) {
/* pop the vlan tag */
-   ehdr = (struct ethhdr *)skb->data;
-   memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
+   veth = (struct vlan_ethhdr *)skb->data;
+   vlan_proto = veth->h_vlan_proto;
+   memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
skb_pull(skb, VLAN_HLEN);
-   __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+   __vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
}
 }
 
-- 
2.7.4

[PATCH V2 net-next] liquidio: support use of ethtool to set link speed of CN23XX-225 cards

2018-05-04 Thread Felix Manlunas

From: Weilin Chang 

Support setting the link speed of CN23XX-225 cards (which can do 25Gbps or
10Gbps) via ethtool_ops.set_link_ksettings.

Also fix the function assigned to ethtool_ops.get_link_ksettings to use the
new link_ksettings api completely (instead of partially via
ethtool_convert_legacy_u32_to_link_mode).

Signed-off-by: Weilin Chang 
Acked-by: Raghu Vatsavayi 
Signed-off-by: Felix Manlunas 
---
Patch Change Log:
  V1 -> V2:
In lio_set_link_ksettings(), set the order of local variable
declarations from longest to shortest line.

 drivers/net/ethernet/cavium/liquidio/lio_core.c| 196 +
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 195 +---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  20 +++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 .../net/ethernet/cavium/liquidio/liquidio_common.h |   4 +
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  14 ++
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  15 ++
 7 files changed, 425 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 6821afc..8093c5e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -1481,3 +1481,199 @@ int octnet_get_link_stats(struct net_device *netdev)
 
return 0;
 }
+
+static void liquidio_nic_seapi_ctl_callback(struct octeon_device *oct,
+   u32 status,
+   void *buf)
+{
+   struct liquidio_nic_seapi_ctl_context *ctx;
+   struct octeon_soft_command *sc = buf;
+
+   ctx = sc->ctxptr;
+
+   oct = lio_get_device(ctx->octeon_id);
+   if (status) {
+   dev_err(&oct->pci_dev->dev, "%s: instruction failed. Status: %llx\n",
+   __func__,
+   CVM_CAST64(status));
+   }
+   ctx->status = status;
+   complete(&ctx->complete);
+}
+
+int liquidio_set_speed(struct lio *lio, int speed)
+{
+   struct liquidio_nic_seapi_ctl_context *ctx;
+   struct octeon_device *oct = lio->oct_dev;
+   struct oct_nic_seapi_resp *resp;
+   struct octeon_soft_command *sc;
+   union octnet_cmd *ncmd;
+   u32 ctx_size;
+   int retval;
+   u32 var;
+
+   if (oct->speed_setting == speed)
+   return 0;
+
+   if (!OCTEON_CN23XX_PF(oct)) {
+   dev_err(&oct->pci_dev->dev, "%s: SET SPEED only for PF\n",
+   __func__);
+   return -EOPNOTSUPP;
+   }
+
+   ctx_size = sizeof(struct liquidio_nic_seapi_ctl_context);
+   sc = octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE,
+  sizeof(struct oct_nic_seapi_resp),
+  ctx_size);
+   if (!sc)
+   return -ENOMEM;
+
+   ncmd = sc->virtdptr;
+   ctx  = sc->ctxptr;
+   resp = sc->virtrptr;
+   memset(resp, 0, sizeof(struct oct_nic_seapi_resp));
+
+   ctx->octeon_id = lio_get_device_id(oct);
+   ctx->status = 0;
+   init_completion(&ctx->complete);
+
+   ncmd->u64 = 0;
+   ncmd->s.cmd = SEAPI_CMD_SPEED_SET;
+   ncmd->s.param1 = speed;
+
+   octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+   sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+   octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+   OPCODE_NIC_UBOOT_CTL, 0, 0, 0);
+
+   sc->callback = liquidio_nic_seapi_ctl_callback;
+   sc->callback_arg = sc;
+   sc->wait_time = 5000;
+
+   retval = octeon_send_soft_command(oct, sc);
+   if (retval == IQ_SEND_FAILED) {
+   dev_info(&oct->pci_dev->dev, "Failed to send soft command\n");
+   retval = -EBUSY;
+   } else {
+   /* Wait for response or timeout */
+   if (wait_for_completion_timeout(&ctx->complete,
+   msecs_to_jiffies(10000)) == 0) {
+   dev_err(&oct->pci_dev->dev, "%s: sc timeout\n",
+   __func__);
+   octeon_free_soft_command(oct, sc);
+   return -EINTR;
+   }
+
+   retval = resp->status;
+
+   if (retval) {
+   dev_err(&oct->pci_dev->dev, "%s failed, retval=%d\n",
+   __func__, retval);
+   octeon_free_soft_command(oct, sc);
+   return -EIO;
+   }
+
+   var = be32_to_cpu((__force __be32)resp->speed);
+   if (var != speed) {
+   dev_err(&oct->pci_dev->dev,
+   "%s: setting failed speed= %x, expect %x\n",
+   __func__, var, speed);
+

Re: [PATCH net-next v8 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc

2018-05-04 Thread Cong Wang

On Fri, May 4, 2018 at 7:02 AM, Toke Høiland-Jørgensen  wrote:
> +struct cake_sched_data {
> +   struct cake_tin_data *tins;
> +
> +   struct cake_heap_entry overflow_heap[CAKE_QUEUES * CAKE_MAX_TINS];
> +   u16 overflow_timeout;
> +
> +   u16 tin_cnt;
> +   u8  tin_mode;
> +   u8  flow_mode;
> +   u8  ack_filter;
> +   u8  atm_mode;
> +
> +   /* time_next = time_this + ((len * rate_ns) >> rate_shft) */
> +   u16 rate_shft;
> +   u64 time_next_packet;
> +   u64 failsafe_next_packet;
> +   u32 rate_ns;
> +   u32 rate_bps;
> +   u16 rate_flags;
> +   s16 rate_overhead;
> +   u16 rate_mpu;
> +   u32 interval;
> +   u32 target;
> +
> +   /* resource tracking */
> +   u32 buffer_used;
> +   u32 buffer_max_used;
> +   u32 buffer_limit;
> +   u32 buffer_config_limit;
> +
> +   /* indices for dequeue */
> +   u16 cur_tin;
> +   u16 cur_flow;
> +
> +   struct qdisc_watchdog watchdog;
> +   const u8*tin_index;
> +   const u8*tin_order;
> +
> +   /* bandwidth capacity estimate */
> +   u64 last_packet_time;
> +   u64 avg_packet_interval;
> +   u64 avg_window_begin;
> +   u32 avg_window_bytes;
> +   u32 avg_peak_bandwidth;
> +   u64 last_reconfig_time;
> +
> +   /* packet length stats */
> +   u32 avg_netoff;
> +   u16 max_netlen;
> +   u16 max_adjlen;
> +   u16 min_netlen;
> +   u16 min_adjlen;
> +};


So sch_cake doesn't accept normal tc filters? Is this intentional?
If so, why?


> +static u16 quantum_div[CAKE_QUEUES + 1] = {0};
> +
> +#define REC_INV_SQRT_CACHE (16)
> +static u32 cobalt_rec_inv_sqrt_cache[REC_INV_SQRT_CACHE] = {0};
> +
> +/* http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
> + * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
> + *
> + * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
> + */
> +
> +static void cobalt_newton_step(struct cobalt_vars *vars)
> +{
> +   u32 invsqrt = vars->rec_inv_sqrt;
> +   u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
> +   u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
> +
> +   val >>= 2; /* avoid overflow in following multiply */
> +   val = (val * invsqrt) >> (32 - 2 + 1);
> +
> +   vars->rec_inv_sqrt = val;
> +}
> +
> +static void cobalt_invsqrt(struct cobalt_vars *vars)
> +{
> +   if (vars->count < REC_INV_SQRT_CACHE)
> +   vars->rec_inv_sqrt = cobalt_rec_inv_sqrt_cache[vars->count];
> +   else
> +   cobalt_newton_step(vars);
> +}


Looks pretty much duplicated with codel...
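
For anyone who wants to compare the numbers against CoDel, the quoted Newton step can be tried in userspace. This is only a minimal sketch: `struct demo_vars` and `build_inv_sqrt_cache()` are hypothetical stand-ins (not names from the patch), `stdint.h` types replace the kernel's `u32`/`u64`, and the cache loop follows the quoted cobalt_cache_init() in carrying the estimate forward from one count to the next.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cobalt_vars: only the fields used here. */
struct demo_vars {
	uint32_t count;
	uint32_t rec_inv_sqrt;	/* 1/sqrt(count) as Q0.32 fixed point */
};

/* One Newton step: new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2) */
static void newton_step(struct demo_vars *v)
{
	uint32_t invsqrt = v->rec_inv_sqrt;
	uint32_t invsqrt2 = (uint32_t)(((uint64_t)invsqrt * invsqrt) >> 32);
	uint64_t val = (3ULL << 32) - ((uint64_t)v->count * invsqrt2);

	val >>= 2;	/* avoid overflow in the following multiply */
	val = (val * invsqrt) >> (32 - 2 + 1);

	v->rec_inv_sqrt = (uint32_t)val;
}

/*
 * Fill cache[0..n-1] (n must be at least 1). Entry 0 holds ~1.0; every
 * later entry is produced by four Newton steps that start from the
 * previous entry's value, so the estimate never starts far off.
 */
void build_inv_sqrt_cache(uint32_t *cache, uint32_t n)
{
	struct demo_vars v = { .count = 0, .rec_inv_sqrt = ~0U };
	int i;

	cache[0] = v.rec_inv_sqrt;
	for (v.count = 1; v.count < n; v.count++) {
		for (i = 0; i < 4; i++)
			newton_step(&v);
		cache[v.count] = v.rec_inv_sqrt;
	}
}
```

Starting each count from the previous result matters: a step that begins at ~1.0 with a large count would make `count * invsqrt2` exceed 3.0 and wrap the unsigned subtraction.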


> +
> +/* There is a big difference in timing between the accurate values placed in
> + * the cache and the approximations given by a single Newton step for small
> + * count values, particularly when stepping from count 1 to 2 or vice versa.
> + * Above 16, a single Newton step gives sufficient accuracy in either
> + * direction, given the precision stored.
> + *
> + * The magnitude of the error when stepping up to count 2 is such as to give
> + * the value that *should* have been produced at count 4.
> + */
> +
> +static void cobalt_cache_init(void)
> +{
> +   struct cobalt_vars v;
> +
> +   memset(&v, 0, sizeof(v));
> +   v.rec_inv_sqrt = ~0U;
> +   cobalt_rec_inv_sqrt_cache[0] = v.rec_inv_sqrt;
> +
> +   for (v.count = 1; v.count < REC_INV_SQRT_CACHE; v.count++) {
> +   cobalt_newton_step(&v);
> +   cobalt_newton_step(&v);
> +   cobalt_newton_step(&v);
> +   cobalt_newton_step(&v);
> +
> +   cobalt_rec_inv_sqrt_cache[v.count] = v.rec_inv_sqrt;
> +   }
> +}
> +
> +static void cobalt_vars_init(struct cobalt_vars *vars)
> +{
> +   memset(vars, 0, sizeof(*vars));
> +
> +   if (!cobalt_rec_inv_sqrt_cache[0]) {
> +   cobalt_cache_init();
> +   cobalt_rec_inv_sqrt_cache[0] = ~0;
> +   }
> +}
> +
> +/* CoDel control_law is t + interval/sqrt(count)
> + * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
> + * both sqrt() and divide operation.
> + */
> +static cobalt_time_t cobalt_control(cobalt_time_t t,
> +   cobalt_time_t interval,
> +   u32 rec_inv_sqrt)
> +{
> +   return t + reciprocal_scale(interval, rec_inv_sqrt);
> +}
> +
> +/* Call this when a packet had to be dropped due to queue overflow.  Returns
> + * true if the BLUE state was quiescent before but active after this call.
> + */
> +static bool cobalt_queue_full(struct cobalt_vars *vars,
> +
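
The control law quoted above, t + interval/sqrt(count), can be sketched in userspace as follows. Assumptions: `scale_q32()` is a local helper that reproduces the arithmetic of the kernel's reciprocal_scale() (a 32x32 multiply followed by a 32-bit right shift), `control_law()` is a hypothetical name, and the time type is taken to be a plain 64-bit integer, which the quoted excerpt does not confirm.

```c
#include <stdint.h>

/* Same arithmetic as reciprocal_scale(): (val * ep_ro) >> 32. */
static uint32_t scale_q32(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/*
 * next = t + interval / sqrt(count), where rec_inv_sqrt already holds
 * 1/sqrt(count) as a Q0.32 fraction, so no sqrt() or divide is needed.
 */
uint64_t control_law(uint64_t t, uint32_t interval, uint32_t rec_inv_sqrt)
{
	return t + scale_q32(interval, rec_inv_sqrt);
}
```

With rec_inv_sqrt = 0x80000000 (0.5, i.e. count = 4), an interval of 100000 advances the time by 50000.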

Re: [PATCH] net: Work around crash in ipv6 fib-walk-continue

2018-05-04 Thread Ben Greear


On 05/04/2018 10:47 AM, David Ahern wrote:

> On 4/19/18 12:01 PM, gree...@candelatech.com wrote:
> > From: Ben Greear 
> > 
> > This keeps us from crashing in certain test cases where we
> > bring up many (1000, for instance) mac-vlans with IPv6
> > enabled in the kernel.  This bug has been around for a
> > very long time.
> > 
> > Until a real fix is found (and for stable), maybe it
> > is better to return an incomplete fib walk instead
> > of crashing.
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 8
> > IP: fib6_walk_continue+0x5b/0x140 [ipv6]
> > PGD 8007dfc0c067 P4D 8007dfc0c067 PUD 7e66ff067 PMD 0
> > Oops:  [#1] PREEMPT SMP PTI
> > Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 
> > libcrc32c vrf]
> > CPU: 3 PID: 15117 Comm: ip Tainted: G   O 4.16.0+ #5
> > Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
> > RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
> > RSP: 0018:c90008c3bc10 EFLAGS: 00010287
> > RAX: 88085ac45050 RBX: 8807e03008a0 RCX: 
> > RDX:  RSI: c90008c3bc48 RDI: 8232b240
> > RBP: 880819167600 R08: 0008 R09: 8807dff10071
> > R10: c90008c3bbd0 R11:  R12: 8807e03008a0
> > R13: 0002 R14: 8807e05744c8 R15: 8807e08ef000
> > FS:  7f2f04342700() GS:88087fcc() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0008 CR3: 0007e0556002 CR4: 003606e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >  inet6_dump_fib+0x14b/0x2c0 [ipv6]
> >  netlink_dump+0x216/0x2a0
> >  netlink_recvmsg+0x254/0x400
> >  ? copy_msghdr_from_user+0xb5/0x110
> >  ___sys_recvmsg+0xe9/0x230
> >  ? find_held_lock+0x3b/0xb0
> >  ? __handle_mm_fault+0x617/0x1180
> >  ? __audit_syscall_entry+0xb3/0x110
> >  ? __sys_recvmsg+0x39/0x70
> >  __sys_recvmsg+0x39/0x70
> >  do_syscall_64+0x63/0x120
> >  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > RIP: 0033:0x7f2f03a72030
> > RSP: 002b:7fffab3de508 EFLAGS: 0246 ORIG_RAX: 002f
> > RAX: ffda RBX: 7fffab3e641c RCX: 7f2f03a72030
> > RDX:  RSI: 7fffab3de570 RDI: 0004
> > RBP:  R08: 7e6c R09: 7fffab3e63a8
> > R10: 7fffab3de5b0 R11: 0246 R12: 7fffab3e6608
> > R13: 0066b460 R14: 7e6c R15: 
> > Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 
> > 89 53 2c c7 4
> > RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: c90008c3bc10
> > CR2: 0008
> > ---[ end trace bd03458864eb266c ]---
> > 
> > Signed-off-by: Ben Greear 
> > ---
> 
> Does your use case that triggers this involve replacing routes? I just
> noticed the route delete code in fib6_add_rt2node does not have the
> 'Adjust walkers' code that is in fib6_del_route.
> 
> Further, the adjust walkers code in fib6_del_route looks suspicious in
> its timing with route deletes. If you have a reliable reproducer we can
> try a few things with fib6_del_route and the walker code.


Yes, we replace routes, and yes we can reliably reproduce it and will
be happy to test patches.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

[RFC PATCH 2/2] net: mac80211: mac802154: use lockdep_assert_in_softirq() instead of own warning

2018-05-04 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
the softirq context for the subsequent netif_receive_skb() call. The check
can be moved into netif_receive_skb() itself so that callers do not have
to implement it on their own. Use the lockdep variant for the softirq
context check. While at it, add a lockdep based check for irqs being
enabled, as mentioned in the comment above netif_receive_skb().

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 net/core/dev.c | 3 +++
 net/mac80211/rx.c  | 2 --
 net/mac802154/rx.c | 2 --
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index af0558b00c6c..7b4b8611cc21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4750,6 +4750,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
+   lockdep_assert_irqs_enabled();
+   lockdep_assert_in_softirq();
+
trace_netif_receive_skb_entry(skb);
 
return netif_receive_skb_internal(skb);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 03102aff0953..653332d81c17 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -4324,8 +4324,6 @@ void ieee80211_rx_napi(struct ieee80211_hw *hw, struct 
ieee80211_sta *pubsta,
struct ieee80211_supported_band *sband;
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
 
-   WARN_ON_ONCE(softirq_count() == 0);
-
if (WARN_ON(status->band >= NUM_NL80211_BANDS))
goto drop;
 
diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c
index 4dcf6e18563a..66916c270efc 100644
--- a/net/mac802154/rx.c
+++ b/net/mac802154/rx.c
@@ -258,8 +258,6 @@ void ieee802154_rx(struct ieee802154_local *local, struct 
sk_buff *skb)
 {
u16 crc;
 
-   WARN_ON_ONCE(softirq_count() == 0);
-
if (local->suspended)
goto drop;
 
-- 
2.17.0

[RFC PATCH 0/2] Introduce assert_in_softirq()

2018-05-04 Thread Sebastian Andrzej Siewior

ieee80211_rx_napi() has a check to ensure that it is invoked in softirq
context / with BH disabled. It is there because it invokes
netif_receive_skb() which has this requirement.
On -RT this check does not work as expected so there is always this
warning.
Tree wide there are two users of this check: ieee80211_rx_napi() and
ieee802154_rx(). This approach introduces assert_in_softirq() which does
the check if lockdep is enabled. This check could then become a nop on
-RT.

As an alternative, netif_receive_skb() (or ieee80211_rx_napi()) could do
local_bh_disable() / local_bh_enable() unconditionally. The _disable()
part is very cheap. The _enable() part is more expensive because it
includes a function call. We could avoid that jump in the likely case
when BH was already disabled by something like:

 static inline void local_bh_enable(void)
 {
 if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)
 __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
 else
 preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
 }

Which would make bh_enable() cheaper for everyone.
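
The fast path proposed above can be illustrated with a toy userspace model. Everything named `model_*` is hypothetical; the offsets are assumed to mirror the kernel's preempt_count layout (softirq bits starting at bit 8, with a BH disable adding two units), and the stand-in slow path only drops the count, whereas the real __local_bh_enable_ip() may also run pending softirqs.

```c
/* Assumed to mirror the kernel's preempt_count layout. */
#define SOFTIRQ_OFFSET		(1U << 8)
#define SOFTIRQ_MASK		(0xffU << 8)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

static unsigned int model_preempt_count;	/* toy per-CPU counter */
static unsigned int model_slow_path_calls;	/* out-of-line enables taken */

static unsigned int model_softirq_count(void)
{
	return model_preempt_count & SOFTIRQ_MASK;
}

void model_bh_disable(void)
{
	model_preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

/* Stand-in for the out-of-line __local_bh_enable_ip() call. */
static void model_bh_enable_slow(void)
{
	model_slow_path_calls++;
	model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

void model_bh_enable(void)
{
	if (model_softirq_count() == SOFTIRQ_DISABLE_OFFSET)
		model_bh_enable_slow();		/* outermost enable */
	else
		model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;	/* nested */
}
```

In this model a nested enable only decrements the counter; the function call is taken once, when the outermost section ends.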

Sebastian

[RFC PATCH 1/2] lockdep: Add an assert_in_softirq()

2018-05-04 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Instead of warning directly on the wrong context, check whether softirq
context is set. This check could be a nop on RT.

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/lockdep.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..59363c3047c6 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -608,11 +608,17 @@ do {  
\
  "IRQs not disabled as expected\n");   \
} while (0)
 
+#define lockdep_assert_in_softirq()do {\
+   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
+ !current->softirq_context,\
+ "Not in softirq context as expected\n");  \
+   } while (0)
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
+# define lockdep_assert_in_softirq() do { } while (0)
 #endif
 
 #ifdef CONFIG_LOCKDEP
-- 
2.17.0
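
The pattern used by the macro in this patch (warn once when the context is wrong, compile to nothing when checking is disabled) can be sketched in userspace. The names below are hypothetical: `demo_in_softirq` stands in for current->softirq_context, and `DEMO_NO_CHECKS` plays the role of the lockdep config switch.

```c
#include <stdbool.h>
#include <stdio.h>

static bool demo_in_softirq;	/* toy stand-in for current->softirq_context */
static int demo_warnings;	/* how many warnings were actually emitted */

#ifndef DEMO_NO_CHECKS
/* Warn at most once per call site, similar in spirit to WARN_ONCE(). */
#define demo_assert_in_softirq() do {					\
		static bool warned;					\
		if (!demo_in_softirq && !warned) {			\
			warned = true;					\
			demo_warnings++;				\
			fprintf(stderr,					\
				"Not in softirq context as expected\n");	\
		}							\
	} while (0)
#else
/* Checking disabled: the assertion costs nothing. */
#define demo_assert_in_softirq() do { } while (0)
#endif

void demo_receive(void)
{
	demo_assert_in_softirq();
	/* ... packet processing would follow here ... */
}
```

Calling demo_receive() repeatedly outside the modelled softirq context reports the problem a single time instead of flooding the log.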

Re: [PATCH] net: Work around crash in ipv6 fib-walk-continue

2018-05-04 Thread David Ahern

On 4/19/18 12:01 PM, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> This keeps us from crashing in certain test cases where we
> bring up many (1000, for instance) mac-vlans with IPv6
> enabled in the kernel.  This bug has been around for a
> very long time.
> 
> Until a real fix is found (and for stable), maybe it
> is better to return an incomplete fib walk instead
> of crashing.
> 
> BUG: unable to handle kernel NULL pointer dereference at 8
> IP: fib6_walk_continue+0x5b/0x140 [ipv6]
> PGD 8007dfc0c067 P4D 8007dfc0c067 PUD 7e66ff067 PMD 0
> Oops:  [#1] PREEMPT SMP PTI
> Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 
> libcrc32c vrf]
> CPU: 3 PID: 15117 Comm: ip Tainted: G   O 4.16.0+ #5
> Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
> RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
> RSP: 0018:c90008c3bc10 EFLAGS: 00010287
> RAX: 88085ac45050 RBX: 8807e03008a0 RCX: 
> RDX:  RSI: c90008c3bc48 RDI: 8232b240
> RBP: 880819167600 R08: 0008 R09: 8807dff10071
> R10: c90008c3bbd0 R11:  R12: 8807e03008a0
> R13: 0002 R14: 8807e05744c8 R15: 8807e08ef000
> FS:  7f2f04342700() GS:88087fcc() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0008 CR3: 0007e0556002 CR4: 003606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  inet6_dump_fib+0x14b/0x2c0 [ipv6]
>  netlink_dump+0x216/0x2a0
>  netlink_recvmsg+0x254/0x400
>  ? copy_msghdr_from_user+0xb5/0x110
>  ___sys_recvmsg+0xe9/0x230
>  ? find_held_lock+0x3b/0xb0
>  ? __handle_mm_fault+0x617/0x1180
>  ? __audit_syscall_entry+0xb3/0x110
>  ? __sys_recvmsg+0x39/0x70
>  __sys_recvmsg+0x39/0x70
>  do_syscall_64+0x63/0x120
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f2f03a72030
> RSP: 002b:7fffab3de508 EFLAGS: 0246 ORIG_RAX: 002f
> RAX: ffda RBX: 7fffab3e641c RCX: 7f2f03a72030
> RDX:  RSI: 7fffab3de570 RDI: 0004
> RBP:  R08: 7e6c R09: 7fffab3e63a8
> R10: 7fffab3de5b0 R11: 0246 R12: 7fffab3e6608
> R13: 0066b460 R14: 7e6c R15: 
> Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 
> 89 53 2c c7 4
> RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: c90008c3bc10
> CR2: 0008
> ---[ end trace bd03458864eb266c ]---
> 
> Signed-off-by: Ben Greear 
> ---
> 

Does your use case that triggers this involve replacing routes? I just
noticed the route delete code in fib6_add_rt2node does not have the
'Adjust walkers' code that is in fib6_del_route.

Further, the adjust walkers code in fib6_del_route looks suspicious in
its timing with route deletes. If you have a reliable reproducer we can
try a few things with fib6_del_route and the walker code.

[PATCH v5 1/6] firmware: wrap FW_OPT_* into an enum

2018-05-04 Thread Luis R. Rodriguez

From: Andres Rodriguez 

This should let us associate enum kdoc to these values.
While at it, kdocify the fw_opt.

Signed-off-by: Andres Rodriguez 
Acked-by: Luis R. Rodriguez 
[mcgrof: coding style fixes, merge kdoc with enum move]
Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/firmware_loader/fallback.c | 12 
 drivers/base/firmware_loader/fallback.h |  6 ++--
 drivers/base/firmware_loader/firmware.h | 37 +++--
 drivers/base/firmware_loader/main.c |  6 ++--
 4 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index 358354148dec..b57a7b3b4122 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -512,7 +512,7 @@ static const struct attribute_group *fw_dev_attr_groups[] = 
{
 
 static struct fw_sysfs *
 fw_create_instance(struct firmware *firmware, const char *fw_name,
-  struct device *device, unsigned int opt_flags)
+  struct device *device, enum fw_opt opt_flags)
 {
struct fw_sysfs *fw_sysfs;
struct device *f_dev;
@@ -545,7 +545,7 @@ fw_create_instance(struct firmware *firmware, const char 
*fw_name,
  * In charge of constructing a sysfs fallback interface for firmware loading.
  **/
 static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs,
- unsigned int opt_flags, long timeout)
+ enum fw_opt opt_flags, long timeout)
 {
int retval = 0;
struct device *f_dev = &fw_sysfs->dev;
@@ -599,7 +599,7 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs,
 
 static int fw_load_from_user_helper(struct firmware *firmware,
const char *name, struct device *device,
-   unsigned int opt_flags)
+   enum fw_opt opt_flags)
 {
struct fw_sysfs *fw_sysfs;
long timeout;
@@ -640,7 +640,7 @@ static int fw_load_from_user_helper(struct firmware 
*firmware,
return ret;
 }
 
-static bool fw_force_sysfs_fallback(unsigned int opt_flags)
+static bool fw_force_sysfs_fallback(enum fw_opt opt_flags)
 {
if (fw_fallback_config.force_sysfs_fallback)
return true;
@@ -649,7 +649,7 @@ static bool fw_force_sysfs_fallback(unsigned int opt_flags)
return true;
 }
 
-static bool fw_run_sysfs_fallback(unsigned int opt_flags)
+static bool fw_run_sysfs_fallback(enum fw_opt opt_flags)
 {
if (fw_fallback_config.ignore_sysfs_fallback) {
pr_info_once("Ignoring firmware sysfs fallback due to sysctl 
knob\n");
@@ -664,7 +664,7 @@ static bool fw_run_sysfs_fallback(unsigned int opt_flags)
 
 int fw_sysfs_fallback(struct firmware *fw, const char *name,
  struct device *device,
- unsigned int opt_flags,
+ enum fw_opt opt_flags,
  int ret)
 {
if (!fw_run_sysfs_fallback(opt_flags))
diff --git a/drivers/base/firmware_loader/fallback.h 
b/drivers/base/firmware_loader/fallback.h
index f8255670a663..a3b73a09db6c 100644
--- a/drivers/base/firmware_loader/fallback.h
+++ b/drivers/base/firmware_loader/fallback.h
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+#include "firmware.h"
+
 /**
  * struct firmware_fallback_config - firmware fallback configuration settings
  *
@@ -31,7 +33,7 @@ struct firmware_fallback_config {
 #ifdef CONFIG_FW_LOADER_USER_HELPER
 int fw_sysfs_fallback(struct firmware *fw, const char *name,
  struct device *device,
- unsigned int opt_flags,
+ enum fw_opt opt_flags,
  int ret);
 void kill_pending_fw_fallback_reqs(bool only_kill_custom);
 
@@ -43,7 +45,7 @@ void unregister_sysfs_loader(void);
 #else /* CONFIG_FW_LOADER_USER_HELPER */
 static inline int fw_sysfs_fallback(struct firmware *fw, const char *name,
struct device *device,
-   unsigned int opt_flags,
+   enum fw_opt opt_flags,
int ret)
 {
/* Keep carrying over the same error */
diff --git a/drivers/base/firmware_loader/firmware.h 
b/drivers/base/firmware_loader/firmware.h
index 64acbb1a392c..4f433b447367 100644
--- a/drivers/base/firmware_loader/firmware.h
+++ b/drivers/base/firmware_loader/firmware.h
@@ -2,6 +2,7 @@
 #ifndef __FIRMWARE_LOADER_H
 #define __FIRMWARE_LOADER_H
 
+#include 
 #include 
 #include 
 #include 
@@ -10,13 +11,33 @@
 
 #include 
 
-/* firmware behavior options */
-#define FW_OPT_UEVENT  (1U << 0)
-#define FW_OPT_NOWAIT  (1U << 1)
-#define FW_OPT_USERHELPER  (1U << 2)
-#define FW_OPT_NO_WARN (1U << 3)
-#define FW_OPT_NOCACHE
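
As a rough illustration of why an enum helps here: kernel-doc can describe each value, and the type name documents intent at every call site. The sketch below uses hypothetical `DEMO_OPT_*` names modelled on the four `FW_OPT_*` defines visible above; it is not the enum from the patch.

```c
/**
 * enum demo_fw_opt - hypothetical option flags, modelled on FW_OPT_*
 * @DEMO_OPT_UEVENT: send a uevent for the request
 * @DEMO_OPT_NOWAIT: the request is asynchronous
 * @DEMO_OPT_USERHELPER: allow the userspace fallback
 * @DEMO_OPT_NO_WARN: do not print warning messages
 */
enum demo_fw_opt {
	DEMO_OPT_UEVENT		= 1U << 0,
	DEMO_OPT_NOWAIT		= 1U << 1,
	DEMO_OPT_USERHELPER	= 1U << 2,
	DEMO_OPT_NO_WARN	= 1U << 3,
};

/* Flags are still combined with | and tested with &, as with the defines. */
int demo_wants_uevent(enum demo_fw_opt flags)
{
	return (flags & DEMO_OPT_UEVENT) != 0;
}
```

A caller passes something like `DEMO_OPT_UEVENT | DEMO_OPT_NOWAIT`; C accepts the combined integer value for the enum parameter.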

[PATCH v5 0/6] firmware_loader: cleanups for v4.18

2018-05-04 Thread Luis R. Rodriguez

Greg,

I've reviewed the pending patches for the firmware_loader and as for
v4.18, the following 3 patches from Andres have been iterated enough
that they're ready after I made some final minor changes, mostly just
style fixes and re-arrangements in terms of order. The new API he was
suggesting to add requires just a bit more review.

The last 3 patches are my own and are new, so I'd like further review
from others on them, but considering the changes Hans de Goede is having
us consider, I think this will prove useful to his work in terms of
splitting up code and documenting things properly. One thing that was
clear is that our Kconfig entries for FW_LOADER were *extremely* outdated,
so I've gone ahead and updated them.

The kconfig wording update for FW_LOADER includes prior conclusions made
to help justify keeping the split of the firmware fallback sysfs
interface in terms of size which was discussed with Josh Triplett a
while ago. It also includes modern recommendations, which would otherwise
get lost, on what to do about corner case firmware situations on
provisioning situations which folks have brought to my attention before
and architectural solutions based on firmwared [0] for a few years now.
Finally this work also reveals that a couple of candidate drivers could
likely move to staging considering their age, *or* we could just remove
the respective firmware build options. SCSI folks? Networking folks? To
my surprise *nothing* has been done about PREVENT_FIRMWARE_BUILD for
them since pre-git days!  These sneaky little buggers are:

  * CONFIG_WANXL --> CONFIG_WANXL_BUILD_FIRMWARE
  * CONFIG_SCSI_AIC79XX --> CONFIG_AIC79XX_BUILD_FIRMWARE

To this day both of these drivers are building driver *firmwares* when
the option CONFIG_PREVENT_FIRMWARE_BUILD is disabled, and they don't
even make use of the firmware API at all. I find it *highly unlikely*
pre-git day drivers are raging in new radical firmware updates these
days. I'll go put a knife into some of that unless I hear back from
SCSI or networking folks that WANXL_BUILD_FIRMWARE and
AIC79XX_BUILD_FIRMWARE are still hip and very much needed.

On my radar as well are Mimi's latest proposed firmware_loader changes,
but I think those need considerable review and attention from more security
folks, Android folks, the linux-wireless community, and our own
scattered group of firmware reviewers.

These patches are based on top of linux-next next-20180504, they are
also available in a respective git branch, both for linux-next [1] and
linux [2].

Questions, and especially rants, are greatly appreciated, and of course...
may the 4th be with you.

[0] https://github.com/teg/firmwared
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20180504-firmware_loader
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20180504-firmware_loader

  Luis

Andres Rodriguez (3):
  firmware: wrap FW_OPT_* into an enum
  firmware: use () to terminate kernel-doc function names
  firmware: rename fw_sysfs_fallback to firmware_fallback_sysfs()

Luis R. Rodriguez (3):
  firmware_loader: document firmware_sysfs_fallback()
  firmware_loader: enhance Kconfig documentation over FW_LOADER
  firmware_loader: move kconfig FW_LOADER entries to its own file

 drivers/base/Kconfig|  90 +++---
 drivers/base/firmware_loader/Kconfig| 149 
 drivers/base/firmware_loader/fallback.c |  46 +---
 drivers/base/firmware_loader/fallback.h |  18 +--
 drivers/base/firmware_loader/firmware.h |  37 --
 drivers/base/firmware_loader/main.c |  28 ++---
 6 files changed, 252 insertions(+), 116 deletions(-)
 create mode 100644 drivers/base/firmware_loader/Kconfig

-- 
2.17.0

[PATCH v5 3/6] firmware: rename fw_sysfs_fallback to firmware_fallback_sysfs()

2018-05-04 Thread Luis R. Rodriguez

From: Andres Rodriguez 

This is done since this call is now exposed through kernel-doc,
and since this also paves the way for different future types of
fallback mechanisms.

Signed-off-by: Andres Rodriguez 
Acked-by: Luis R. Rodriguez 
[mcgrof: small coding style changes]
Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/firmware_loader/fallback.c |  8 
 drivers/base/firmware_loader/fallback.h | 16 
 drivers/base/firmware_loader/firmware.h |  2 +-
 drivers/base/firmware_loader/main.c |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index 529f7013616f..3db9e0f225ac 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -662,10 +662,10 @@ static bool fw_run_sysfs_fallback(enum fw_opt opt_flags)
return fw_force_sysfs_fallback(opt_flags);
 }
 
-int fw_sysfs_fallback(struct firmware *fw, const char *name,
- struct device *device,
- enum fw_opt opt_flags,
- int ret)
+int firmware_fallback_sysfs(struct firmware *fw, const char *name,
+   struct device *device,
+   enum fw_opt opt_flags,
+   int ret)
 {
if (!fw_run_sysfs_fallback(opt_flags))
return ret;
diff --git a/drivers/base/firmware_loader/fallback.h 
b/drivers/base/firmware_loader/fallback.h
index a3b73a09db6c..21063503e4ea 100644
--- a/drivers/base/firmware_loader/fallback.h
+++ b/drivers/base/firmware_loader/fallback.h
@@ -31,10 +31,10 @@ struct firmware_fallback_config {
 };
 
 #ifdef CONFIG_FW_LOADER_USER_HELPER
-int fw_sysfs_fallback(struct firmware *fw, const char *name,
- struct device *device,
- enum fw_opt opt_flags,
- int ret);
+int firmware_fallback_sysfs(struct firmware *fw, const char *name,
+   struct device *device,
+   enum fw_opt opt_flags,
+   int ret);
 void kill_pending_fw_fallback_reqs(bool only_kill_custom);
 
 void fw_fallback_set_cache_timeout(void);
@@ -43,10 +43,10 @@ void fw_fallback_set_default_timeout(void);
 int register_sysfs_loader(void);
 void unregister_sysfs_loader(void);
 #else /* CONFIG_FW_LOADER_USER_HELPER */
-static inline int fw_sysfs_fallback(struct firmware *fw, const char *name,
-   struct device *device,
-   enum fw_opt opt_flags,
-   int ret)
+static inline int firmware_fallback_sysfs(struct firmware *fw, const char 
*name,
+ struct device *device,
+ enum fw_opt opt_flags,
+ int ret)
 {
/* Keep carrying over the same error */
return ret;
diff --git a/drivers/base/firmware_loader/firmware.h 
b/drivers/base/firmware_loader/firmware.h
index 4f433b447367..4c1395f8e7ed 100644
--- a/drivers/base/firmware_loader/firmware.h
+++ b/drivers/base/firmware_loader/firmware.h
@@ -20,7 +20,7 @@
  * @FW_OPT_NOWAIT: Used to describe the firmware request is asynchronous.
  * @FW_OPT_USERHELPER: Enable the fallback mechanism, in case the direct
  * filesystem lookup fails at finding the firmware.  For details refer to
- * fw_sysfs_fallback().
+ * firmware_fallback_sysfs().
  * @FW_OPT_NO_WARN: Quiet, avoid printing warning messages.
  * @FW_OPT_NOCACHE: Disables firmware caching. Firmware caching is used to
  * cache the firmware upon suspend, so that upon resume races against the
diff --git a/drivers/base/firmware_loader/main.c 
b/drivers/base/firmware_loader/main.c
index 7f2bc7e8e3c0..d951af29017a 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -581,7 +581,7 @@ _request_firmware(const struct firmware **firmware_p, const 
char *name,
dev_warn(device,
 "Direct firmware load for %s failed with error 
%d\n",
 name, ret);
-   ret = fw_sysfs_fallback(fw, name, device, opt_flags, ret);
+   ret = firmware_fallback_sysfs(fw, name, device, opt_flags, ret);
} else
ret = assign_fw(fw, device, opt_flags);
 
-- 
2.17.0

[PATCH v5 2/6] firmware: use () to terminate kernel-doc function names

2018-05-04 Thread Luis R. Rodriguez

From: Andres Rodriguez 

The kernel-doc spec dictates a function name ends in ().

Signed-off-by: Andres Rodriguez 
Acked-by: Randy Dunlap 
Acked-by: Luis R. Rodriguez 
Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/firmware_loader/fallback.c |  8 
 drivers/base/firmware_loader/main.c | 20 ++--
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index b57a7b3b4122..529f7013616f 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -125,7 +125,7 @@ static ssize_t timeout_show(struct class *class, struct 
class_attribute *attr,
 }
 
 /**
- * firmware_timeout_store - set number of seconds to wait for firmware
+ * firmware_timeout_store() - set number of seconds to wait for firmware
  * @class: device class pointer
  * @attr: device attribute pointer
  * @buf: buffer to scan for timeout value
@@ -239,7 +239,7 @@ static int map_fw_priv_pages(struct fw_priv *fw_priv)
 }
 
 /**
- * firmware_loading_store - set value in the 'loading' control file
+ * firmware_loading_store() - set value in the 'loading' control file
  * @dev: device pointer
  * @attr: device attribute pointer
  * @buf: buffer to scan for loading control value
@@ -431,7 +431,7 @@ static int fw_realloc_pages(struct fw_sysfs *fw_sysfs, int 
min_size)
 }
 
 /**
- * firmware_data_write - write method for firmware
+ * firmware_data_write() - write method for firmware
  * @filp: open sysfs file
  * @kobj: kobject for the device
  * @bin_attr: bin_attr structure
@@ -537,7 +537,7 @@ fw_create_instance(struct firmware *firmware, const char 
*fw_name,
 }
 
 /**
- * fw_load_sysfs_fallback - load a firmware via the sysfs fallback mechanism
+ * fw_load_sysfs_fallback() - load a firmware via the sysfs fallback mechanism
  * @fw_sysfs: firmware sysfs information for the firmware to load
  * @opt_flags: flags of options, FW_OPT_*
  * @timeout: timeout to wait for the load
diff --git a/drivers/base/firmware_loader/main.c 
b/drivers/base/firmware_loader/main.c
index 9919f0e6a7cc..7f2bc7e8e3c0 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -597,7 +597,7 @@ _request_firmware(const struct firmware **firmware_p, const 
char *name,
 }
 
 /**
- * request_firmware: - send firmware request and wait for it
+ * request_firmware() - send firmware request and wait for it
  * @firmware_p: pointer to firmware image
  * @name: name of firmware file
  * @device: device for which firmware is being loaded
@@ -657,7 +657,7 @@ int request_firmware_direct(const struct firmware 
**firmware_p,
 EXPORT_SYMBOL_GPL(request_firmware_direct);
 
 /**
- * firmware_request_cache: - cache firmware for suspend so resume can use it
+ * firmware_request_cache() - cache firmware for suspend so resume can use it
  * @name: name of firmware file
  * @device: device for which firmware should be cached for
  *
@@ -681,7 +681,7 @@ int firmware_request_cache(struct device *device, const 
char *name)
 EXPORT_SYMBOL_GPL(firmware_request_cache);
 
 /**
- * request_firmware_into_buf - load firmware into a previously allocated buffer
+ * request_firmware_into_buf() - load firmware into a previously allocated 
buffer
  * @firmware_p: pointer to firmware image
  * @name: name of firmware file
  * @device: device for which firmware is being loaded and DMA region allocated
@@ -713,7 +713,7 @@ request_firmware_into_buf(const struct firmware 
**firmware_p, const char *name,
 EXPORT_SYMBOL(request_firmware_into_buf);
 
 /**
- * release_firmware: - release the resource associated with a firmware image
+ * release_firmware() - release the resource associated with a firmware image
  * @fw: firmware resource to release
  **/
 void release_firmware(const struct firmware *fw)
@@ -755,7 +755,7 @@ static void request_firmware_work_func(struct work_struct 
*work)
 }
 
 /**
- * request_firmware_nowait - asynchronous version of request_firmware
+ * request_firmware_nowait() - asynchronous version of request_firmware
  * @module: module requesting the firmware
  * @uevent: sends uevent to copy the firmware image if this flag
  * is non-zero else the firmware copy must be done manually.
@@ -824,7 +824,7 @@ EXPORT_SYMBOL(request_firmware_nowait);
 static ASYNC_DOMAIN_EXCLUSIVE(fw_cache_domain);
 
 /**
- * cache_firmware - cache one firmware image in kernel memory space
+ * cache_firmware() - cache one firmware image in kernel memory space
  * @fw_name: the firmware image name
  *
  * Cache firmware in kernel memory so that drivers can use it when
@@ -866,7 +866,7 @@ static struct fw_priv *lookup_fw_priv(const char *fw_name)
 }
 
 /**
- * uncache_firmware - remove one cached firmware image
+ * uncache_firmware() - remove one cached firmware image
  * @fw_name: the firmware image name
  *
  * Uncache one
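
For reference, a kernel-doc comment in the form this patch converges on looks roughly like the sketch below. The function `demo_clamp_timeout()` is hypothetical and exists only to carry the comment; the point is the `name() - description` first line, followed by the `@param:` lines.

```c
/**
 * demo_clamp_timeout() - clamp a timeout to a sane range
 * @seconds: requested timeout in seconds
 *
 * Return: @seconds limited to the range [1, 60].
 */
int demo_clamp_timeout(int seconds)
{
	if (seconds < 1)
		return 1;
	if (seconds > 60)
		return 60;
	return seconds;
}
```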

[PATCH v5 4/6] firmware_loader: document firmware_sysfs_fallback()

2018-05-04 Thread Luis R. Rodriguez

This also sets the expectations for future fallback interfaces, even
if they are not exported.

Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/firmware_loader/fallback.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index 3db9e0f225ac..9169e7b9800c 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -662,6 +662,26 @@ static bool fw_run_sysfs_fallback(enum fw_opt opt_flags)
return fw_force_sysfs_fallback(opt_flags);
 }
 
+/**
+ * firmware_fallback_sysfs() - use the fallback mechanism to find firmware
+ * @fw: pointer to firmware image
+ * @name: name of firmware file to look for
+ * @device: device for which firmware is being loaded
+ * @opt_flags: options to control firmware loading behaviour
+ * @ret: return value from direct lookup which triggered the fallback mechanism
+ *
+ * This function is called if direct lookup for the firmware failed, it enables
+ * a fallback mechanism through userspace by exposing a sysfs loading
+ * interface. Userspace is in charge of loading the firmware through the sysfs
+ * loading interface. This sysfs fallback mechanism may be disabled completely
+ * on a system by setting the proc sysctl value ignore_sysfs_fallback to true.
+ * If this is false we check if the internal API caller set the @FW_OPT_NOFALLBACK
+ * flag, if so it would also disable the fallback mechanism. A system may want
+ * to enforce the sysfs fallback mechanism at all times, it can do this by
+ * setting ignore_sysfs_fallback to false and force_sysfs_fallback to true.
+ * Enabling force_sysfs_fallback is functionally equivalent to building a kernel
+ * with CONFIG_FW_LOADER_USER_HELPER_FALLBACK.
+ **/
 int firmware_fallback_sysfs(struct firmware *fw, const char *name,
struct device *device,
enum fw_opt opt_flags,
-- 
2.17.0
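
The decision order described in the kernel-doc above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the `sketch_` names, the struct layout, and the flag values are hypothetical and are not the kernel's definitions, and the final user-helper check is an assumption that is not spelled out in the comment itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's option flags (values are made up). */
enum sketch_fw_opt {
	SKETCH_FW_OPT_USERHELPER = 1 << 0,
	SKETCH_FW_OPT_NOFALLBACK = 1 << 1,
};

/* Hypothetical mirror of the two sysctl knobs named in the kernel-doc. */
struct sketch_fallback_config {
	bool ignore_sysfs_fallback;
	bool force_sysfs_fallback;
};

/* Returns true when the sysfs fallback would run after a failed direct lookup. */
static bool sketch_run_sysfs_fallback(const struct sketch_fallback_config *cfg,
				      unsigned int opt_flags)
{
	assert(cfg != NULL);

	/* The sysctl knob disables the fallback for the whole system. */
	if (cfg->ignore_sysfs_fallback)
		return false;

	/* The internal API caller opted out of the fallback. */
	if (opt_flags & SKETCH_FW_OPT_NOFALLBACK)
		return false;

	/* The fallback is forced on for every remaining caller. */
	if (cfg->force_sysfs_fallback)
		return true;

	/* Assumption: otherwise only callers asking for the user helper get it. */
	return (opt_flags & SKETCH_FW_OPT_USERHELPER) != 0;
}
```

In this model, ignore_sysfs_fallback wins over everything else, which matches the statement that it disables the mechanism completely.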

[PATCH v5 6/6] firmware_loader: move kconfig FW_LOADER entries to its own file

2018-05-04 Thread Luis R. Rodriguez

This will make it easier to track and easier to understand
what components and features are part of the FW_LOADER. There
are some components related to firmware which have *nothing* to
do with the FW_LOADER, such as PREVENT_FIRMWARE_BUILD.

Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/Kconfig | 150 +--
 drivers/base/firmware_loader/Kconfig | 149 ++
 2 files changed, 150 insertions(+), 149 deletions(-)
 create mode 100644 drivers/base/firmware_loader/Kconfig

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index bf2d464b0e2c..06d6e27784fa 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -88,155 +88,7 @@ config PREVENT_FIRMWARE_BUILD
o CONFIG_WANXL through CONFIG_WANXL_BUILD_FIRMWARE
o CONFIG_SCSI_AIC79XX through CONFIG_AIC79XX_BUILD_FIRMWARE
 
-menu "Firmware loader"
-
-config FW_LOADER
-   tristate "Firmware loading facility" if EXPERT
-   default y
-   ---help---
- This enables the firmware loading facility in the kernel. The kernel
- will first look for built-in firmware, if it has any. Next, it will
- look for the requested firmware in a series of filesystem paths:
-
-   o firmware_class path module parameter or kernel boot param
-   o /lib/firmware/updates/UTS_RELEASE
-   o /lib/firmware/updates
-   o /lib/firmware/UTS_RELEASE
-   o /lib/firmware
-
- Enabling this feature only increases your kernel image by about
- 828 bytes, enable this option unless you are certain you don't
- need firmware.
-
- You typically want this built-in (=y) but you can also enable this
- as a module, in which case the firmware_class module will be built.
- You also want to be sure to enable this built-in if you are going to
- enable built-in firmware (CONFIG_EXTRA_FIRMWARE).
-
-if FW_LOADER
-
-config EXTRA_FIRMWARE
-   string "Build these firmware blobs into the kernel binary"
-   help
- Device drivers which require firmware can typically deal with
- having the kernel load firmware from the various supported
- /lib/firmware/ paths. This option enables you to build into the
- kernel firmware files. Built-in firmware searches take precedence
- over firmware lookups using your filesystem over the supported
- /lib/firmware paths documented on CONFIG_FW_LOADER.
-
- This may be useful for testing or if the firmware is required early on
- in boot and cannot rely on the firmware being placed in an initrd or
- initramfs.
-
- This option is a string and takes the (space-separated) names of the
- firmware files -- the same names that appear in MODULE_FIRMWARE()
- and request_firmware() in the source. These files should exist under
- the directory specified by the EXTRA_FIRMWARE_DIR option, which is
- /lib/firmware by default.
-
- For example, you might set CONFIG_EXTRA_FIRMWARE="usb8388.bin", copy
- the usb8388.bin file into /lib/firmware, and build the kernel. Then
- any request_firmware("usb8388.bin") will be satisfied internally
- inside the kernel without ever looking at your filesystem at runtime.
-
- WARNING: If you include additional firmware files into your binary
- kernel image that are not available under the terms of the GPL,
- then it may be a violation of the GPL to distribute the resulting
- image since it combines both GPL and non-GPL work. You should
- consult a lawyer of your own before distributing such an image.
-
-config EXTRA_FIRMWARE_DIR
-   string "Firmware blobs root directory"
-   depends on EXTRA_FIRMWARE != ""
-   default "/lib/firmware"
-   help
- This option controls the directory in which the kernel build system
- looks for the firmware files listed in the EXTRA_FIRMWARE option.
-
-config FW_LOADER_USER_HELPER
-   bool "Enable the firmware sysfs fallback mechanism"
-   help
- This option enables a sysfs loading facility to enable firmware
- loading to the kernel through userspace as a fallback mechanism
- if and only if the kernel's direct filesystem lookup for the
- firmware failed using the different /lib/firmware/ paths, or the
- path specified in the firmware_class path module parameter, or the
- firmware_class path kernel boot parameter if the firmware_class is
- built-in. For details on how to work with the sysfs fallback mechanism
- refer to Documentation/driver-api/firmware/fallback-mechanisms.rst.
-
- The direct filesystem lookup for firmware is always used first now.
-
- If the kernel's direct filesystem lookup for firmware fails to find
- the requested firmware a sysfs fallback loading facility is

[PATCH v5 5/6] firmware_loader: enhance Kconfig documentation over FW_LOADER

2018-05-04 Thread Luis R. Rodriguez

If you try to read FW_LOADER today it speaks of old riddles and
unless you have been following development closely you will lose
track of what is what. Even the documentation for PREVENT_FIRMWARE_BUILD
is a bit fuzzy about how it fits into this big picture.

Give the FW_LOADER kconfig documentation some love with more up to
date developments and recommendations. While at it, wrap the FW_LOADER
code into its own menu to compartmentalize and make it clearer which
components really are part of the FW_LOADER. This should also make
it easier to move these kconfig entries into the firmware_loader/
directory later.

Signed-off-by: Luis R. Rodriguez 
---
 drivers/base/Kconfig | 160 ++-
 1 file changed, 126 insertions(+), 34 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 29b0eb452b3a..bf2d464b0e2c 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -70,39 +70,64 @@ config STANDALONE
  If unsure, say Y.
 
 config PREVENT_FIRMWARE_BUILD
-   bool "Prevent firmware from being built"
+   bool "Disable drivers features which enable custom firmware building"
default y
help
- Say yes to avoid building firmware. Firmware is usually shipped
- with the driver and only when updating the firmware should a
- rebuild be made.
- If unsure, say Y here.
+ Say yes to disable driver features which enable building a custom
+ driver firmware at kernel build time. These drivers do not use the
+ kernel firmware API to load firmware (CONFIG_FW_LOADER), instead they
+ use their own custom loading mechanism. The required firmware is
+ usually shipped with the driver, building the driver firmware
+ should only be needed if you have an updated firmware source.
+
+ Firmware should not be built as part of the kernel, these days
+ you should always prevent this and say Y here. There are only two
+ old drivers which enable building of their firmware at kernel build
+ time:
+
+   o CONFIG_WANXL through CONFIG_WANXL_BUILD_FIRMWARE
+   o CONFIG_SCSI_AIC79XX through CONFIG_AIC79XX_BUILD_FIRMWARE
+
+menu "Firmware loader"
 
 config FW_LOADER
-   tristate "Userspace firmware loading support" if EXPERT
+   tristate "Firmware loading facility" if EXPERT
default y
---help---
- This option is provided for the case where none of the in-tree modules
- require userspace firmware loading support, but a module built
- out-of-tree does.
+ This enables the firmware loading facility in the kernel. The kernel
+ will first look for built-in firmware, if it has any. Next, it will
+ look for the requested firmware in a series of filesystem paths:
+
+   o firmware_class path module parameter or kernel boot param
+   o /lib/firmware/updates/UTS_RELEASE
+   o /lib/firmware/updates
+   o /lib/firmware/UTS_RELEASE
+   o /lib/firmware
+
+ Enabling this feature only increases your kernel image by about
+ 828 bytes, enable this option unless you are certain you don't
+ need firmware.
+
+ You typically want this built-in (=y) but you can also enable this
+ as a module, in which case the firmware_class module will be built.
+ You also want to be sure to enable this built-in if you are going to
+ enable built-in firmware (CONFIG_EXTRA_FIRMWARE).
+
+if FW_LOADER
 
 config EXTRA_FIRMWARE
-   string "External firmware blobs to build into the kernel binary"
-   depends on FW_LOADER
+   string "Build these firmware blobs into the kernel binary"
help
- Various drivers in the kernel source tree may require firmware,
- which is generally available in your distribution's linux-firmware
- package.
+ Device drivers which require firmware can typically deal with
+ having the kernel load firmware from the various supported
+ /lib/firmware/ paths. This option enables you to build into the
+ kernel firmware files. Built-in firmware searches take precedence
+ over firmware lookups using your filesystem over the supported
+ /lib/firmware paths documented on CONFIG_FW_LOADER.
 
- The linux-firmware package should install firmware into
- /lib/firmware/ on your system, so they can be loaded by userspace
- helpers on request.
-
- This option allows firmware to be built into the kernel for the case
- where the user either cannot or doesn't want to provide it from
- userspace at runtime (for example, when the firmware in question is
- required for accessing the boot device, and the user doesn't want to
- use an initrd).
+ This may be useful for testing or if the firmware is required early on
+

Re: [PATCH v3 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support

2018-05-04 Thread David Miller

From: Marek Behún 
Date: Fri,  4 May 2018 19:26:10 +0200

> The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
> which can be configured the same way as the SERDES lane on 88E6390.
> 
> Signed-off-by: Marek Behun 

Applied, thank you.

Re: [PATCH v2 2/4] net: hook socketpair() into LSM

2018-05-04 Thread David Miller

From: David Herrmann 
Date: Fri,  4 May 2018 16:28:20 +0200

> Use the newly created LSM-hook for socketpair(). The default hook
> return-value is 0, so behavior stays the same unless LSMs start using
> this hook.
> 
> Acked-by: Serge Hallyn 
> Signed-off-by: Tom Gundersen 
> Signed-off-by: David Herrmann 

Acked-by: David S. Miller


2018-05-04 Thread David Miller

From: Antoine Tenart 
Date: Fri,  4 May 2018 17:10:54 +0200

> In an SFP EEPROM values can be read to get information about a given SFP
> module. One of those is the bitrate, which can be determined using a
> nominal bitrate in addition with min and max values (in %). The SFP code
> currently compute both BR,min and BR,max values thanks to this nominal
> and min,max values.
> 
> This patch fixes the BR,min computation as the min value should be
> subtracted to the nominal one, not added.
> 
> Fixes: 9962acf7fb8c ("sfp: add support for 1000Base-PX and 1000Base-BX10")
> Signed-off-by: Antoine Tenart 

Applied and queued up for -stable, thanks.
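
The arithmetic behind the fix can be sketched as follows: the lower bound subtracts the minimum margin from the nominal bitrate, and the upper bound adds the maximum margin. The names, the struct, and the choice of units here are hypothetical and are not taken from the SFP driver; this is only a worked illustration of the description quoted above.

```c
#include <assert.h>

/* Hypothetical result type: both limits use the same unit as the nominal rate. */
struct sketch_br_range {
	unsigned int min;
	unsigned int max;
};

/*
 * Derive the bitrate limits from a nominal bitrate and two margins
 * expressed in percent of the nominal value.
 */
static struct sketch_br_range sketch_br_limits(unsigned int br_nominal,
					       unsigned int min_pct,
					       unsigned int max_pct)
{
	struct sketch_br_range range;

	/* A margin above 100% would make the lower bound underflow. */
	assert(min_pct <= 100);

	/* The minimum margin is subtracted from the nominal rate, not added. */
	range.min = br_nominal - br_nominal * min_pct / 100;
	/* The maximum margin is added to the nominal rate. */
	range.max = br_nominal + br_nominal * max_pct / 100;

	return range;
}
```

For a nominal rate of 1300 with 10% margins on both sides, this gives a range of 1170 to 1430; adding the minimum margin instead would have produced 1430 for both limits.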

Re: [PATCH net-next v2] net: core: rework basic flow dissection helper

2018-05-04 Thread David Miller

From: Paolo Abeni 
Date: Fri,  4 May 2018 11:32:59 +0200

> When the core networking needs to detect the transport offset in a given
> packet and parse it explicitly, a full-blown flow_keys struct is used for
> storage.
> This patch introduces a smaller keys store, rework the basic flow dissect
> helper to use it, and apply this new helper where possible - namely in
> skb_probe_transport_header(). The used flow dissector data structures
> are renamed to match more closely the new role.
> 
> The above gives ~50% performance improvement in micro benchmarking around
> skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
> to the smaller memset. Small, but measurable improvement is measured also
> in macro benchmarking.
> 
> v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
>   as per DaveM suggestion
> 
> Suggested-by: David Miller 
> Signed-off-by: Paolo Abeni 

Awesome.  Applied, thanks Paolo.

[PATCH v3 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support

2018-05-04 Thread Marek Behún

The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
which can be configured the same way as the SERDES lane on 88E6390.

Signed-off-by: Marek Behun 
---
 drivers/net/dsa/mv88e6xxx/chip.c   |  2 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 20 
 drivers/net/dsa/mv88e6xxx/serdes.h |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9d62e4acc01b..e7e079b1888c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2529,6 +2529,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
.reset = mv88e6185_g1_reset,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+   .serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6095_ops = {
@@ -2848,6 +2849,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+   .serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6176_ops = {
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index fb058fd35c0d..3a185ea1bf20 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -326,3 +326,23 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
 
return 0;
 }
+
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
+{
+   int err;
+   u8 cmode;
+
+   if (port != 5)
+   return 0;
+
+   err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
+   if (err)
+   return err;
+
+   if (cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X ||
+   cmode == MV88E6XXX_PORT_STS_CMODE_SGMII ||
+   cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX)
+   return mv88e6390_serdes_sgmii(chip, MV88E6341_ADDR_SERDES, on);
+
+   return 0;
+}
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index 1897c01c6e19..b6e5fbd46b5e 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -19,6 +19,8 @@
 #define MV88E6352_ADDR_SERDES  0x0f
 #define MV88E6352_SERDES_PAGE_FIBER0x01
 
+#define MV88E6341_ADDR_SERDES  0x15
+
 #define MV88E6390_PORT9_LANE0  0x09
 #define MV88E6390_PORT9_LANE1  0x12
 #define MV88E6390_PORT9_LANE2  0x13
@@ -42,6 +44,7 @@
 #define MV88E6390_SGMII_CONTROL_LOOPBACK   BIT(14)
 #define MV88E6390_SGMII_CONTROL_PDOWN  BIT(11)
 
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6352_serdes_get_sset_count(struct mv88e6xxx_chip *chip, int port);
-- 
2.16.1

[PATCH 1/2] net: nixge: Fix error path for obtaining mac address

2018-05-04 Thread Moritz Fischer

Fix an issue where nixge_get_nvmem_address() returns a non-NULL
value on a failed nvmem_cell_get(), which causes an invalid
access when the error value encoded in the pointer is dereferenced.

Furthermore, ensure that the buffer allocated by nvmem_cell_read()
is actually freed with kfree() if the function succeeds.

Fixes: 492caffa8a1a ("net: ethernet: nixge: Add support for National Instruments XGE netdev")
Reported-by: Alex Williams 
Signed-off-by: Moritz Fischer 
---
 drivers/net/ethernet/ni/nixge.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ni/nixge.c b/drivers/net/ethernet/ni/nixge.c
index 27364b7572fc..c41fea9253e3 100644
--- a/drivers/net/ethernet/ni/nixge.c
+++ b/drivers/net/ethernet/ni/nixge.c
@@ -1170,7 +1170,7 @@ static void *nixge_get_nvmem_address(struct device *dev)
 
cell = nvmem_cell_get(dev, "address");
if (IS_ERR(cell))
-   return cell;
+   return NULL;
 
mac = nvmem_cell_read(cell, &cell_size);
nvmem_cell_put(cell);
@@ -1202,10 +1202,12 @@ static int nixge_probe(struct platform_device *pdev)
ndev->max_mtu = NIXGE_JUMBO_MTU;
 
mac_addr = nixge_get_nvmem_address(&pdev->dev);
-   if (mac_addr && is_valid_ether_addr(mac_addr))
+   if (mac_addr && is_valid_ether_addr(mac_addr)) {
ether_addr_copy(ndev->dev_addr, mac_addr);
-   else
+   kfree(mac_addr);
+   } else {
eth_hw_addr_random(ndev);
+   }
 
priv = netdev_priv(ndev);
priv->ndev = ndev;
-- 
2.17.0
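
The bug fixed above follows from how error pointers work: an errno value encoded in a pointer is not NULL, so a caller that only tests for NULL goes on to dereference it. Below is a minimal userspace model of that situation. The `sketch_` helpers only imitate the idea behind the kernel's ERR_PTR()/IS_ERR() and are hypothetical, as are the two lookup functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (model of the kernel idea). */
static void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True when the pointer value lies in the range reserved for errors. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Before the fix: the encoded error is handed back to the caller as-is. */
static const unsigned char *sketch_get_mac_buggy(void *cell)
{
	if (sketch_is_err(cell))
		return cell; /* non-NULL, but not a valid buffer */
	return cell;
}

/* After the fix: a failed lookup is reported as NULL. */
static const unsigned char *sketch_get_mac_fixed(void *cell)
{
	if (sketch_is_err(cell))
		return NULL;
	return cell;
}

/* The caller's check, as in the probe path: only NULL stops the dereference. */
static int sketch_caller_would_dereference(const unsigned char *mac)
{
	return mac != NULL;
}
```

With a failed lookup, the buggy variant passes the caller's NULL check while the fixed variant does not, which is the difference the one-line change makes.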

[PATCH 2/2] net: nixge: Address compiler warnings about signedness

2018-05-04 Thread Moritz Fischer

Fixes the following warnings:
warning: pointer targets in passing argument 1 of
‘is_valid_ether_addr’ differ in signedness [-Wpointer-sign]
  if (mac_addr && is_valid_ether_addr(mac_addr)) {
  ^~~~
expected ‘const u8 * {aka const unsigned char *}’ but argument
is of type ‘const char *’
 static inline bool is_valid_ether_addr(const u8 *addr)
^~~
warning: pointer targets in passing argument 2 of
‘ether_addr_copy’ differ in signedness [-Wpointer-sign]
   ether_addr_copy(ndev->dev_addr, mac_addr);
   ^~~~
expected ‘const u8 * {aka const unsigned char *}’ but argument
is of type ‘const char *’
 static inline void ether_addr_copy(u8 *dst, const u8 *src)

Signed-off-by: Moritz Fischer 
---
 drivers/net/ethernet/ni/nixge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ni/nixge.c b/drivers/net/ethernet/ni/nixge.c
index c41fea9253e3..b092894dd128 100644
--- a/drivers/net/ethernet/ni/nixge.c
+++ b/drivers/net/ethernet/ni/nixge.c
@@ -1183,7 +1183,7 @@ static int nixge_probe(struct platform_device *pdev)
struct nixge_priv *priv;
struct net_device *ndev;
struct resource *dmares;
-   const char *mac_addr;
+   const u8 *mac_addr;
int err;
 
ndev = alloc_etherdev(sizeof(*priv));
-- 
2.17.0

Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors

2018-05-04 Thread Antoine Tenart

Hi Florian,

On Fri, May 04, 2018 at 10:04:48AM -0700, Florian Fainelli wrote:
> On 05/04/2018 06:56 AM, Antoine Tenart wrote:
> > SFP connectors can be solder on a board without having any of their pins
> > (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed,
> > and the overall link status reporting is left to other layers.
> > 
> > In order to achieve this, a new SFP_DEV status is added, named UNKNOWN.
> > This mode is set when it is not possible for the SFP code to get the
> > link status and as a result the link status is reported to be always UP
> > from the SFP point of view.
> 
> Why represent the SFP in Device Tree then? Why not just declare this is
> a fixed link which would avoid having to introduce this "unknown" state.

The other solution would have been to represent this as a fixed-link.
But such a representation would report the link as being up all the
time, which is something we wanted to avoid as the GoP in PPv2 can
report some link status. This is achieved using SFP+phylink+PPv2.

And representing the SFP cage in the device tree, although it's a
"dummy" one, helps describing the hardware.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH] net: ethernet: sun: niu set correct packet size in skb

2018-05-04 Thread David Miller

From: David Miller 
Date: Fri, 04 May 2018 13:09:50 -0400 (EDT)

> From: Rob Taglang 
> Date: Fri, 04 May 2018 13:07:54 -0400
> 
>>> Still corrupted.  Your email client is chopping up long lines.
>>> Please, send a test patch to yourself and make sure you can apply the
>>> patch that arrives in that test email.
>>> Thank you.
>> 
>> Hi David,
>> 
>> I did go through the process of sending myself a test email before
>> submitting.
>> 
>> I can copy-paste the patch from my message on the archive:
>> https://www.spinics.net/lists/netdev/msg500099.html
>> and apply it successfully, so I'm not sure what you need me to do
>> differently.
>> 
>> Any help is appreciated.
> 
> Weird, let me sort this out.

I ended up fixing it up by hand.  I have no idea why the copy that showed
up on spinics looks completely different to what I received directly in
my inbox and what showed up on patchwork.ozlabs.org

Anyways, applied and queued up for -stable, thank you.

Re: [PATCH net-next v2 01/13] net: phy: sfp: make the i2c-bus property really optional

2018-05-04 Thread Antoine Tenart

Hi Florian,

On Fri, May 04, 2018 at 10:03:16AM -0700, Florian Fainelli wrote:
> On 05/04/2018 06:56 AM, Antoine Tenart wrote:
> >  
> >  static int sfp_read(struct sfp *sfp, bool a2, u8 addr, void *buf, size_t len)
> >  {
> > +   if (!sfp->read)
> > +   return -EOPNOTSUPP;
> 
> -ENODEV would be closer to the intended meaning IMHO, though it could
> be argued that this is yet another color to paint the bikeshed with.

I thought about -ENODEV as well, but ended up choosing -EOPNOTSUPP for
some reason. But I'm really fine with both solutions; it really depends
on whether we want to report that a callback isn't available from a s/w
point of view (-EOPNOTSUPP) or a h/w point of view (-ENODEV).

> > ret = sfp_read(sfp, false, 0, &id, sizeof(id));
> > +   if (ret == -EOPNOTSUPP)
> > +   return ret;
> 
> Can you find a way such that only sfp_sm_mod_probe() needs to check
> whether the sfp read/write operations returned failure and then we just
> make sure the SFP state machine does not make any more progress? Having
> to check the sfp_read()/sfp_write() operations all over the place sounds
> error prone and won't scale in the future.

I tried doing it this way (only having logic in the probe
function), but that wasn't as simple as this solution and it seemed
quite invasive, as these read/write calls can be made from a few
functions but many code paths (as it's a state machine). So I chose the
easiest solution to maintain in the long run, as each future state
machine update could impact this.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com
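
The pattern under discussion, a wrapper that returns an error code when an optional callback is absent, can be sketched in isolation. The struct, the callback signature, and the backend below are hypothetical simplifications and not the SFP driver's definitions; the sketch only shows the guard and how a caller can tell "no bus" apart from a successful read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device with an optional read callback (NULL when unwired). */
struct sketch_sfp {
	int (*read)(struct sketch_sfp *sfp, unsigned char addr,
		    void *buf, size_t len);
};

/* Wrapper: report -EOPNOTSUPP instead of calling through a NULL pointer. */
static int sketch_sfp_read(struct sketch_sfp *sfp, unsigned char addr,
			   void *buf, size_t len)
{
	assert(sfp != NULL);

	if (!sfp->read)
		return -EOPNOTSUPP;

	return sfp->read(sfp, addr, buf, len);
}

/* Hypothetical backend: fills the buffer with zeros, returns bytes read. */
static int sketch_zero_read(struct sketch_sfp *sfp, unsigned char addr,
			    void *buf, size_t len)
{
	(void)sfp;
	(void)addr;
	memset(buf, 0, len);
	return (int)len;
}
```

Swapping -EOPNOTSUPP for -ENODEV changes only the constant in the guard, which is why the choice between the two is mostly a question of which point of view the error should express.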

Re: [PATCH net-next v2 02/13] net: phy: sfp: handle non-wired SFP connectors

2018-05-04 Thread Andrew Lunn

On Fri, May 04, 2018 at 10:04:48AM -0700, Florian Fainelli wrote:
> On 05/04/2018 06:56 AM, Antoine Tenart wrote:
> > SFP connectors can be solder on a board without having any of their pins
> > (LOS, i2c...) wired. In such cases the SFP link state cannot be guessed,
> > and the overall link status reporting is left to other layers.
> > 
> > In order to achieve this, a new SFP_DEV status is added, named UNKNOWN.
> > This mode is set when it is not possible for the SFP code to get the
> > link status and as a result the link status is reported to be always UP
> > from the SFP point of view.
> 
> Why represent the SFP in Device Tree then? Why not just declare this is
> a fixed link which would avoid having to introduce this "unknown" state.

Hi Antoine

I agree with Florian here.

If LOS was missing, but i2c worked, I could see some value in using
SFP, in order to be able to read the EEPROM. But if everything is
missing, fixed-link seems a better fit.

 Andrew

Re: [PATCH net-next v2 03/13] net: phy: sfp: warn the user when no tx_disable pin is available

2018-05-04 Thread Andrew Lunn

On Fri, May 04, 2018 at 10:07:53AM -0700, Florian Fainelli wrote:
> On 05/04/2018 06:56 AM, Antoine Tenart wrote:
> > In case no Tx disable pin is available the SFP modules will always be
> > emitting. This could be an issue when using modules using laser as their
> > light source as we would have no way to disable it when the fiber is
> > removed. This patch adds a warning when registering an SFP cage which do
> > not have its tx_disable pin wired or available.
> 
> Is this something that was done in a possibly earlier revision of a
> given board design and which was finally fixed? Nothing wrong with the
> patch, but this seems like a pretty serious board design mistake, that
> needs to be addressed.

Hi Florian

Zii Devel B is like this. Only the "Signal Detect" pin is wired to a
GPIO.

Andrew
