Re: [RFC] net: Add new LoRaWAN subsystem

2018-05-12 Thread Jian-Hong Pan
Hi Jiri and Marcel,

2018-05-11 23:39 GMT+08:00 Marcel Holtmann :
> Hi Jian-Hong,
>
>> A Low-Power Wide-Area Network (LPWAN) is a type of wireless
>> telecommunication wide area network designed to allow long range
>> communications at a low bit rate among things (connected objects), such
>> as sensors operated on a battery.  It is widely used in the IoT area.
>> LoRaWAN, which is one kind of implementation of LPWAN, is a medium
>> access control (MAC) layer protocol for managing communication between
>> LPWAN gateways and end-node devices, maintained by the LoRa Alliance.
>> The LoRaWAN™ Specification can be downloaded at:
>> https://lora-alliance.org/lorawan-for-developers
>>
>> However, LoRaWAN is not implemented in Linux kernel right now, so I am
>> trying to develop it.  Here is my repository:
>> https://github.com/starnight/LoRa/tree/lorawan-ndo/LoRaWAN
>>
>> Because it is a kind of network, the ideal usage in a user-space
>> program would be "socket(PF_LORAWAN, SOCK_DGRAM, 0)" together with the
>> other socket APIs.  Therefore, definitions such as AF_LORAWAN,
>> PF_LORAWAN ... must be listed in the glibc header files.
>> For the driver in kernel space, the definitions must also be listed in
>> the corresponding Linux socket header files.
>> In particular, both are needed by the test programs.
>>
>> Back to the point that "LoRaWAN is not implemented in the Linux kernel
>> now": could or should we add the definitions to the corresponding
>> kernel header files now, assuming LoRaWAN will be accepted as a
>> subsystem in Linux?
>
> when you submit your LoRaWAN subsystem to netdev for review, include a patch 
> that adds these new address family definitions. Just pick the next one 
> available. There will be no pre-allocation of numbers until your work has 
> been accepted upstream. Meaning, that the number might change if other 
> address families get merged before yours. So you have to keep updating. glibc 
> will eventually follow the number assigned by the kernel.

Thanks for your guidance.  I will follow the steps.
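
For reference, the user-space usage I have in mind looks roughly like the
sketch below.  The AF_LORAWAN/PF_LORAWAN value is only a placeholder until
the kernel assigns a real number, and the address handling is still to be
defined by the subsystem, so treat this as illustrative only:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

#ifndef AF_LORAWAN
#define AF_LORAWAN 45          /* placeholder, NOT an assigned number */
#define PF_LORAWAN AF_LORAWAN
#endif

int main(void)
{
	/* Datagram-style socket on top of the LoRaWAN MAC. */
	int fd = socket(PF_LORAWAN, SOCK_DGRAM, 0);

	if (fd < 0) {
		/* EAFNOSUPPORT until the subsystem and AF number exist */
		perror("socket(PF_LORAWAN)");
		return 1;
	}

	/* bind()/sendto()/recvfrom() would follow, using a LoRaWAN
	 * sockaddr type that the subsystem still has to define. */
	close(fd);
	return 0;
}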

Thanks a lot,

Jian-Hong Pan

> Regards
>
> Marcel
>


[PATCH ghak81 RFC V2 0/5] audit: group task params

2018-05-12 Thread Richard Guy Briggs
Group the audit parameters for each task into one structure.
In particular, remove the loginuid and sessionid values and the audit
context pointer from the task structure, replacing them with an audit
task information structure to contain them.  Use access functions to
access audit values.

Note:  Use static allocation of the audit task information structure
initially.  Dynamic allocation was considered and attempted, but isn't
ready yet.  Static allocation has the limitation that future audit task
information structure changes would cause a visible change to the rest
of the kernel, whereas dynamic allocation would mostly hide any future
changes.
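
As a sketch of that trade-off (illustrative only; the real definition is
in patch 5/5), static allocation embeds the structure so its size is
visible to every task_struct user, while dynamic allocation would expose
only a pointer:

/* Illustrative only -- the real layout is in include/linux/audit_task.h */
struct audit_context;
struct audit_task_info {
	unsigned int		sessionid;
	struct audit_context	*ctx;
};

/* Static allocation (this series): any change to audit_task_info
 * changes the size and layout of task_struct itself. */
struct task_struct_static_sketch {
	struct audit_task_info	audit;
};

/* Dynamic allocation (considered, not ready yet): only the pointer is
 * visible here, so audit_task_info could change without touching the
 * rest of the kernel. */
struct task_struct_dynamic_sketch {
	struct audit_task_info	*audit;
};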

The first four access normalization patches could stand alone.

Passes audit-testsuite.

Changelog:
v2
- p2/5: add audit header to init/init_task.c to quiet kbuildbot
- audit_signal_info(): fetch loginuid once
- remove task_struct from audit_context() param list
- remove extra task_struct local vars
- do nothing on request to set audit context when audit is disabled

Richard Guy Briggs (5):
  audit: normalize loginuid read access
  audit: convert sessionid unset to a macro
  audit: use inline function to get audit context
  audit: use inline function to set audit context
  audit: collect audit task parameters

 MAINTAINERS  |  2 +-
 include/linux/audit.h| 28 ---
 include/linux/audit_task.h   | 31 
 include/linux/sched.h|  6 +--
 include/net/xfrm.h   |  4 +-
 include/uapi/linux/audit.h   |  1 +
 init/init_task.c |  8 ++-
 kernel/audit.c   |  6 +--
 kernel/audit_watch.c |  2 +-
 kernel/auditsc.c | 97 +---
 kernel/fork.c|  2 +-
 net/bridge/netfilter/ebtables.c  |  2 +-
 net/core/dev.c   |  2 +-
 net/netfilter/x_tables.c |  2 +-
 net/netlabel/netlabel_user.c |  2 +-
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/integrity_audit.c |  2 +-
 security/lsm_audit.c |  2 +-
 security/selinux/hooks.c |  4 +-
 security/selinux/selinuxfs.c |  6 +--
 security/selinux/ss/services.c   | 12 ++---
 21 files changed, 133 insertions(+), 90 deletions(-)
 create mode 100644 include/linux/audit_task.h

-- 
1.8.3.1



[PATCH ghak81 RFC V2 4/5] audit: use inline function to set audit context

2018-05-12 Thread Richard Guy Briggs
Recognizing that the audit context is an internal audit value, use an
access function to set the audit context pointer for the task
rather than reaching directly into the task struct to set it.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h | 6 ++
 kernel/auditsc.c  | 7 +++
 kernel/fork.c | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 786aa8e..f7973e4 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -235,6 +235,10 @@ extern void __audit_inode_child(struct inode *parent,
 extern void __audit_seccomp(unsigned long syscall, long signr, int code);
 extern void __audit_ptrace(struct task_struct *t);
 
+static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
+{
+   task->audit_context = ctx;
+}
 static inline struct audit_context *audit_context(void)
 {
return current->audit_context;
@@ -472,6 +476,8 @@ static inline bool audit_dummy_context(void)
 {
return true;
 }
+static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
+{ }
 static inline struct audit_context *audit_context(void)
 {
return NULL;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index ecc0c23..d441d68 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -865,7 +865,7 @@ static inline struct audit_context *audit_take_context(struct task_struct *tsk,
audit_filter_inodes(tsk, context);
}
 
-   tsk->audit_context = NULL;
+   audit_set_context(tsk, NULL);
return context;
 }
 
@@ -952,7 +952,7 @@ int audit_alloc(struct task_struct *tsk)
}
context->filterkey = key;
 
-   tsk->audit_context  = context;
+   audit_set_context(tsk, context);
set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
 }
@@ -1554,7 +1554,6 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2,
  */
 void __audit_syscall_exit(int success, long return_code)
 {
-   struct task_struct *tsk = current;
struct audit_context *context;
 
if (success)
@@ -1589,7 +1588,7 @@ void __audit_syscall_exit(int success, long return_code)
kfree(context->filterkey);
context->filterkey = NULL;
}
-   tsk->audit_context = context;
+   audit_set_context(current, context);
 }
 
 static inline void handle_one(const struct inode *inode)
diff --git a/kernel/fork.c b/kernel/fork.c
index 242c8c9..cd18448 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1713,7 +1713,7 @@ static __latent_entropy struct task_struct *copy_process(
p->start_time = ktime_get_ns();
p->real_start_time = ktime_get_boot_ns();
p->io_context = NULL;
-   p->audit_context = NULL;
+   audit_set_context(p, NULL);
cgroup_fork(p);
 #ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
-- 
1.8.3.1



[PATCH ghak81 RFC V2 3/5] audit: use inline function to get audit context

2018-05-12 Thread Richard Guy Briggs
Recognizing that the audit context is an internal audit value, use an
access function to retrieve the audit context pointer for the task
rather than reaching directly into the task struct to get it.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h| 14 ++--
 include/net/xfrm.h   |  2 +-
 kernel/audit.c   |  6 ++--
 kernel/audit_watch.c |  2 +-
 kernel/auditsc.c | 64 +---
 net/bridge/netfilter/ebtables.c  |  2 +-
 net/core/dev.c   |  2 +-
 net/netfilter/x_tables.c |  2 +-
 net/netlabel/netlabel_user.c |  2 +-
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/integrity_audit.c |  2 +-
 security/lsm_audit.c |  2 +-
 security/selinux/hooks.c |  4 +--
 security/selinux/selinuxfs.c |  6 ++--
 security/selinux/ss/services.c   | 12 +++
 15 files changed, 64 insertions(+), 60 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 5f86f7c..786aa8e 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -235,9 +235,13 @@ extern void __audit_inode_child(struct inode *parent,
 extern void __audit_seccomp(unsigned long syscall, long signr, int code);
 extern void __audit_ptrace(struct task_struct *t);
 
+static inline struct audit_context *audit_context(void)
+{
+   return current->audit_context;
+}
 static inline bool audit_dummy_context(void)
 {
-   void *p = current->audit_context;
+   void *p = audit_context();
return !p || *(int *)p;
 }
 static inline void audit_free(struct task_struct *task)
@@ -249,12 +253,12 @@ static inline void audit_syscall_entry(int major, unsigned long a0,
   unsigned long a1, unsigned long a2,
   unsigned long a3)
 {
-   if (unlikely(current->audit_context))
+   if (unlikely(audit_context()))
__audit_syscall_entry(major, a0, a1, a2, a3);
 }
 static inline void audit_syscall_exit(void *pt_regs)
 {
-   if (unlikely(current->audit_context)) {
+   if (unlikely(audit_context())) {
int success = is_syscall_success(pt_regs);
long return_code = regs_return_value(pt_regs);
 
@@ -468,6 +472,10 @@ static inline bool audit_dummy_context(void)
 {
return true;
 }
+static inline struct audit_context *audit_context(void)
+{
+   return NULL;
+}
 static inline struct filename *audit_reusename(const __user char *name)
 {
return NULL;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fcce8ee..7f2e31a 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -736,7 +736,7 @@ static inline struct audit_buffer *xfrm_audit_start(const char *op)
 
if (audit_enabled == 0)
return NULL;
-   audit_buf = audit_log_start(current->audit_context, GFP_ATOMIC,
+   audit_buf = audit_log_start(audit_context(), GFP_ATOMIC,
AUDIT_MAC_IPSEC_EVENT);
if (audit_buf == NULL)
return NULL;
diff --git a/kernel/audit.c b/kernel/audit.c
index e9f9a90..e7478cb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1099,8 +1099,7 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
 
if (audit_enabled == AUDIT_OFF)
return;
-   ab = audit_log_start(current->audit_context,
-GFP_KERNEL, AUDIT_FEATURE_CHANGE);
+   ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_FEATURE_CHANGE);
if (!ab)
return;
audit_log_task_info(ab, current);
@@ -2317,8 +2316,7 @@ void audit_log_link_denied(const char *operation)
return;
 
/* Generate AUDIT_ANOM_LINK with subject, operation, outcome. */
-   ab = audit_log_start(current->audit_context, GFP_KERNEL,
-AUDIT_ANOM_LINK);
+   ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_ANOM_LINK);
if (!ab)
return;
audit_log_format(ab, "op=%s", operation);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 9eb8b35..f1ba889 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -274,7 +274,7 @@ static void audit_update_watch(struct audit_parent *parent,
/* If the update involves invalidating rules, do the inode-based
 * filtering now, so we don't omit records. */
if (invalidating && !audit_dummy_context())
-   audit_filter_inodes(current, current->audit_context);
+   audit_filter_inodes(current, audit_context());
 
/* updating ino will likely change which audit_hash_list we
 * are on so we need a new watch for the new list */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index e157595..ecc0c23 100644

[PATCH ghak81 RFC V2 2/5] audit: convert sessionid unset to a macro

2018-05-12 Thread Richard Guy Briggs
Use a macro, AUDIT_SID_UNSET, to replace each instance of initializing
or comparing against an unset audit session ID.

Signed-off-by: Richard Guy Briggs 
---
 include/linux/audit.h  | 2 +-
 include/net/xfrm.h | 2 +-
 include/uapi/linux/audit.h | 1 +
 init/init_task.c   | 3 ++-
 kernel/auditsc.c   | 4 ++--
 5 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 75d5b03..5f86f7c 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -513,7 +513,7 @@ static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
 }
 static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 {
-   return -1;
+   return AUDIT_SID_UNSET;
 }
 static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
 { }
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index a872379..fcce8ee 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -751,7 +751,7 @@ static inline void xfrm_audit_helper_usrinfo(bool task_valid,
audit_get_loginuid(current) :
INVALID_UID);
const unsigned int ses = task_valid ? audit_get_sessionid(current) :
-   (unsigned int) -1;
+   AUDIT_SID_UNSET;
 
audit_log_format(audit_buf, " auid=%u ses=%u", auid, ses);
audit_log_task_context(audit_buf);
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 4e61a9e..04f9bd2 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -465,6 +465,7 @@ struct audit_tty_status {
 };
 
 #define AUDIT_UID_UNSET (unsigned int)-1
+#define AUDIT_SID_UNSET ((unsigned int)-1)
 
 /* audit_rule_data supports filter rules with both integer and string
  * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/init/init_task.c b/init/init_task.c
index 3ac6e75..74f60ba 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/audit.h>
 
 #include 
 #include 
@@ -119,7 +120,7 @@ struct task_struct init_task
.thread_node= LIST_HEAD_INIT(init_signals.thread_head),
 #ifdef CONFIG_AUDITSYSCALL
.loginuid   = INVALID_UID,
-   .sessionid  = (unsigned int)-1,
+   .sessionid  = AUDIT_SID_UNSET,
 #endif
 #ifdef CONFIG_PERF_EVENTS
.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 0d4e269..e157595 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2050,7 +2050,7 @@ static void audit_log_set_loginuid(kuid_t koldloginuid, kuid_t kloginuid,
 int audit_set_loginuid(kuid_t loginuid)
 {
struct task_struct *task = current;
-   unsigned int oldsessionid, sessionid = (unsigned int)-1;
+   unsigned int oldsessionid, sessionid = AUDIT_SID_UNSET;
kuid_t oldloginuid;
int rc;
 
@@ -2064,7 +2064,7 @@ int audit_set_loginuid(kuid_t loginuid)
/* are we setting or clearing? */
if (uid_valid(loginuid)) {
sessionid = (unsigned int)atomic_inc_return(&session_id);
-   if (unlikely(sessionid == (unsigned int)-1))
+   if (unlikely(sessionid == AUDIT_SID_UNSET))
sessionid = (unsigned int)atomic_inc_return(&session_id);
}
 
-- 
1.8.3.1



[PATCH ghak81 RFC V2 1/5] audit: normalize loginuid read access

2018-05-12 Thread Richard Guy Briggs
Recognizing that the loginuid is an internal audit value, use an access
function to retrieve the audit loginuid value for the task rather than
reaching directly into the task struct to get it.

Signed-off-by: Richard Guy Briggs 
---
 kernel/auditsc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 479c031..0d4e269 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct *tsk,
case AUDIT_COMPARE_EGID_TO_OBJ_GID:
return audit_compare_gid(cred->egid, name, f, ctx);
case AUDIT_COMPARE_AUID_TO_OBJ_UID:
-   return audit_compare_uid(tsk->loginuid, name, f, ctx);
+   return audit_compare_uid(audit_get_loginuid(tsk), name, f, ctx);
case AUDIT_COMPARE_SUID_TO_OBJ_UID:
return audit_compare_uid(cred->suid, name, f, ctx);
case AUDIT_COMPARE_SGID_TO_OBJ_GID:
@@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct *tsk,
return audit_compare_gid(cred->fsgid, name, f, ctx);
/* uid comparisons */
case AUDIT_COMPARE_UID_TO_AUID:
-   return audit_uid_comparator(cred->uid, f->op, tsk->loginuid);
+   return audit_uid_comparator(cred->uid, f->op, audit_get_loginuid(tsk));
case AUDIT_COMPARE_UID_TO_EUID:
return audit_uid_comparator(cred->uid, f->op, cred->euid);
case AUDIT_COMPARE_UID_TO_SUID:
@@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct *tsk,
return audit_uid_comparator(cred->uid, f->op, cred->fsuid);
/* auid comparisons */
case AUDIT_COMPARE_AUID_TO_EUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->euid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->euid);
case AUDIT_COMPARE_AUID_TO_SUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->suid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->suid);
case AUDIT_COMPARE_AUID_TO_FSUID:
-   return audit_uid_comparator(tsk->loginuid, f->op, cred->fsuid);
+   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->fsuid);
/* euid comparisons */
case AUDIT_COMPARE_EUID_TO_SUID:
return audit_uid_comparator(cred->euid, f->op, cred->suid);
@@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk,
result = match_tree_refs(ctx, rule->tree);
break;
case AUDIT_LOGINUID:
-   result = audit_uid_comparator(tsk->loginuid, f->op, f->uid);
+   result = audit_uid_comparator(audit_get_loginuid(tsk), f->op, f->uid);
break;
case AUDIT_LOGINUID_SET:
result = audit_comparator(audit_loginuid_set(tsk), f->op, f->val);
@@ -2281,14 +2281,14 @@ int audit_signal_info(int sig, struct task_struct *t)
struct audit_aux_data_pids *axp;
struct task_struct *tsk = current;
struct audit_context *ctx = tsk->audit_context;
-   kuid_t uid = current_uid(), t_uid = task_uid(t);
+   kuid_t uid = current_uid(), auid, t_uid = task_uid(t);
 
if (auditd_test_task(t) &&
(sig == SIGTERM || sig == SIGHUP ||
 sig == SIGUSR1 || sig == SIGUSR2)) {
audit_sig_pid = task_tgid_nr(tsk);
-   if (uid_valid(tsk->loginuid))
-   audit_sig_uid = tsk->loginuid;
+   if (uid_valid(auid = audit_get_loginuid(tsk)))
+   audit_sig_uid = auid;
else
audit_sig_uid = uid;
security_task_getsecid(tsk, &audit_sig_sid);
-- 
1.8.3.1



[PATCH ghak81 RFC V2 5/5] audit: collect audit task parameters

2018-05-12 Thread Richard Guy Briggs
The audit-related parameters in struct task_struct should ideally be
collected together and accessed through a standard audit API.

Collect the existing loginuid, sessionid and audit_context together in a
new struct audit_task_info called "audit" in struct task_struct.

See: https://github.com/linux-audit/audit-kernel/issues/81

Signed-off-by: Richard Guy Briggs 
---
 MAINTAINERS|  2 +-
 include/linux/audit.h  | 10 +-
 include/linux/audit_task.h | 31 +++
 include/linux/sched.h  |  6 ++
 init/init_task.c   |  7 +--
 kernel/auditsc.c   |  6 +++---
 6 files changed, 47 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/audit_task.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0a1410d..8c7992d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2510,7 +2510,7 @@ L:linux-au...@redhat.com (moderated for non-subscribers)
 W: https://github.com/linux-audit
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
 S: Supported
-F: include/linux/audit.h
+F: include/linux/audit*.h
 F: include/uapi/linux/audit.h
 F: kernel/audit*
 
diff --git a/include/linux/audit.h b/include/linux/audit.h
index f7973e4..6d599b6 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -237,11 +237,11 @@ extern void __audit_inode_child(struct inode *parent,
 
 static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
 {
-   task->audit_context = ctx;
+   task->audit.ctx = ctx;
 }
 static inline struct audit_context *audit_context(void)
 {
-   return current->audit_context;
+   return current->audit.ctx;
 }
 static inline bool audit_dummy_context(void)
 {
@@ -250,7 +250,7 @@ static inline bool audit_dummy_context(void)
 }
 static inline void audit_free(struct task_struct *task)
 {
-   if (unlikely(task->audit_context))
+   if (unlikely(task->audit.ctx))
__audit_free(task);
 }
 static inline void audit_syscall_entry(int major, unsigned long a0,
@@ -330,12 +330,12 @@ extern int auditsc_get_stamp(struct audit_context *ctx,
 
 static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
 {
-   return tsk->loginuid;
+   return tsk->audit.loginuid;
 }
 
 static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 {
-   return tsk->sessionid;
+   return tsk->audit.sessionid;
 }
 
 extern void __audit_ipc_obj(struct kern_ipc_perm *ipcp);
diff --git a/include/linux/audit_task.h b/include/linux/audit_task.h
new file mode 100644
index 000..d4b3a20
--- /dev/null
+++ b/include/linux/audit_task.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* audit_task.h -- definition of audit_task_info structure
+ *
+ * Copyright 2018 Red Hat Inc., Raleigh, North Carolina.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Written by Richard Guy Briggs 
+ *
+ */
+
+#ifndef _LINUX_AUDIT_TASK_H_
+#define _LINUX_AUDIT_TASK_H_
+
+struct audit_context;
+struct audit_task_info {
+   kuid_t  loginuid;
+   unsigned intsessionid;
+   struct audit_context*ctx;
+};
+
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3d697f..b58eca0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,9 +27,9 @@
 #include 
 #include 
 #include 
+#include <linux/audit_task.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
 struct backing_dev_info;
 struct bio_list;
 struct blk_plug;
@@ -832,10 +832,8 @@ struct task_struct {
 
struct callback_head*task_works;
 
-   struct audit_context*audit_context;
 #ifdef CONFIG_AUDITSYSCALL
-   kuid_t  loginuid;
-   unsigned intsessionid;
+   struct audit_task_info  audit;
 #endif
struct seccomp  seccomp;
 
diff --git a/init/init_task.c b/init/init_task.c
index 74f60ba..d33260d 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -119,8 +119,11 @@ struct task_struct init_task
.thread_group   = LIST_HEAD_INIT(init_task.thread_group),
.thread_node= LIST_HEAD_INIT(init_signals.thread_head),
 #ifdef CONFIG_AUDITSYSCALL
-   .loginuid   = INVALID_UID,
-   .sessionid  = AUDIT_SID_UNSET,
+   .audit  = {
+   .loginuid   = INVALID_UID,
+   .sessionid  = 

Re: [PATCH bpf-next 3/4] samples: bpf: fix build after move to compiling full libbpf.a

2018-05-12 Thread Daniel Borkmann
On 05/12/2018 09:38 PM, Jakub Kicinski wrote:
> On Fri, 11 May 2018 17:17:28 -0700, Jakub Kicinski wrote:
>> There are many ways users may compile samples, some of them got
>> broken by commit 5f9380572b4b ("samples: bpf: compile and link
>> against full libbpf").  Improve path resolution and make libbpf
>> building a dependency of source files to force its build.
>>
>> Samples should now again build with any of:
>>  cd samples/bpf; make
>>  make samples/bpf
>>  make -C samples/bpf
>>  cd samples/bpf; make O=builddir
>>  make samples/bpf O=builddir
>>  make -C samples/bpf O=builddir
>>
>> Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf")
>> Reported-by: Björn Töpel 
>> Signed-off-by: Jakub Kicinski 
> 
> Unfortunately Björn reports this still doesn't fix the build for him.
> Investigating further.

Ok, thanks for letting us know.


Re: [PATCH bpf] tools: bpf: fix NULL return handling in bpf__prepare_load

2018-05-12 Thread Daniel Borkmann
[ +Arnaldo ]

On 05/11/2018 01:21 PM, YueHaibing wrote:
> bpf_object__open()/bpf_object__open_buffer() can return an error pointer or
> NULL, so check the return values with IS_ERR_OR_NULL() in bpf__prepare_load()
> and bpf__prepare_load_buffer().
> 
> Signed-off-by: YueHaibing 
> ---
>  tools/perf/util/bpf-loader.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

This should probably be routed via Arnaldo due to the fix in perf itself. If
there's no particular preference on which tree, we could potentially route it
as well via bpf with Acked-by from Arnaldo, but that is up to him. Arnaldo,
any preference?
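
For anyone skimming, the point is that IS_ERR() alone treats a plain NULL as
success, so it gets dereferenced later.  A toy userspace re-implementation of
the helpers (only to show the semantics, not the kernel/tools code itself):

#include <stdio.h>

#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(p)	((unsigned long)(p) >= (unsigned long)-MAX_ERRNO)
#define IS_ERR_OR_NULL(p)	(!(p) || IS_ERR(p))

int main(void)
{
	void *err = ERR_PTR(-22);	/* ERR_PTR(-EINVAL): encoded error */
	void *null = NULL;		/* the other failure mode */

	/* IS_ERR() catches the encoded error but not NULL ... */
	printf("IS_ERR(err)=%d IS_ERR(null)=%d\n", IS_ERR(err), IS_ERR(null));
	/* ... while IS_ERR_OR_NULL() catches both. */
	printf("IS_ERR_OR_NULL(null)=%d\n", IS_ERR_OR_NULL(null));
	return 0;
}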

> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index af7ad81..cee6587 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
>   }
>  
>   obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
> - if (IS_ERR(obj)) {
> + if (IS_ERR_OR_NULL(obj)) {
>   pr_debug("bpf: failed to load buffer\n");
>   return ERR_PTR(-EINVAL);
>   }
> @@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
>   pr_debug("bpf: successfull builtin compilation\n");
>   obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
>  
> - if (!IS_ERR(obj) && llvm_param.dump_obj)
> + if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
>   llvm__dump_obj(filename, obj_buf, obj_buf_sz);
>  
>   free(obj_buf);
>   } else
>   obj = bpf_object__open(filename);
>  
> - if (IS_ERR(obj)) {
> + if (IS_ERR_OR_NULL(obj)) {
>   pr_debug("bpf: failed to load %s\n", filename);
>   return obj;
>   }
> 



[PATCH net-next v2 3/3] sctp: checkpatch fixups

2018-05-12 Thread Marcelo Ricardo Leitner
A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a594d181fa1178c34cf477e13d700f7b37e72e21..9a2fa7d6d68b1d695cd745ed612eb32193f947e0 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -812,8 +812,7 @@ static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 
if (!new_transport) {
if (!sctp_chunk_is_data(chunk)) {
-   /*
-* If we have a prior transport pointer, see if
+   /* If we have a prior transport pointer, see if
 * the destination address of the chunk
 * matches the destination address of the
 * current transport.  If not a match, then
@@ -912,8 +911,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
sctp_outq_select_transport(ctx, chunk);
 
switch (chunk->chunk_hdr->type) {
-   /*
-* 6.10 Bundling
+   /* 6.10 Bundling
 *   ...
 *   An endpoint MUST NOT bundle INIT, INIT ACK or SHUTDOWN
 *   COMPLETE with any other chunks.  [Send them immediately.]
@@ -1061,8 +1059,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
return;
}
 
-   /*
-* RFC 2960 6.1  Transmission of DATA Chunks
+   /* RFC 2960 6.1  Transmission of DATA Chunks
 *
 * C) When the time comes for the sender to transmit,
 * before sending new DATA chunks, the sender MUST
@@ -1101,8 +1098,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 
sctp_outq_select_transport(ctx, chunk);
 
-   pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
-"skb->users:%d\n",
+   pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p 
skb->users:%d\n",
 __func__, ctx->q, chunk, chunk && chunk->chunk_hdr ?
 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
@@ -1175,8 +1171,7 @@ static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
}
 }
 
-/*
- * Try to flush an outqueue.
+/* Try to flush an outqueue.
  *
  * Description: Send everything in q which we legally can, subject to
  * congestion limitations.
@@ -1196,8 +1191,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
.gfp = gfp,
};
 
-   /*
-* 6.10 Bundling
+   /* 6.10 Bundling
 *   ...
 *   When bundling control chunks with DATA chunks, an
 *   endpoint MUST place control chunks first in the outbound
@@ -1768,7 +1762,7 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn)
if (TSN_lte(tsn, ctsn))
goto pass;
 
-   /* 3.3.4 Selective Acknowledgement (SACK) (3):
+   /* 3.3.4 Selective Acknowledgment (SACK) (3):
 *
 * Gap Ack Blocks:
 *  These fields contain the Gap Ack Blocks. They are repeated
-- 
2.14.3



[PATCH net-next v2 2/3] sctp: add asoc and packet to sctp_flush_ctx

2018-05-12 Thread Marcelo Ricardo Leitner
Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).
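
A rough illustration of the reload being avoided (a toy example, not the
sctp code): with -fno-strict-aliasing, every store the compiler cannot
prove unrelated forces q->asoc to be re-read, whereas a value cached up
front is loaded once:

struct assoc { int peer_port; };
struct outq  { struct assoc *asoc; };

/* Not cached: q->asoc may be reloaded after every opaque store. */
int sum_reloading(struct outq *q, int *scratch, int n)
{
	int s = 0;

	for (int i = 0; i < n; i++) {
		scratch[i] = i;			/* store that may alias, as far as the compiler knows */
		s += q->asoc->peer_port;	/* may be re-read each iteration */
	}
	return s;
}

/* Cached once (what sctp_flush_ctx does for asoc and packet). */
int sum_cached(struct outq *q, int *scratch, int n)
{
	struct assoc *asoc = q->asoc;	/* pre-computed */
	int s = 0;

	for (int i = 0; i < n; i++) {
		scratch[i] = i;
		s += asoc->peer_port;
	}
	return s;
}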

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 99 -
 1 file changed, 45 insertions(+), 54 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index db94a2513dd874149aa77c4936f68537e97f8855..a594d181fa1178c34cf477e13d700f7b37e72e21 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -798,16 +798,17 @@ struct sctp_flush_ctx {
struct sctp_transport *transport;
/* These transports have chunks to send. */
struct list_head transport_list;
+   struct sctp_association *asoc;
+   /* Packet on the current transport above */
+   struct sctp_packet *packet;
gfp_t gfp;
 };
 
 /* transport: current transport */
-static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
   struct sctp_chunk *chunk)
 {
struct sctp_transport *new_transport = chunk->transport;
-   struct sctp_association *asoc = ctx->q->asoc;
-   bool changed = false;
 
if (!new_transport) {
if (!sctp_chunk_is_data(chunk)) {
@@ -825,7 +826,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,

>transport->ipaddr))
new_transport = ctx->transport;
else
-   new_transport = sctp_assoc_lookup_paddr(asoc,
+   new_transport = sctp_assoc_lookup_paddr(ctx->asoc,
  >dest);
}
 
@@ -833,7 +834,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 * use the current active path.
 */
if (!new_transport)
-   new_transport = asoc->peer.active_path;
+   new_transport = ctx->asoc->peer.active_path;
} else {
__u8 type;
 
@@ -858,7 +859,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
if (type != SCTP_CID_HEARTBEAT &&
type != SCTP_CID_HEARTBEAT_ACK &&
type != SCTP_CID_ASCONF_ACK)
-   new_transport = asoc->peer.active_path;
+   new_transport = ctx->asoc->peer.active_path;
break;
default:
break;
@@ -867,27 +868,25 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 
/* Are we switching transports? Take care of transport locks. */
if (new_transport != ctx->transport) {
-   changed = true;
ctx->transport = new_transport;
+   ctx->packet = &ctx->transport->packet;
+
if (list_empty(&ctx->transport->send_ready))
list_add_tail(&ctx->transport->send_ready,
  &ctx->transport_list);
 
-   sctp_packet_config(&ctx->transport->packet, asoc->peer.i.init_tag,
-  asoc->peer.ecn_capable);
+   sctp_packet_config(ctx->packet,
+  ctx->asoc->peer.i.init_tag,
+  ctx->asoc->peer.ecn_capable);
/* We've switched transports, so apply the
 * Burst limit to the new transport.
 */
sctp_transport_burst_limited(ctx->transport);
}
-
-   return changed;
 }
 
 static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 {
-   struct sctp_association *asoc = ctx->q->asoc;
-   struct sctp_packet *packet = NULL;
struct sctp_chunk *chunk, *tmp;
enum sctp_xmit status;
int one_packet, error;
@@ -901,7 +900,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 * NOT use the new IP address as a source for ANY SCTP
 * packet except on carrying an ASCONF Chunk.
 */
-   if (asoc->src_out_of_asoc_ok &&
+   if (ctx->asoc->src_out_of_asoc_ok &&
chunk->chunk_hdr->type != SCTP_CID_ASCONF)
continue;
 
@@ -910,8 +909,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
/* Pick the right transport to use. Should always be true for
 * the first chunk as we don't have a transport by then.
 */
-   if (sctp_outq_select_transport(ctx, chunk))
-   packet = &ctx->transport->packet;
+   sctp_outq_select_transport(ctx, chunk);
 
switch (chunk->chunk_hdr->type) {
/*
@@ -926,14 +924,14 @@ static void 

[PATCH net-next v2 1/3] sctp: add sctp_flush_ctx, a context struct on outq_flush routines

2018-05-12 Thread Marcelo Ricardo Leitner
With this struct we avoid passing lots of variables around and taking care
of updating the current transport/packet.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 182 +---
 1 file changed, 88 insertions(+), 94 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a9400cb0cc249affcf2bedfc7a070d9e48843d27..db94a2513dd874149aa77c4936f68537e97f8855 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -791,13 +791,22 @@ static int sctp_packet_singleton(struct sctp_transport *transport,
return sctp_packet_transmit(&singleton, gfp);
 }
 
-static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
-  struct sctp_association *asoc,
-  struct sctp_transport **transport,
-  struct list_head *transport_list)
+/* Struct to hold the context during sctp outq flush */
+struct sctp_flush_ctx {
+   struct sctp_outq *q;
+   /* Current transport being used. It's NOT the same as curr active one */
+   struct sctp_transport *transport;
+   /* These transports have chunks to send. */
+   struct list_head transport_list;
+   gfp_t gfp;
+};
+
+/* transport: current transport */
+static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+  struct sctp_chunk *chunk)
 {
struct sctp_transport *new_transport = chunk->transport;
-   struct sctp_transport *curr = *transport;
+   struct sctp_association *asoc = ctx->q->asoc;
bool changed = false;
 
if (!new_transport) {
@@ -812,9 +821,9 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
 * after processing ASCONFs, we may have new
 * transports created.
 */
-   if (curr && sctp_cmp_addr_exact(&chunk->dest,
-   &curr->ipaddr))
-   new_transport = curr;
+   if (ctx->transport && sctp_cmp_addr_exact(&chunk->dest,
+   &ctx->transport->ipaddr))
+   new_transport = ctx->transport;
else
new_transport = sctp_assoc_lookup_paddr(asoc,
  &chunk->dest);
@@ -857,37 +866,33 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
}
 
/* Are we switching transports? Take care of transport locks. */
-   if (new_transport != curr) {
+   if (new_transport != ctx->transport) {
changed = true;
-   curr = new_transport;
-   *transport = curr;
-   if (list_empty(>send_ready))
-   list_add_tail(>send_ready, transport_list);
+   ctx->transport = new_transport;
+   if (list_empty(>transport->send_ready))
+   list_add_tail(>transport->send_ready,
+ >transport_list);
 
-   sctp_packet_config(>packet, asoc->peer.i.init_tag,
+   sctp_packet_config(>transport->packet, 
asoc->peer.i.init_tag,
   asoc->peer.ecn_capable);
/* We've switched transports, so apply the
 * Burst limit to the new transport.
 */
-   sctp_transport_burst_limited(curr);
+   sctp_transport_burst_limited(ctx->transport);
}
 
return changed;
 }
 
-static void sctp_outq_flush_ctrl(struct sctp_outq *q,
-struct sctp_transport **_transport,
-struct list_head *transport_list,
-gfp_t gfp)
+static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 {
-   struct sctp_transport *transport = *_transport;
-   struct sctp_association *asoc = q->asoc;
+   struct sctp_association *asoc = ctx->q->asoc;
struct sctp_packet *packet = NULL;
struct sctp_chunk *chunk, *tmp;
enum sctp_xmit status;
int one_packet, error;
 
-   list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
+   list_for_each_entry_safe(chunk, tmp, &ctx->q->control_chunk_list, list) {
one_packet = 0;
 
/* RFC 5061, 5.3
@@ -905,11 +910,8 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
/* Pick the right transport to use. Should always be true for
 * the first chunk as we don't have a transport by then.
 */
-   if (sctp_outq_select_transport(chunk, asoc, &transport,
-  transport_list)) {
-   transport = *_transport;
-   packet = &transport->packet;
-

[PATCH net-next v2 0/3] sctp: Introduce sctp_flush_ctx

2018-05-12 Thread Marcelo Ricardo Leitner
This struct will hold all the context used during the outq flush, so we
don't have to pass lots of pointers all around.

Checked on x86_64: the compiler inlines all these functions and there is no
dereference added because of the struct.

This patchset depends on 'sctp: refactor sctp_outq_flush'

Changes since v1:
- updated to build on top of v2 of 'sctp: refactor sctp_outq_flush'

Marcelo Ricardo Leitner (3):
  sctp: add sctp_flush_ctx, a context struct on outq_flush routines
  sctp: add asoc and packet to sctp_flush_ctx
  sctp: checkpatch fixups

 net/sctp/outqueue.c | 259 
 1 file changed, 119 insertions(+), 140 deletions(-)

--
2.14.3


[PATCH net-next v2 8/8] sctp: rework switch cases in sctp_outq_flush_data

2018-05-12 Thread Marcelo Ricardo Leitner
Remove the inner one, which tended to be error prone due to the cascading
and can be replaced by a simple if ().

Rework the outer one so that the actual flush code is not inside it: we
first validate whether we can send data, return if we cannot, and only
then run the flush code.
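
Illustrative shape of the rework (not the sctp code itself): the state
check becomes an early-return guard and the flush work runs at function
scope instead of being nested inside a switch case:

static void flush_data_shape(int state)
{
	switch (state) {
	case 1:			/* states in which sending data is allowed */
	case 2:
		break;
	default:
		return;		/* cannot send data: bail out early */
	}

	/* ... the actual flush work would run here, un-nested ... */
}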

Suggested-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 191 +---
 1 file changed, 93 insertions(+), 98 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 8173dd26f5878cbf67dd7e162ac5e6b18d9a3332..a9400cb0cc249affcf2bedfc7a070d9e48843d27 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1058,122 +1058,117 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 * chunk.
 */
if (!packet || !packet->has_cookie_echo)
-   break;
+   return;
 
/* fallthru */
case SCTP_STATE_ESTABLISHED:
case SCTP_STATE_SHUTDOWN_PENDING:
case SCTP_STATE_SHUTDOWN_RECEIVED:
-   /*
-* RFC 2960 6.1  Transmission of DATA Chunks
-*
-* C) When the time comes for the sender to transmit,
-* before sending new DATA chunks, the sender MUST
-* first transmit any outstanding DATA chunks which
-* are marked for retransmission (limited by the
-* current cwnd).
-*/
-   if (!list_empty(&q->retransmit)) {
-   if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-rtx_timeout, gfp))
-   break;
-   /* We may have switched current transport */
-   transport = *_transport;
-   packet = &transport->packet;
-   }
+   break;
 
-   /* Apply Max.Burst limitation to the current transport in
-* case it will be used for new data.  We are going to
-* rest it before we return, but we want to apply the limit
-* to the currently queued data.
-*/
-   if (transport)
-   sctp_transport_burst_limited(transport);
-
-   /* Finally, transmit new packets.  */
-   while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
-   __u32 sid = ntohs(chunk->subh.data_hdr->stream);
-
-   /* Has this chunk expired? */
-   if (sctp_chunk_abandoned(chunk)) {
-   sctp_sched_dequeue_done(q, chunk);
-   sctp_chunk_fail(chunk, 0);
-   sctp_chunk_free(chunk);
-   continue;
-   }
+   default:
+   /* Do nothing. */
+   return;
+   }
 
-   if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
-   sctp_outq_head_data(q, chunk);
-   break;
-   }
+   /*
+* RFC 2960 6.1  Transmission of DATA Chunks
+*
+* C) When the time comes for the sender to transmit,
+* before sending new DATA chunks, the sender MUST
+* first transmit any outstanding DATA chunks which
+* are marked for retransmission (limited by the
+* current cwnd).
+*/
+   if (!list_empty(&q->retransmit)) {
+   if (!sctp_outq_flush_rtx(q, _transport, transport_list,
+rtx_timeout, gfp))
+   return;
+   /* We may have switched current transport */
+   transport = *_transport;
+   packet = &transport->packet;
+   }
 
-   if (sctp_outq_select_transport(chunk, asoc, _transport,
-  transport_list)) {
-   transport = *_transport;
-   packet = &transport->packet;
-   }
+   /* Apply Max.Burst limitation to the current transport in
+* case it will be used for new data.  We are going to
+* rest it before we return, but we want to apply the limit
+* to the currently queued data.
+*/
+   if (transport)
+   sctp_transport_burst_limited(transport);
 
-   pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x 
skb->head:%p "
-"skb->users:%d\n",
-__func__, q, chunk, chunk && chunk->chunk_hdr ?
-
sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
-"illegal chunk", 
ntohl(chunk->subh.data_hdr->tsn),
-

[PATCH net-next v2 1/8] sctp: add sctp_packet_singleton

2018-05-12 Thread Marcelo Ricardo Leitner
Factor out the code for generating singletons. It's used only once, but
helps to keep the context contained.

The const variables are there to ease the reading of the subsequent calls.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index dee7cbd5483149024f2f3195db2fe4d473b1a00a..300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -776,6 +776,20 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp)
sctp_outq_flush(q, 0, gfp);
 }
 
+static int sctp_packet_singleton(struct sctp_transport *transport,
+struct sctp_chunk *chunk, gfp_t gfp)
+{
+   const struct sctp_association *asoc = transport->asoc;
+   const __u16 sport = asoc->base.bind_addr.port;
+   const __u16 dport = asoc->peer.port;
+   const __u32 vtag = asoc->peer.i.init_tag;
+   struct sctp_packet singleton;
+
+   sctp_packet_init(&singleton, transport, sport, dport);
+   sctp_packet_config(&singleton, vtag, 0);
+   sctp_packet_append_chunk(&singleton, chunk);
+   return sctp_packet_transmit(&singleton, gfp);
+}
 
 /*
  * Try to flush an outqueue.
@@ -789,10 +803,7 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp)
 static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
struct sctp_packet *packet;
-   struct sctp_packet singleton;
struct sctp_association *asoc = q->asoc;
-   __u16 sport = asoc->base.bind_addr.port;
-   __u16 dport = asoc->peer.port;
__u32 vtag = asoc->peer.i.init_tag;
struct sctp_transport *transport = NULL;
struct sctp_transport *new_transport;
@@ -905,10 +916,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
case SCTP_CID_INIT:
case SCTP_CID_INIT_ACK:
case SCTP_CID_SHUTDOWN_COMPLETE:
-   sctp_packet_init(&singleton, transport, sport, dport);
-   sctp_packet_config(&singleton, vtag, 0);
-   sctp_packet_append_chunk(&singleton, chunk);
-   error = sctp_packet_transmit(&singleton, gfp);
+   error = sctp_packet_singleton(transport, chunk, gfp);
if (error < 0) {
asoc->base.sk->sk_err = -error;
return;
-- 
2.14.3



[PATCH net-next v2 7/8] sctp: make use of gfp on retransmissions

2018-05-12 Thread Marcelo Ricardo Leitner
Retransmissions may be triggered when in user context, so let's make use
of gfp.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 3b738fdb08b9c596e6d4d4b18bef645187e0da4a..8173dd26f5878cbf67dd7e162ac5e6b18d9a3332 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -608,7 +608,7 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
  * The return value is a normal kernel error return value.
  */
 static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
-int rtx_timeout, int *start_timer)
+int rtx_timeout, int *start_timer, gfp_t gfp)
 {
struct sctp_transport *transport = pkt->transport;
struct sctp_chunk *chunk, *chunk1;
@@ -684,12 +684,12 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 * control chunks are already freed so there
 * is nothing we can do.
 */
-   sctp_packet_transmit(pkt, GFP_ATOMIC);
+   sctp_packet_transmit(pkt, gfp);
goto redo;
}
 
/* Send this packet.  */
-   error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+   error = sctp_packet_transmit(pkt, gfp);
 
/* If we are retransmitting, we should only
 * send a single packet.
@@ -705,7 +705,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 
case SCTP_XMIT_RWND_FULL:
/* Send this packet. */
-   error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+   error = sctp_packet_transmit(pkt, gfp);
 
/* Stop sending DATA as there is no more room
 * at the receiver.
@@ -715,7 +715,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 
case SCTP_XMIT_DELAY:
/* Send this packet. */
-   error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+   error = sctp_packet_transmit(pkt, gfp);
 
/* Stop sending DATA because of nagle delay. */
done = 1;
@@ -991,7 +991,7 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 static bool sctp_outq_flush_rtx(struct sctp_outq *q,
struct sctp_transport **_transport,
struct list_head *transport_list,
-   int rtx_timeout)
+   int rtx_timeout, gfp_t gfp)
 {
struct sctp_transport *transport = *_transport;
struct sctp_packet *packet = transport ? &transport->packet : NULL;
@@ -1015,7 +1015,8 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
   asoc->peer.ecn_capable);
}
 
-   error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer);
+   error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer,
+ gfp);
if (error < 0)
asoc->base.sk->sk_err = -error;
 
@@ -1074,7 +1075,7 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 */
if (!list_empty(&q->retransmit)) {
if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-rtx_timeout))
+rtx_timeout, gfp))
break;
/* We may have switched current transport */
transport = *_transport;
-- 
2.14.3



[PATCH net-next v2 4/8] sctp: move outq data rtx code out of sctp_outq_flush

2018-05-12 Thread Marcelo Ricardo Leitner
This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx
and create a new sctp_outq_flush_rtx, with the code that was on
sctp_outq_flush. Again, the idea is to have functions with small and
defined objectives.

Yes, there is an open-coded path selection in what is now
sctp_outq_flush_rtx.  It is kept as is for now because it may look very
different once we implement retransmission path selection algorithms for
CMT-SCTP.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 101 ++--
 1 file changed, 58 insertions(+), 43 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 800202c68cb89f1086ee7d3a4493dc752c8bf6ac..6d7ee372a9d6b8e68a759277830d5334ec992d47 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -601,14 +601,14 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
 
 /*
  * Transmit DATA chunks on the retransmit queue.  Upon return from
- * sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
+ * __sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
  * need to be transmitted by the caller.
  * We assume that pkt->transport has already been set.
  *
  * The return value is a normal kernel error return value.
  */
-static int sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
-  int rtx_timeout, int *start_timer)
+static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
+int rtx_timeout, int *start_timer)
 {
struct sctp_transport *transport = pkt->transport;
struct sctp_chunk *chunk, *chunk1;
@@ -987,6 +987,57 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
}
 }
 
+/* Returns false if new data shouldn't be sent */
+static bool sctp_outq_flush_rtx(struct sctp_outq *q,
+   struct sctp_transport **_transport,
+   struct list_head *transport_list,
+   int rtx_timeout)
+{
+   struct sctp_transport *transport = *_transport;
+   struct sctp_packet *packet = transport ? &transport->packet : NULL;
+   struct sctp_association *asoc = q->asoc;
+   int error, start_timer = 0;
+
+   if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
+   return false;
+
+   if (transport != asoc->peer.retran_path) {
+   /* Switch transports & prepare the packet.  */
+   transport = asoc->peer.retran_path;
+   *_transport = transport;
+
+   if (list_empty(&transport->send_ready))
+   list_add_tail(&transport->send_ready,
+ transport_list);
+
+   packet = &transport->packet;
+   sctp_packet_config(packet, asoc->peer.i.init_tag,
+  asoc->peer.ecn_capable);
+   }
+
+   error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer);
+   if (error < 0)
+   asoc->base.sk->sk_err = -error;
+
+   if (start_timer) {
+   sctp_transport_reset_t3_rtx(transport);
+   transport->last_time_sent = jiffies;
+   }
+
+   /* This can happen on COOKIE-ECHO resend.  Only
+* one chunk can get bundled with a COOKIE-ECHO.
+*/
+   if (packet->has_cookie_echo)
+   return false;
+
+   /* Don't send new data if there is still data
+* waiting to retransmit.
+*/
+   if (!list_empty(&q->retransmit))
+   return false;
+
+   return true;
+}
 /*
  * Try to flush an outqueue.
  *
@@ -1000,12 +1051,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
struct sctp_packet *packet;
struct sctp_association *asoc = q->asoc;
-   __u32 vtag = asoc->peer.i.init_tag;
struct sctp_transport *transport = NULL;
struct sctp_chunk *chunk;
enum sctp_xmit status;
int error = 0;
-   int start_timer = 0;
 
/* These transports have chunks to send. */
struct list_head transport_list;
@@ -1052,45 +1101,11 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 * current cwnd).
 */
if (!list_empty(&q->retransmit)) {
-   if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
-   goto sctp_flush_out;
-   if (transport == asoc->peer.retran_path)
-   goto retran;
-
-   /* Switch transports & prepare the packet.  */
-
-   transport = asoc->peer.retran_path;
-
-   if (list_empty(&transport->send_ready)) {
-   list_add_tail(&transport->send_ready,
- &transport_list);
-   }
-
+   if (!sctp_outq_flush_rtx(q, &transport, &transport_list,
+ 

[PATCH net-next v2 6/8] sctp: move transport flush code out of sctp_outq_flush

2018-05-12 Thread Marcelo Ricardo Leitner
To the new sctp_outq_flush_transports.

Comment on Nagle is outdated and removed. Nagle is performed earlier, while
checking if the chunk fits the packet: if the outq length is not enough to
fill the packet, it returns SCTP_XMIT_DELAY.

So by the time it gets to sctp_outq_flush_transports, it has to go through
all enlisted transports.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 56 +
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 7522188107792643f3bb5f00e5c254b00e91ef12..3b738fdb08b9c596e6d4d4b18bef645187e0da4a 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1176,6 +1176,29 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
}
 }
 
+static void sctp_outq_flush_transports(struct sctp_outq *q,
+  struct list_head *transport_list,
+  gfp_t gfp)
+{
+   struct list_head *ltransport;
+   struct sctp_packet *packet;
+   struct sctp_transport *t;
+   int error = 0;
+
+   while ((ltransport = sctp_list_dequeue(transport_list)) != NULL) {
+   t = list_entry(ltransport, struct sctp_transport, send_ready);
+   packet = >packet;
+   if (!sctp_packet_empty(packet)) {
+   error = sctp_packet_transmit(packet, gfp);
+   if (error < 0)
+   q->asoc->base.sk->sk_err = -error;
+   }
+
+   /* Clear the burst limited state, if any */
+   sctp_transport_burst_reset(t);
+   }
+}
+
 /*
  * Try to flush an outqueue.
  *
@@ -1187,17 +1210,10 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
  */
 static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
-   struct sctp_packet *packet;
-   struct sctp_association *asoc = q->asoc;
+   /* Current transport being used. It's NOT the same as curr active one */
struct sctp_transport *transport = NULL;
-   int error = 0;
-
/* These transports have chunks to send. */
-   struct list_head transport_list;
-   struct list_head *ltransport;
-
-   INIT_LIST_HEAD(&transport_list);
-   packet = NULL;
+   LIST_HEAD(transport_list);
 
/*
 * 6.10 Bundling
@@ -1218,27 +1234,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 sctp_flush_out:
 
-   /* Before returning, examine all the transports touched in
-* this call.  Right now, we bluntly force clear all the
-* transports.  Things might change after we implement Nagle.
-* But such an examination is still required.
-*
-* --xguo
-*/
-   while ((ltransport = sctp_list_dequeue(&transport_list)) != NULL) {
-   struct sctp_transport *t = list_entry(ltransport,
- struct sctp_transport,
- send_ready);
-   packet = >packet;
-   if (!sctp_packet_empty(packet)) {
-   error = sctp_packet_transmit(packet, gfp);
-   if (error < 0)
-   asoc->base.sk->sk_err = -error;
-   }
-
-   /* Clear the burst limited state, if any */
-   sctp_transport_burst_reset(t);
-   }
+   sctp_outq_flush_transports(q, &transport_list, gfp);
 }
 
 /* Update unack_data based on the incoming SACK chunk */
-- 
2.14.3



[PATCH net-next v2 2/8] sctp: factor out sctp_outq_select_transport

2018-05-12 Thread Marcelo Ricardo Leitner
We had two spots doing this rather complex operation, and they were very
close to each other, each slightly tailored to its own spot.

This patch unifies these under the same function,
sctp_outq_select_transport, which knows how to handle control chunks and
original transmissions (but not retransmissions).

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 187 +---
 1 file changed, 90 insertions(+), 97 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3..bda50596d4bfebeac03966c5a161473df1c1986a 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -791,6 +791,90 @@ static int sctp_packet_singleton(struct sctp_transport *transport,
return sctp_packet_transmit(, gfp);
 }
 
+static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
+  struct sctp_association *asoc,
+  struct sctp_transport **transport,
+  struct list_head *transport_list)
+{
+   struct sctp_transport *new_transport = chunk->transport;
+   struct sctp_transport *curr = *transport;
+   bool changed = false;
+
+   if (!new_transport) {
+   if (!sctp_chunk_is_data(chunk)) {
+   /*
+* If we have a prior transport pointer, see if
+* the destination address of the chunk
+* matches the destination address of the
+* current transport.  If not a match, then
+* try to look up the transport with a given
+* destination address.  We do this because
+* after processing ASCONFs, we may have new
+* transports created.
+*/
+   if (curr && sctp_cmp_addr_exact(&chunk->dest,
+   &curr->ipaddr))
+   new_transport = curr;
+   else
+   new_transport = sctp_assoc_lookup_paddr(asoc,
+ &chunk->dest);
+   }
+
+   /* if we still don't have a new transport, then
+* use the current active path.
+*/
+   if (!new_transport)
+   new_transport = asoc->peer.active_path;
+   } else {
+   __u8 type;
+
+   switch (new_transport->state) {
+   case SCTP_INACTIVE:
+   case SCTP_UNCONFIRMED:
+   case SCTP_PF:
+   /* If the chunk is Heartbeat or Heartbeat Ack,
+* send it to chunk->transport, even if it's
+* inactive.
+*
+* 3.3.6 Heartbeat Acknowledgement:
+* ...
+* A HEARTBEAT ACK is always sent to the source IP
+* address of the IP datagram containing the
+* HEARTBEAT chunk to which this ack is responding.
+* ...
+*
+* ASCONF_ACKs also must be sent to the source.
+*/
+   type = chunk->chunk_hdr->type;
+   if (type != SCTP_CID_HEARTBEAT &&
+   type != SCTP_CID_HEARTBEAT_ACK &&
+   type != SCTP_CID_ASCONF_ACK)
+   new_transport = asoc->peer.active_path;
+   break;
+   default:
+   break;
+   }
+   }
+
+   /* Are we switching transports? Take care of transport locks. */
+   if (new_transport != curr) {
+   changed = true;
+   curr = new_transport;
+   *transport = curr;
+   if (list_empty(&curr->send_ready))
+   list_add_tail(&curr->send_ready, transport_list);
+
+   sctp_packet_config(&curr->packet, asoc->peer.i.init_tag,
+  asoc->peer.ecn_capable);
+   /* We've switched transports, so apply the
+* Burst limit to the new transport.
+*/
+   sctp_transport_burst_limited(curr);
+   }
+
+   return changed;
+}
+
 /*
  * Try to flush an outqueue.
  *
@@ -806,7 +890,6 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
struct sctp_association *asoc = q->asoc;
__u32 vtag = asoc->peer.i.init_tag;
struct sctp_transport *transport = NULL;
-   struct sctp_transport *new_transport;
struct sctp_chunk *chunk, *tmp;
enum sctp_xmit status;
int error = 0;
@@ -843,68 +926,12 @@ static void 

[PATCH net-next v2 5/8] sctp: move flushing of data chunks out of sctp_outq_flush

2018-05-12 Thread Marcelo Ricardo Leitner
Move it to the new sctp_outq_flush_data.  Again, smaller functions with
well-defined objectives.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 148 ++--
 1 file changed, 75 insertions(+), 73 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 6d7ee372a9d6b8e68a759277830d5334ec992d47..7522188107792643f3bb5f00e5c254b00e91ef12 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1038,45 +1038,17 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 
return true;
 }
-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made.  Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+
+static void sctp_outq_flush_data(struct sctp_outq *q,
+struct sctp_transport **_transport,
+struct list_head *transport_list,
+int rtx_timeout, gfp_t gfp)
 {
-   struct sctp_packet *packet;
+   struct sctp_transport *transport = *_transport;
+   struct sctp_packet *packet = transport ? &transport->packet : NULL;
struct sctp_association *asoc = q->asoc;
-   struct sctp_transport *transport = NULL;
struct sctp_chunk *chunk;
enum sctp_xmit status;
-   int error = 0;
-
-   /* These transports have chunks to send. */
-   struct list_head transport_list;
-   struct list_head *ltransport;
-
-   INIT_LIST_HEAD(&transport_list);
-   packet = NULL;
-
-   /*
-* 6.10 Bundling
-*   ...
-*   When bundling control chunks with DATA chunks, an
-*   endpoint MUST place control chunks first in the outbound
-*   SCTP packet.  The transmitter MUST transmit DATA chunks
-*   within a SCTP packet in increasing order of TSN.
-*   ...
-*/
-
-   sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
-
-   if (q->asoc->src_out_of_asoc_ok)
-   goto sctp_flush_out;
 
/* Is it OK to send data chunks?  */
switch (asoc->state) {
@@ -1101,10 +1073,11 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
 * current cwnd).
 */
if (!list_empty(&q->retransmit)) {
-   if (!sctp_outq_flush_rtx(q, &transport, &transport_list,
+   if (!sctp_outq_flush_rtx(q, _transport, transport_list,
 rtx_timeout))
break;
/* We may have switched current transport */
+   transport = *_transport;
packet = &transport->packet;
}
 
@@ -1130,12 +1103,14 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
 
if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
sctp_outq_head_data(q, chunk);
-   goto sctp_flush_out;
+   break;
}
 
-   if (sctp_outq_select_transport(chunk, asoc, &transport,
-  &transport_list))
+   if (sctp_outq_select_transport(chunk, asoc, _transport,
+  transport_list)) {
+   transport = *_transport;
packet = &transport->packet;
+   }
 
pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x 
skb->head:%p "
 "skb->users:%d\n",
@@ -1147,8 +1122,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
 
/* Add the chunk to the packet.  */
status = sctp_packet_transmit_chunk(packet, chunk, 0, 
gfp);
-
switch (status) {
+   case SCTP_XMIT_OK:
+   break;
+
case SCTP_XMIT_PMTU_FULL:
case SCTP_XMIT_RWND_FULL:
case SCTP_XMIT_DELAY:
@@ -1160,41 +1137,25 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
 status);
 
sctp_outq_head_data(q, chunk);
-   goto sctp_flush_out;
-
-   case SCTP_XMIT_OK:
-   /* The sender is in the SHUTDOWN-PENDING state,
-* The sender MAY set the I-bit in the DATA
-* chunk header.
-*/
-   

[PATCH net-next v2 3/8] sctp: move the flush of ctrl chunks into its own function

2018-05-12 Thread Marcelo Ricardo Leitner
Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.

One small fix embedded is the reset of one_packet at every iteration.
This allows bundling of some control chunks in case they were preceded by
another control chunk that cannot be bundled.

Other than this, it has the same behavior.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 89 -
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 
bda50596d4bfebeac03966c5a161473df1c1986a..800202c68cb89f1086ee7d3a4493dc752c8bf6ac
 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -875,45 +875,21 @@ static bool sctp_outq_select_transport(struct sctp_chunk 
*chunk,
return changed;
 }

-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made.  Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+static void sctp_outq_flush_ctrl(struct sctp_outq *q,
+struct sctp_transport **_transport,
+struct list_head *transport_list,
+gfp_t gfp)
 {
-   struct sctp_packet *packet;
+   struct sctp_transport *transport = *_transport;
struct sctp_association *asoc = q->asoc;
-   __u32 vtag = asoc->peer.i.init_tag;
-   struct sctp_transport *transport = NULL;
+   struct sctp_packet *packet = NULL;
struct sctp_chunk *chunk, *tmp;
enum sctp_xmit status;
-   int error = 0;
-   int start_timer = 0;
-   int one_packet = 0;
-
-   /* These transports have chunks to send. */
-   struct list_head transport_list;
-   struct list_head *ltransport;
-
-   INIT_LIST_HEAD(&transport_list);
-   packet = NULL;
-
-   /*
-* 6.10 Bundling
-*   ...
-*   When bundling control chunks with DATA chunks, an
-*   endpoint MUST place control chunks first in the outbound
-*   SCTP packet.  The transmitter MUST transmit DATA chunks
-*   within a SCTP packet in increasing order of TSN.
-*   ...
-*/
+   int one_packet, error;

list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
+   one_packet = 0;
+
/* RFC 5061, 5.3
 * F1) This means that until such time as the ASCONF
 * containing the add is acknowledged, the sender MUST
@@ -930,8 +906,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
 * the first chunk as we don't have a transport by then.
 */
if (sctp_outq_select_transport(chunk, asoc, &transport,
-  &transport_list))
+  transport_list)) {
+   transport = *_transport;
packet = &transport->packet;
+   }

switch (chunk->chunk_hdr->type) {
/*
@@ -954,6 +932,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
if (sctp_test_T_bit(chunk))
packet->vtag = asoc->c.my_vtag;
/* fallthru */
+
/* The following chunks are "response" chunks, i.e.
 * they are generated in response to something we
 * received.  If we are sending these, then we can
@@ -979,7 +958,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
case SCTP_CID_RECONF:
status = sctp_packet_transmit_chunk(packet, chunk,
one_packet, gfp);
-   if (status  != SCTP_XMIT_OK) {
+   if (status != SCTP_XMIT_OK) {
/* put the chunk back */
list_add(&chunk->list, &q->control_chunk_list);
break;
@@ -1006,6 +985,46 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
rtx_timeout, gfp_t gfp)
BUG();
}
}
+}
+
+/*
+ * Try to flush an outqueue.
+ *
+ * Description: Send everything in q which we legally can, subject to
+ * congestion limitations.
+ * * Note: This function can be called from multiple contexts so appropriate
+ * locking concerns must be made.  Today we use the sock lock to protect
+ * this function.
+ */
+static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+{
+   struct sctp_packet *packet;
+   struct sctp_association *asoc = q->asoc;
+   __u32 vtag = 

[PATCH net-next v2 0/8] sctp: refactor sctp_outq_flush

2018-05-12 Thread Marcelo Ricardo Leitner
Currently sctp_outq_flush does many different things and arguably
unrelated, such as doing transport selection and outq dequeueing.

This patchset refactors it into smaller and more dedicated functions.
The end behavior should be the same.

The next patchset will rework the function parameters.

Changes since v1:
- fix build issues on patches 3 and 4, and updated 5 and 8 because of
  it.

Marcelo Ricardo Leitner (8):
  sctp: add sctp_packet_singleton
  sctp: factor out sctp_outq_select_transport
  sctp: move the flush of ctrl chunks into its own function
  sctp: move outq data rtx code out of sctp_outq_flush
  sctp: move flushing of data chunks out of sctp_outq_flush
  sctp: move transport flush code out of sctp_outq_flush
  sctp: make use of gfp on retransmissions
  sctp: rework switch cases in sctp_outq_flush_data

 net/sctp/outqueue.c | 593 +++-
 1 file changed, 311 insertions(+), 282 deletions(-)

--
2.14.3



Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()

2018-05-12 Thread Andrew Lunn
On Fri, May 11, 2018 at 03:22:39PM -0700, Paul Burton wrote:
> Hi Andrew,
> 
> On Fri, May 11, 2018 at 09:24:46PM +0200, Andrew Lunn wrote:
> > > I could reorder the probe function a little to initialize the PHY before
> > > performing the MAC reset, drop this patch and the AR803X hibernation
> > > stuff from patch 2 if you like. But again, I can't actually test the
> > > result on the affected hardware.
> > 
> > Hi Paul
> > 
> > I don't like a MAC driver poking around in PHY registers.
> > 
> > So if you can rearrange the code, that would be great.
> > 
> >Thanks
> > Andrew
> 
> Sure, I'll give it a shot.
> 
> After digging into it I see 2 ways to go here:
> 
>   1) We could just always reset the PHY before we reset the MAC. That
>  would give us a window of however long the PHY takes to enter its
>  low power state & stop providing the RX clock during which we'd
>  need the MAC reset to complete. In the case of the AR8031 that's
>  "about 10 seconds" according to its data sheet. In this particular
>  case that feels like plenty, but it does also feel a bit icky to
>  rely on the timing chosen by the PHY manufacturer to line up with
>  that of the MAC reset.
> 
>   2) We could introduce a couple of new phy_* functions to disable &
>  enable low power states like the AR8031's hibernation feature, by
>  calling new function pointers in struct phy_driver. Then pch_gbe &
>  other MACs could call those to have the PHY driver disable
>  hibernation at times where we know we'll need the RX clock and
>  re-enable it afterwards.

Hi Paul

When there is no link, you don't need the MAC running. My assumption
is that the PHY is designed around that idea: you leave the MAC idle
until there is a link. When the phylib calls the link_change handler,
the MAC should then be started/stopped depending on the state of the
link. You are then guaranteed to have the clock when you need it.
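
A minimal sketch of that pattern (the example_* names are made up;
phy_connect(), netdev_priv() and the phydev->link check are existing
kernel/phylib API):

/* Sketch only: start/stop the MAC from the phylib link-change callback,
 * so the PHY's RX clock is guaranteed to be present whenever the MAC is
 * running.  example_mac_start/stop stand in for driver-specific code.
 */
#include <linux/netdevice.h>
#include <linux/phy.h>

static void example_mac_start(struct net_device *ndev) { /* driver specific */ }
static void example_mac_stop(struct net_device *ndev)  { /* driver specific */ }

static void example_adjust_link(struct net_device *ndev)
{
        struct phy_device *phydev = ndev->phydev;

        if (phydev->link)
                example_mac_start(ndev);        /* clock present: safe to run/reset the MAC */
        else
                example_mac_stop(ndev);         /* no link: PHY may power down */
}

/* registered at probe time, e.g.:
 * phy_connect(ndev, bus_id, example_adjust_link, PHY_INTERFACE_MODE_RGMII);
 */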

I've no idea how easy this is to implement given the current code...

 Andrew


Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto

2018-05-12 Thread Thomas Gleixner
On Sat, 12 May 2018, Alexei Starovoitov wrote:
> On Thu, May 10, 2018 at 10:58 AM, Alexei Starovoitov
>  wrote:
> > I see no option, but to fix the kernel.
> > Regardless whether it's called user space breakage or kernel breakage.

There is a big difference. If you are abusing a kernel internal header in a
user space tool, then there is absolutely ZERO excuse for requesting that
the header in question has to be modified.

But yes, the situation is slightly different here because tools which
create trace event magic _HAVE_ to pull in kernel headers. At the same time
these tools depend on a compiler which failed to implement asm_goto for
fricking 8 years.

So while Boris is right, that nothing has to fiddle with a kernel only
header, I grumpily agree with you that we need a workaround in the kernel
for this particular issue.

> could you please ack the patch or better yet take it into tip tree
> and send to Linus asap ?

Nope. The patch is a horrible hack.

Why the heck do we need that extra fugly define? That has exactly zero
value simply because we already have a define which denotes availablity of
ASM GOTO: CC_HAVE_ASM_GOTO.

In case of samples/bpf/ and libbcc the compile does not go through the
arch/x86 Makefile which stops the build anyway when ASM_GOTO is
missing. Those builds merely pull in the headers and have their own build
magic, which is broken btw: Changing a kernel header which gets pulled into
the build does not rebuild anything in samples/bpf. Qualitee..

So we can just use CC_HAVE_ASM_GOTO and be done with it.

But we also want the tools which need this to be aware of it. Peter
requested -D __BPF__ several times, which got ignored. It's not too much of
a request to add that.

Find a patch which does exactly this for samples/bpf, but also allows other
tools to build with a warning emitted so they get fixed.

Thanks,

tglx

8<
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -140,6 +140,20 @@ extern void clear_cpu_cap(struct cpuinfo
 
 #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
 
+#ifndef CC_HAVE_ASM_GOTO
+
+/*
+ * Workaround for the sake of BPF compilation which utilizes kernel
+ * headers, but clang does not support ASM GOTO and fails the build.
+ */
+#ifndef __BPF__
+#warning "Compiler lacks ASM_GOTO support. Add -D __BPF__ to your compiler 
arguments"
+#endif
+
+#define static_cpu_has(bit)boot_cpu_has(bit)
+
+#else
+
 /*
  * Static testing of CPU features.  Used the same as boot_cpu_has().
  * These will statically patch the target code for additional
@@ -195,6 +209,7 @@ static __always_inline __pure bool _stat
boot_cpu_has(bit) : \
_static_cpu_has(bit)\
 )
+#endif
 
 #define cpu_has_bug(c, bit)cpu_has(c, (bit))
 #define set_cpu_bug(c, bit)set_cpu_cap(c, (bit))
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -255,7 +255,7 @@ verify_target_bpf: verify_cmds
 $(obj)/%.o: $(src)/%.c
$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
-I$(srctree)/tools/testing/selftests/bpf/ \
-   -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
+   -D__KERNEL__ -D__BPF__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \


Re: KASAN: use-after-free Read in sit_tunnel_xmit

2018-05-12 Thread Eric Biggers
On Thu, Feb 15, 2018 at 04:22:28PM -0800, Cong Wang wrote:
> On Tue, Feb 13, 2018 at 10:48 AM, Dmitry Vyukov  wrote:
> > On Mon, Oct 30, 2017 at 7:41 PM, Cong Wang  wrote:
> >> On Mon, Oct 30, 2017 at 8:34 AM, syzbot
> >> 
> >> wrote:
> >>> Hello,
> >>>
> >>> syzkaller hit the following crash on
> >>> 4dc12ffeaeac939097a3f55c881d3dc3523dff0c
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> >>> compiler: gcc (GCC) 7.1.1 20170620
> >>> .config is attached
> >>> Raw console output is attached.
> >>>
> >>> skbuff: bad partial csum: csum=53081/14726 len=2273
> >>> ==
> >>> BUG: KASAN: use-after-free in ipv6_get_dsfield include/net/dsfield.h:23
> >>> [inline]
> >>> BUG: KASAN: use-after-free in ipip6_tunnel_xmit net/ipv6/sit.c:968 
> >>> [inline]
> >>> BUG: KASAN: use-after-free in sit_tunnel_xmit+0x2a41/0x3130
> >>> net/ipv6/sit.c:1016
> >>> Read of size 2 at addr 8801c64afd00 by task syz-executor3/16942
> >>>
> >>> CPU: 0 PID: 16942 Comm: syz-executor3 Not tainted 4.14.0-rc5+ #97
> >>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >>> Google 01/01/2011
> >>> Call Trace:
> >>>  __dump_stack lib/dump_stack.c:16 [inline]
> >>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
> >>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >>>  kasan_report_error mm/kasan/report.c:351 [inline]
> >>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >>>  __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
> >>>  ipv6_get_dsfield include/net/dsfield.h:23 [inline]
> >>>  ipip6_tunnel_xmit net/ipv6/sit.c:968 [inline]
> >>>  sit_tunnel_xmit+0x2a41/0x3130 net/ipv6/sit.c:1016
> >>>  __netdev_start_xmit include/linux/netdevice.h:4022 [inline]
> >>>  netdev_start_xmit include/linux/netdevice.h:4031 [inline]
> >>>  xmit_one net/core/dev.c:3008 [inline]
> >>>  dev_hard_start_xmit+0x248/0xac0 net/core/dev.c:3024
> >>>  __dev_queue_xmit+0x17d2/0x2070 net/core/dev.c:3505
> >>>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
> >>>  neigh_direct_output+0x15/0x20 net/core/neighbour.c:1390
> >>>  neigh_output include/net/neighbour.h:481 [inline]
> >>>  ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
> >>>  ip6_fragment+0x25ae/0x3420 net/ipv6/ip6_output.c:723
> >>>  ip6_finish_output+0x319/0x920 net/ipv6/ip6_output.c:144
> >>>  NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> >>>  ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
> >>>  dst_output include/net/dst.h:459 [inline]
> >>>  ip6_local_out+0x95/0x160 net/ipv6/output_core.c:176
> >>>  ip6_send_skb+0xa1/0x330 net/ipv6/ip6_output.c:1658
> >>>  ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1678
> >>>  rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
> >>>  rawv6_sendmsg+0x2eb9/0x3e40 net/ipv6/raw.c:935
> >>>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
> >>>  sock_sendmsg_nosec net/socket.c:633 [inline]
> >>>  sock_sendmsg+0xca/0x110 net/socket.c:643
> >>>  SYSC_sendto+0x352/0x5a0 net/socket.c:1750
> >>>  SyS_sendto+0x40/0x50 net/socket.c:1718
> >>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> >>> RIP: 0033:0x452869
> >>> RSP: 002b:7fe3c12e5be8 EFLAGS: 0212 ORIG_RAX: 002c
> >>> RAX: ffda RBX: 007580d8 RCX: 00452869
> >>> RDX: 07f1 RSI: 2013b7ff RDI: 0014
> >>> RBP: 0161 R08: 204e8fe4 R09: 001c
> >>> R10: 0100 R11: 0212 R12: 006f01b8
> >>> R13:  R14: 7fe3c12e66d4 R15: 0017
> >>>
> >>> Allocated by task 16924:
> >>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >>>  set_track mm/kasan/kasan.c:459 [inline]
> >>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >>>  __do_kmalloc_node mm/slab.c:3689 [inline]
> >>>  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3703
> >>>  __kmalloc_reserve.isra.40+0x41/0xd0 net/core/skbuff.c:138
> >>>  __alloc_skb+0x13b/0x780 net/core/skbuff.c:206
> >>>  alloc_skb include/linux/skbuff.h:985 [inline]
> >>>  sock_wmalloc+0x140/0x1d0 net/core/sock.c:1932
> >>>  __ip6_append_data.isra.43+0x2681/0x3340 net/ipv6/ip6_output.c:1397
> >>>  ip6_append_data+0x189/0x290 net/ipv6/ip6_output.c:1552
> >>>  rawv6_sendmsg+0x1dd9/0x3e40 net/ipv6/raw.c:928
> >>>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
> >>>  sock_sendmsg_nosec net/socket.c:633 [inline]
> >>>  sock_sendmsg+0xca/0x110 net/socket.c:643
> >>>  SYSC_sendto+0x352/0x5a0 net/socket.c:1750
> >>>  SyS_sendto+0x40/0x50 net/socket.c:1718
> >>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> >>>
> >>> Freed by task 16942:
> >>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >>>  set_track mm/kasan/kasan.c:459 [inline]
> >>>  

Re: KASAN: use-after-free Read in sctp_packet_transmit

2018-05-12 Thread Eric Biggers
On Fri, Jan 05, 2018 at 02:07:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 8a4816cad00bf14642f0ed6043b32d29a05006ce
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5adcca18fca253b4c...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> ==
> BUG: KASAN: use-after-free in sctp_packet_transmit+0x3505/0x3750
> net/sctp/output.c:643
> Read of size 8 at addr 8801bda9fb80 by task modprobe/23740
> 
> CPU: 1 PID: 23740 Comm: modprobe Not tainted 4.15.0-rc5+ #175
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>  sctp_packet_transmit+0x3505/0x3750 net/sctp/output.c:643
>  sctp_outq_flush+0x121b/0x4060 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1807 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1210 [inline]
>  sctp_do_sm+0x4e0/0x6ed0 net/sctp/sm_sideeffect.c:1181
>  sctp_generate_heartbeat_event+0x292/0x3f0 net/sctp/sm_sideeffect.c:406
>  call_timer_fn+0x228/0x820 kernel/time/timer.c:1320
>  expire_timers kernel/time/timer.c:1357 [inline]
>  __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660
>  run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
>  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904
>  
> RIP: 0010:__preempt_count_add arch/x86/include/asm/preempt.h:76 [inline]
> RIP: 0010:__rcu_read_lock include/linux/rcupdate.h:83 [inline]
> RIP: 0010:rcu_read_lock include/linux/rcupdate.h:629 [inline]
> RIP: 0010:__is_insn_slot_addr+0x8f/0x330 kernel/kprobes.c:303
> RSP: 0018:8801d4937430 EFLAGS: 0283 ORIG_RAX: ff11
> RAX: 8801bf13c000 RBX: 8656dd00 RCX: 8170bd88
> RDX:  RSI:  RDI: 8656dd00
> RBP: 8801d4937518 R08:  R09: 11003a926e67
> R10: 8801d4937300 R11:  R12: 
> R13:  R14: 8801d49374f0 R15: 8801dae230c0
>  is_kprobe_insn_slot include/linux/kprobes.h:318 [inline]
>  kernel_text_address+0x132/0x140 kernel/extable.c:150
>  __kernel_text_address+0xd/0x40 kernel/extable.c:107
>  unwind_get_return_address+0x61/0xa0 arch/x86/kernel/unwind_frame.c:18
>  __save_stack_trace+0x7e/0xd0 arch/x86/kernel/stacktrace.c:45
>  save_stack_trace+0x1a/0x20 arch/x86/kernel/stacktrace.c:60
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3544
>  kmem_cache_zalloc include/linux/slab.h:678 [inline]
>  file_alloc_security security/selinux/hooks.c:369 [inline]
>  selinux_file_alloc_security+0xae/0x190 security/selinux/hooks.c:3454
>  security_file_alloc+0x6d/0xa0 security/security.c:873
>  get_empty_filp+0x189/0x4f0 fs/file_table.c:129
>  path_openat+0xed/0x3530 fs/namei.c:3496
>  do_filp_open+0x25b/0x3b0 fs/namei.c:3554
>  do_sys_open+0x502/0x6d0 fs/open.c:1059
>  SYSC_open fs/open.c:1077 [inline]
>  SyS_open+0x2d/0x40 fs/open.c:1072
>  entry_SYSCALL_64_fastpath+0x23/0x9a
> RIP: 0033:0x7efdff1bb120
> RSP: 002b:7ffde6213c08 EFLAGS: 0246 ORIG_RAX: 0002
> RAX: ffda RBX: 55c34fab4090 RCX: 7efdff1bb120
> RDX: 01b6 RSI: 0008 RDI: 7ffde6213d20
> RBP: 7ffde6214d90 R08: 0008 R09: 0001
> R10:  R11: 0246 R12: 55c34fab4090
> R13: 7ffde6215de0 R14:  R15: 
> 
> Allocated by task 23739:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3544
>  kmem_cache_zalloc include/linux/slab.h:678 [inline]
>  sctp_chunkify+0xce/0x3f0 

Re: [PATCH bpf-next 3/4] samples: bpf: fix build after move to compiling full libbpf.a

2018-05-12 Thread Jakub Kicinski
On Fri, 11 May 2018 17:17:28 -0700, Jakub Kicinski wrote:
> There are many ways users may compile samples, some of them got
> broken by commit 5f9380572b4b ("samples: bpf: compile and link
> against full libbpf").  Improve path resolution and make libbpf
> building a dependency of source files to force its build.
> 
> Samples should now again build with any of:
>  cd samples/bpf; make
>  make samples/bpf
>  make -C samples/bpf
>  cd samples/bpf; make O=builddir
>  make samples/bpf O=builddir
>  make -C samples/bpf O=builddir
> 
> Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf")
> Reported-by: Björn Töpel 
> Signed-off-by: Jakub Kicinski 

Unfortunately Björn reports this still doesn't fix the build for him.
Investigating further.


[PATCH] net/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'

2018-05-12 Thread Christophe JAILLET
'out' is allocated with 'kvzalloc()'. 'kvfree()' must be used to free it.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c 
b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 177e076b8d17..49968a4db758 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -511,7 +511,7 @@ int mlx5_query_nic_vport_system_image_guid(struct 
mlx5_core_dev *mdev,
*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.system_image_guid);
 
-   kfree(out);
+   kvfree(out);
 
return 0;
 }
-- 
2.17.0



Re: iproute2 - modifying routes in place

2018-05-12 Thread David Ahern
On 5/11/18 4:42 AM, Ryan Whelan wrote:
> `ip route` has 2 subcommands that don't seem to work as expected and I'm
> not sure if it's a bug, or if I'm misunderstanding the semantics.

Can you try with ipv6/route-bugs branch in
https://github.com/dsahern/linux



Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto

2018-05-12 Thread Alexei Starovoitov
On Thu, May 10, 2018 at 10:58 AM, Alexei Starovoitov
 wrote:
> I see no option, but to fix the kernel.
> Regardless whether it's called user space breakage or kernel breakage.

Peter,

could you please ack the patch or better yet take it into tip tree
and send to Linus asap ?
rc5 is almost here and we didn't have full test coverage
for more than a month due to this issue.

Thanks


Re: [PATCH bpf-next v5 0/6] ipv6: sr: introduce seg6local End.BPF action

2018-05-12 Thread Mathieu Xhonneux
Sorry for the v4 still throwing warnings from the kbuild bot; this
version should be OK.

2018-05-12 18:25 GMT+01:00 Mathieu Xhonneux :
> As of Linux 4.14, it is possible to define advanced local processing for
> IPv6 packets with a Segment Routing Header through the seg6local LWT
> infrastructure. This LWT implements the network programming principles
> defined in the IETF “SRv6 Network Programming” draft.
>
> The implemented operations are generic, and it would be very interesting to
> be able to implement user-specific seg6local actions, without having to
> modify the kernel directly. To do so, this patchset adds an End.BPF action
> to seg6local, powered by some specific Segment Routing-related helpers,
> which provide SR functionalities that can be applied on the packet. This
> BPF hook would then allow to implement specific actions at native kernel
> speed such as OAM features, advanced SR SDN policies, SRv6 actions like
> Segment Routing Header (SRH) encapsulation depending on the content of
> the packet, etc.
>
> This patchset is divided in 6 patches, whose main features are :
>
> - A new seg6local action End.BPF with the corresponding new BPF program
>   type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
>   passed to the LWT seg6local through netlink, the same way as the LWT
>   BPF hook operates.
> - 3 new BPF helpers for the seg6local BPF hook, allowing to edit/grow/
>   shrink a SRH and apply on a packet some of the generic SRv6 actions.
> - 1 new BPF helper for the LWT BPF IN hook, allowing to add a SRH through
>   encapsulation (via IPv6 encapsulation or inlining if the packet contains
>   already an IPv6 header).
>
> As this patchset adds a new LWT BPF hook, I took into account the result of
> the discussions when the LWT BPF infrastructure got merged. Hence, the
> seg6local BPF hook doesn’t allow write access to skb->data directly, only
> the SRH can be modified through specific helpers, which ensures that the
> integrity of the packet is maintained.
> More details are available in the related patches messages.
>
> The performances of this BPF hook have been assessed with the BPF JIT
> enabled on a Intel Xeon X3440 processors with 4 cores and 8 threads
> clocked at 2.53 GHz. No throughput losses are noted with the seg6local
> BPF hook when the BPF program does nothing (440kpps). Adding a 8-bytes
> TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
> drops the throughput to 410kpps, and inlining a SRH via
> bpf_lwt_seg6_action drops the throughput to 420kpps.
> All throughputs are stable.
>
> ---
> v2: move the SRH integrity state from skb->cb to a per-cpu buffer
> v3: - document helpers in man-page style
> - fix kbuild bugs
> - un-break BPF LWT out hook
> - bpf_push_seg6_encap is now static
> - preempt_enable is now called when the packet is dropped in
>   input_action_end_bpf
> v4: fix kbuild bugs when CONFIG_IPV6=m
> v5: fix kbuild sparse warnings when CONFIG_IPV6=m
>
> Thanks.
>
>
> Mathieu Xhonneux (6):
>   ipv6: sr: make seg6.h includable without IPv6
>   ipv6: sr: export function lookup_nexthop
>   bpf: Add IPv6 Segment Routing helpers
>   bpf: Split lwt inout verifier structures
>   ipv6: sr: Add seg6local action End.BPF
>   selftests/bpf: test for seg6local End.BPF action
>
>  include/linux/bpf_types.h |   5 +-
>  include/net/seg6.h|   7 +-
>  include/net/seg6_local.h  |  32 ++
>  include/uapi/linux/bpf.h  |  98 -
>  include/uapi/linux/seg6_local.h   |   3 +
>  kernel/bpf/verifier.c |   1 +
>  net/core/filter.c | 390 ---
>  net/ipv6/Kconfig  |   5 +
>  net/ipv6/seg6_local.c | 180 -
>  tools/include/uapi/linux/bpf.h|  98 -
>  tools/lib/bpf/libbpf.c|   1 +
>  tools/testing/selftests/bpf/Makefile  |   5 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |  12 +
>  tools/testing/selftests/bpf/test_lwt_seg6local.c  | 438 
> ++
>  tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
>  15 files changed, 1340 insertions(+), 75 deletions(-)
>  create mode 100644 include/net/seg6_local.h
>  create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
>  create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh
>
> --
> 2.16.1
>


[PATCH bpf-next v5 4/6] bpf: Split lwt inout verifier structures

2018-05-12 Thread Mathieu Xhonneux
The new bpf_lwt_push_encap helper should only be accessible within the
LWT BPF IN hook, and not the OUT one, as this may lead to a skb under
panic.

At the moment, both LWT BPF IN and OUT share the same list of helpers,
whose calls are authorized by the verifier. This patch separates the
verifier ops for the IN and OUT hooks, and allows the IN hook to call the
bpf_lwt_push_encap helper.

This patch is also the occasion to put all lwt_*_func_proto functions
together for clarity. At the moment, socks_op_func_proto is in the middle
of lwt_inout_func_proto and lwt_xmit_func_proto.

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/linux/bpf_types.h |  4 +--
 net/core/filter.c | 83 +--
 2 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index d7df1b323082..cc9d7e031330 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -9,8 +9,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
diff --git a/net/core/filter.c b/net/core/filter.c
index 67b4ab4ec404..71434204b037 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4715,33 +4715,6 @@ xdp_func_proto(enum bpf_func_id func_id, const struct 
bpf_prog *prog)
}
 }
 
-static const struct bpf_func_proto *
-lwt_inout_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-{
-   switch (func_id) {
-   case BPF_FUNC_skb_load_bytes:
-   return &bpf_skb_load_bytes_proto;
-   case BPF_FUNC_skb_pull_data:
-   return &bpf_skb_pull_data_proto;
-   case BPF_FUNC_csum_diff:
-   return &bpf_csum_diff_proto;
-   case BPF_FUNC_get_cgroup_classid:
-   return &bpf_get_cgroup_classid_proto;
-   case BPF_FUNC_get_route_realm:
-   return &bpf_get_route_realm_proto;
-   case BPF_FUNC_get_hash_recalc:
-   return &bpf_get_hash_recalc_proto;
-   case BPF_FUNC_perf_event_output:
-   return &bpf_skb_event_output_proto;
-   case BPF_FUNC_get_smp_processor_id:
-   return &bpf_get_smp_processor_id_proto;
-   case BPF_FUNC_skb_under_cgroup:
-   return &bpf_skb_under_cgroup_proto;
-   default:
-   return bpf_base_func_proto(func_id);
-   }
-}
-
 static const struct bpf_func_proto *
 sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -4801,6 +4774,44 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct 
bpf_prog *prog)
}
 }
 
+static const struct bpf_func_proto *
+lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_skb_load_bytes:
+   return &bpf_skb_load_bytes_proto;
+   case BPF_FUNC_skb_pull_data:
+   return &bpf_skb_pull_data_proto;
+   case BPF_FUNC_csum_diff:
+   return &bpf_csum_diff_proto;
+   case BPF_FUNC_get_cgroup_classid:
+   return &bpf_get_cgroup_classid_proto;
+   case BPF_FUNC_get_route_realm:
+   return &bpf_get_route_realm_proto;
+   case BPF_FUNC_get_hash_recalc:
+   return &bpf_get_hash_recalc_proto;
+   case BPF_FUNC_perf_event_output:
+   return &bpf_skb_event_output_proto;
+   case BPF_FUNC_get_smp_processor_id:
+   return &bpf_get_smp_processor_id_proto;
+   case BPF_FUNC_skb_under_cgroup:
+   return &bpf_skb_under_cgroup_proto;
+   default:
+   return bpf_base_func_proto(func_id);
+   }
+}
+
+static const struct bpf_func_proto *
+lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_lwt_push_encap:
+   return &bpf_lwt_push_encap_proto;
+   default:
+   return lwt_out_func_proto(func_id, prog);
+   }
+}
+
 static const struct bpf_func_proto *
 lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -4832,7 +4843,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
default:
-   return lwt_inout_func_proto(func_id, prog);
+   return lwt_out_func_proto(func_id, prog);
}
 }
 
@@ -6405,13 +6416,23 @@ const struct bpf_prog_ops cg_skb_prog_ops = {
.test_run   = bpf_prog_test_run_skb,
 };
 
-const struct bpf_verifier_ops 

[PATCH bpf-next v5 5/6] ipv6: sr: Add seg6local action End.BPF

2018-05-12 Thread Mathieu Xhonneux
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.

Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.

Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.

This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.

The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
 seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
  bpf_lwt_seg6_action helper, the BPF program should return this
  value, as the skb's destination is already set and the default
  lookup should not be performed.
- BPF_DROP : the packet will be dropped.
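
For illustration, a minimal End.BPF program using these return codes could
look like the sketch below (the section name and the program logic are
assumptions made for the example; only the return-code semantics come from
this patch):

/* Sketch of an End.BPF program.  Only the BPF_OK / BPF_REDIRECT / BPF_DROP
 * semantics are defined by this patch; everything else is illustrative.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"        /* SEC(), from the selftests */

SEC("lwt_seg6local")
int do_end_bpf(struct __sk_buff *skb)
{
        /* A real program would inspect the SRH here (e.g. with
         * bpf_skb_load_bytes()) and possibly call bpf_lwt_seg6_action()
         * or bpf_lwt_seg6_store_bytes(), returning BPF_REDIRECT after a
         * successful action and BPF_DROP on error.
         */
        return BPF_OK;  /* fall back to the regular seg6_lookup_nexthop() */
}

char _license[] SEC("license") = "GPL";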

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/linux/bpf_types.h   |   1 +
 include/uapi/linux/bpf.h|   1 +
 include/uapi/linux/seg6_local.h |   3 +
 kernel/bpf/verifier.c   |   1 +
 net/core/filter.c   |  25 +++
 net/ipv6/seg6_local.c   | 158 +++-
 tools/lib/bpf/libbpf.c  |   1 +
 7 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index cc9d7e031330..6a979f95f986 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -12,6 +12,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0349c91329fd..c6a213075368 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -140,6 +140,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+   BPF_PROG_TYPE_LWT_SEG6LOCAL,
 };
 
 enum bpf_attach_type {
diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index ef2d8c3e76c1..aadcc11fb918 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -25,6 +25,7 @@ enum {
SEG6_LOCAL_NH6,
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
+   SEG6_LOCAL_BPF,
__SEG6_LOCAL_MAX,
 };
 #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
@@ -59,6 +60,8 @@ enum {
SEG6_LOCAL_ACTION_END_AS= 13,
/* forward to SR-unaware VNF with masquerading */
SEG6_LOCAL_ACTION_END_AM= 14,
+   /* custom BPF action */
+   SEG6_LOCAL_ACTION_END_BPF   = 15,
 
__SEG6_LOCAL_ACTION_MAX,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d92d9c37affd..c6b5eadcad16 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1262,6 +1262,7 @@ static bool may_access_direct_pkt_data(struct 
bpf_verifier_env *env,
switch (env->prog->type) {
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_OUT:
+   case BPF_PROG_TYPE_LWT_SEG6LOCAL:
/* dst_input() and dst_output() can't write for now */
if (t == BPF_WRITE)
return false;
diff --git a/net/core/filter.c b/net/core/filter.c
index 71434204b037..d69771e56d1f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4847,6 +4847,21 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
}
 }
 
+static const struct bpf_func_proto *
+lwt_seg6local_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_lwt_seg6_store_bytes:
+   return &bpf_lwt_seg6_store_bytes_proto;
+   case BPF_FUNC_lwt_seg6_action:
+   return &bpf_lwt_seg6_action_proto;
+   case BPF_FUNC_lwt_seg6_adjust_srh:
+   

[PATCH bpf-next v5 2/6] ipv6: sr: export function lookup_nexthop

2018-05-12 Thread Mathieu Xhonneux
The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper allowing to apply some of these
actions on the packet being processed, the helper should be able to call
this function, hence the need to make it public.

Moreover, if one argument is incorrect or if the next hop can not be found,
an error should be returned by the BPF helper so the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/net/seg6.h   |  3 ++-
 include/net/seg6_local.h | 24 
 net/ipv6/seg6_local.c| 20 +++-
 3 files changed, 37 insertions(+), 10 deletions(-)
 create mode 100644 include/net/seg6_local.h

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 70b4cfac52d7..e029e301faa5 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -67,5 +67,6 @@ extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int 
len);
 extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
 int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
-
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+  u32 tbl_id);
 #endif
diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
new file mode 100644
index ..57498b23085d
--- /dev/null
+++ b/include/net/seg6_local.h
@@ -0,0 +1,24 @@
+/*
+ *  SR-IPv6 implementation
+ *
+ *  Authors:
+ *  David Lebrun 
+ *  eBPF support: Mathieu Xhonneux 
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_SEG6_LOCAL_H
+#define _NET_SEG6_LOCAL_H
+
+#include 
+#include 
+
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+  u32 tbl_id);
+
+#endif
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 45722327375a..e9b23fb924ad 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -30,6 +30,7 @@
 #ifdef CONFIG_IPV6_SEG6_HMAC
 #include 
 #endif
+#include 
 #include 
 
 struct seg6_local_lwt;
@@ -140,8 +141,8 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, struct 
in6_addr *daddr)
*daddr = *addr;
 }
 
-static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
-  u32 tbl_id)
+int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+   u32 tbl_id)
 {
struct net *net = dev_net(skb->dev);
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -187,6 +188,7 @@ static void lookup_nexthop(struct sk_buff *skb, struct 
in6_addr *nhaddr,
 
skb_dst_drop(skb);
skb_dst_set(skb, dst);
+   return dst->error;
 }
 
 /* regular endpoint function */
@@ -200,7 +202,7 @@ static int input_action_end(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, 0);
+   seg6_lookup_nexthop(skb, NULL, 0);
 
return dst_input(skb);
 
@@ -220,7 +222,7 @@ static int input_action_end_x(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, &slwt->nh6, 0);
+   seg6_lookup_nexthop(skb, &slwt->nh6, 0);
 
return dst_input(skb);
 
@@ -239,7 +241,7 @@ static int input_action_end_t(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, slwt->table);
+   seg6_lookup_nexthop(skb, NULL, slwt->table);
 
return dst_input(skb);
 
@@ -331,7 +333,7 @@ static int input_action_end_dx6(struct sk_buff *skb,
if (!ipv6_addr_any(&slwt->nh6))
nhaddr = &slwt->nh6;
 
-   lookup_nexthop(skb, nhaddr, 0);
+   seg6_lookup_nexthop(skb, nhaddr, 0);
 
return dst_input(skb);
 drop:
@@ -380,7 +382,7 @@ static int input_action_end_dt6(struct sk_buff *skb,
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
goto drop;
 
-   lookup_nexthop(skb, NULL, slwt->table);
+   seg6_lookup_nexthop(skb, NULL, slwt->table);
 
return dst_input(skb);
 
@@ -406,7 +408,7 @@ static int input_action_end_b6(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-   lookup_nexthop(skb, NULL, 0);
+   seg6_lookup_nexthop(skb, NULL, 0);
 
return dst_input(skb);

[PATCH bpf-next v5 6/6] selftests/bpf: test for seg6local End.BPF action

2018-05-12 Thread Mathieu Xhonneux
Add a new test for the seg6local End.BPF action. The following helpers
are also tested :

- bpf_lwt_push_encap within the LWT BPF IN hook
- bpf_lwt_seg6_action
- bpf_lwt_seg6_adjust_srh
- bpf_lwt_seg6_store_bytes

A chain of End.BPF actions is built. The SRH is injected through a LWT
BPF IN hook before the chain. Each End.BPF action validates the previous
one, otherwise the packet is dropped.
The test succeeds if the last node in the chain receives the packet and
the UDP datagram contained can be retrieved from userspace.

Signed-off-by: Mathieu Xhonneux 
---
 tools/include/uapi/linux/bpf.h|  98 -
 tools/testing/selftests/bpf/Makefile  |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  12 +
 tools/testing/selftests/bpf/test_lwt_seg6local.c  | 438 ++
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
 5 files changed, 689 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 02e4112510f8..c6a213075368 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -140,6 +140,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+   BPF_PROG_TYPE_LWT_SEG6LOCAL,
 };
 
 enum bpf_attach_type {
@@ -1828,7 +1829,6 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- *
  * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 
flags)
  * Description
  * Do FIB lookup in kernel tables using parameters in *params*.
@@ -1855,6 +1855,90 @@ union bpf_attr {
  * Egress device index on success, 0 if packet needs to continue
  * up the stack for further processing or a negative error in case
  * of failure.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ * the IPv6 header is computed by the kernel.
+ * **BPF_LWT_ENCAP_SEG6_INLINE**
+ * Only works if *skb* contains an IPv6 packet. Insert a
+ * Segment Routing Header (**struct ipv6_sr_hdr**) inside
+ * the IPv6 header.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void 
*from, u32 len)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. Only the flags, tag and TLVs
+ * inside the outermost IPv6 Segment Routing Header can be
+ * modified through this helper.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
+ * Description
+ * Adjust the size allocated to TLVs in the outermost IPv6
+ * Segment Routing Header contained in the packet associated to
+ * *skb*, at position *offset* by *delta* bytes. Only offsets
+ * after the segments are accepted. *delta* can be as well
+ * positive (growing) as negative (shrinking).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * 

[PATCH bpf-next v5 3/6] bpf: Add IPv6 Segment Routing helpers

2018-05-12 Thread Mathieu Xhonneux
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
   (to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
   (specifically End.X, End.T, End.B6 and
End.B6.Encap)

The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).

The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.

Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.

Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.

Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.

These helpers require CONFIG_IPV6=y (and not =m).
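
As an illustration of the intended usage, a BPF program could add a TLV
roughly as sketched below (the TLV layout, the offset handling and the
helper declarations pulled from the selftests' bpf_helpers.h are
assumptions of the example; the helper signatures are the ones documented
in this patch):

/* Sketch: grow the SRH by sizeof(tlv) bytes at tlv_offset and write a
 * TLV there.  tlv_offset must be computed from the actual packet; the
 * TLV type below is a placeholder.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"

struct sr6_tlv {
        unsigned char type;
        unsigned char len;
        unsigned char value[6];
};

static int add_tlv(struct __sk_buff *skb, __u32 tlv_offset)
{
        struct sr6_tlv tlv = { .type = 0x01 /* placeholder */, .len = 6 };
        int err;

        err = bpf_lwt_seg6_adjust_srh(skb, tlv_offset, sizeof(tlv));
        if (err)
                return err;

        /* adjust_srh may change the underlying packet buffer, so any
         * packet pointers derived before the call must not be reused
         */
        return bpf_lwt_seg6_store_bytes(skb, tlv_offset, &tlv, sizeof(tlv));
}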

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/net/seg6_local.h |   8 ++
 include/uapi/linux/bpf.h |  97 +++-
 net/core/filter.c| 282 +++
 net/ipv6/Kconfig |   5 +
 net/ipv6/seg6_local.c|   2 +
 5 files changed, 369 insertions(+), 25 deletions(-)

diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
index 57498b23085d..661fd5b4d3e0 100644
--- a/include/net/seg6_local.h
+++ b/include/net/seg6_local.h
@@ -15,10 +15,18 @@
 #ifndef _NET_SEG6_LOCAL_H
 #define _NET_SEG6_LOCAL_H
 
+#include 
 #include 
 #include 
 
 extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
   u32 tbl_id);
 
+struct seg6_bpf_srh_state {
+   bool valid;
+   u16 hdrlen;
+};
+
+DECLARE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
+
 #endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 02e4112510f8..0349c91329fd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1828,7 +1828,6 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- *
  * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 
flags)
  * Description
  * Do FIB lookup in kernel tables using parameters in *params*.
@@ -1855,6 +1854,90 @@ union bpf_attr {
  * Egress device index on success, 0 if packet needs to continue
  * up the stack for further processing or a negative error in case
  * of failure.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ *   

[PATCH bpf-next v5 0/6] ipv6: sr: introduce seg6local End.BPF action

2018-05-12 Thread Mathieu Xhonneux
As of Linux 4.14, it is possible to define advanced local processing for
IPv6 packets with a Segment Routing Header through the seg6local LWT
infrastructure. This LWT implements the network programming principles
defined in the IETF “SRv6 Network Programming” draft.

The implemented operations are generic, and it would be very interesting to
be able to implement user-specific seg6local actions, without having to
modify the kernel directly. To do so, this patchset adds an End.BPF action
to seg6local, powered by some specific Segment Routing-related helpers,
which provide SR functionalities that can be applied on the packet. This
BPF hook would then allow to implement specific actions at native kernel
speed such as OAM features, advanced SR SDN policies, SRv6 actions like
Segment Routing Header (SRH) encapsulation depending on the content of
the packet, etc.

This patchset is divided in 6 patches, whose main features are :

- A new seg6local action End.BPF with the corresponding new BPF program
  type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
  passed to the LWT seg6local through netlink, the same way as the LWT
  BPF hook operates.
- 3 new BPF helpers for the seg6local BPF hook, allowing to edit/grow/
  shrink a SRH and apply on a packet some of the generic SRv6 actions.
- 1 new BPF helper for the LWT BPF IN hook, allowing to add a SRH through
  encapsulation (via IPv6 encapsulation or inlining if the packet contains
  already an IPv6 header).

As this patchset adds a new LWT BPF hook, I took into account the result of
the discussions when the LWT BPF infrastructure got merged. Hence, the
seg6local BPF hook doesn’t allow write access to skb->data directly, only
the SRH can be modified through specific helpers, which ensures that the
integrity of the packet is maintained.
More details are available in the related patches messages.

The performances of this BPF hook have been assessed with the BPF JIT
enabled on a Intel Xeon X3440 processors with 4 cores and 8 threads
clocked at 2.53 GHz. No throughput losses are noted with the seg6local
BPF hook when the BPF program does nothing (440kpps). Adding a 8-bytes
TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
drops the throughput to 410kpps, and inlining a SRH via
bpf_lwt_seg6_action drops the throughput to 420kpps.
All throughputs are stable.

---
v2: move the SRH integrity state from skb->cb to a per-cpu buffer
v3: - document helpers in man-page style
- fix kbuild bugs
- un-break BPF LWT out hook
- bpf_push_seg6_encap is now static
- preempt_enable is now called when the packet is dropped in
  input_action_end_bpf
v4: fix kbuild bugs when CONFIG_IPV6=m
v5: fix kbuild sparse warnings when CONFIG_IPV6=m

Thanks.


Mathieu Xhonneux (6):
  ipv6: sr: make seg6.h includable without IPv6
  ipv6: sr: export function lookup_nexthop
  bpf: Add IPv6 Segment Routing helpers
  bpf: Split lwt inout verifier structures
  ipv6: sr: Add seg6local action End.BPF
  selftests/bpf: test for seg6local End.BPF action

 include/linux/bpf_types.h |   5 +-
 include/net/seg6.h|   7 +-
 include/net/seg6_local.h  |  32 ++
 include/uapi/linux/bpf.h  |  98 -
 include/uapi/linux/seg6_local.h   |   3 +
 kernel/bpf/verifier.c |   1 +
 net/core/filter.c | 390 ---
 net/ipv6/Kconfig  |   5 +
 net/ipv6/seg6_local.c | 180 -
 tools/include/uapi/linux/bpf.h|  98 -
 tools/lib/bpf/libbpf.c|   1 +
 tools/testing/selftests/bpf/Makefile  |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  12 +
 tools/testing/selftests/bpf/test_lwt_seg6local.c  | 438 ++
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
 15 files changed, 1340 insertions(+), 75 deletions(-)
 create mode 100644 include/net/seg6_local.h
 create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

-- 
2.16.1



[PATCH bpf-next v5 1/6] ipv6: sr: make seg6.h includable without IPv6

2018-05-12 Thread Mathieu Xhonneux
include/net/seg6.h cannot be included in a source file if CONFIG_IPV6 is
not enabled:
   include/net/seg6.h: In function 'seg6_pernet':
>> include/net/seg6.h:52:14: error: 'struct net' has no member named
'ipv6'; did you mean 'ipv4'?
 return net->ipv6.seg6_data;
 ^~~~
 ipv4

This commit makes seg6_pernet return NULL if IPv6 is not compiled, hence
allowing seg6.h to be included regardless of the configuration.

Signed-off-by: Mathieu Xhonneux 
---
 include/net/seg6.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 099bad59dc90..70b4cfac52d7 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -49,7 +49,11 @@ struct seg6_pernet_data {
 
 static inline struct seg6_pernet_data *seg6_pernet(struct net *net)
 {
+#if IS_ENABLED(CONFIG_IPV6)
return net->ipv6.seg6_data;
+#else
+   return NULL;
+#endif
 }
 
 extern int seg6_init(void);
-- 
2.16.1



Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-12 Thread Jamal Hadi Salim

Sorry for the latency..

On 09/05/18 01:37 PM, Michel Machado wrote:

On 05/09/2018 10:43 AM, Jamal Hadi Salim wrote:

On 08/05/18 10:27 PM, Cong Wang wrote:
On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  
wrote:




I like the suggestion of extending skbmod to mark skbprio based on the DS
field. Given that DSprio would no longer depend on the DS field, would you
have a name suggestion for this new queue discipline, since the name "prio"
is currently in use?




Not sure what to call it.
My struggle is still with the intended end goal of the qdisc.
It looks like the prio qdisc except for the enqueue part, which attempts
to use a shared global queue size for all prios. I would have
pointed to other approaches which use a global priority queue pool
and do early congestion detection, like RED or variants like GRED, but
those use average values of the queue lengths, not instantaneous values
as you do.

I am tempted to say - based on my current understanding - that you don't
need a new qdisc; rather you need to map your DS fields to skbprio
(via skbmod) and stick with the prio qdisc. I also think the skbmod
mapping is useful regardless of this need.

What should be the range of priorities that this new queue discipline
would accept? skb->priority is of type __u32, but supporting 2^32
priorities would require too large an array to index packets by
priority; the DS field is only 6 bits long. Do you have a use case in
mind to guide us here?




Look at the priomap or prio2band arrangement on the prio qdisc
or the pfifo_fast qdisc. You take the skbprio as an index into the array
and retrieve a queue to enqueue to. The size of the array is 16.
In the past this was based IIRC on ip precedence + 1 bit. Those map
similarly to DS fields (class selectors, assured forwarding, etc). So
there is no need to even increase the array beyond the current 16.
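
As a sketch of that arrangement (the values below mirror pfifo_fast's
default prio2band map, but treat them as illustrative rather than
authoritative):

#define TC_PRIO_MAX 15   /* as in include/uapi/linux/pkt_sched.h */

/* skb->priority is masked down to 4 bits and used as an index that picks
 * one of a small, fixed number of bands; band 0 is serviced first. */
static const unsigned char prio2band[TC_PRIO_MAX + 1] = {
        1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

static inline int band_for_skbprio(unsigned int skb_priority)
{
        return prio2band[skb_priority & TC_PRIO_MAX];
}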


I find the cleverness in swapping the highest/lowest prios confusing.
It looks error-prone (I guess that is why there is a BUG check).
To the authors: is there a document/paper on the theory of this thing
as to why having no explicit queues is "faster"?


The priority orientation in GKprio is due to two factors: failing safe
and elegance. If zero were the highest priority, any operational mistake
that lets unclassified packets through GKprio would potentially
disrupt the system. We are humans, we'll make mistakes. The elegance
aspect comes from the fact that the assigned priority is not massaged to
fit the DS field. We find it helpful while inspecting packets on the wire.


The reason for us to avoid explicit queues in GKprio, which could change
the behavior within a given priority, is to closely abide by the
expected behavior assumed to prove Theorem 4.1 in the paper "Portcullis:
Protecting Connection Setup from Denial-of-Capability Attacks":


https://dl.acm.org/citation.cfm?id=1282413



The paper seems to be behind a paywall. Googling didn't help.
My concern is still the science behind this; if you had written up
a test setup which shows how you concluded this was a better
approach to DoS prevention, and showed some numbers, it would have
greatly helped clarify things.


1) I agree that using multiple queues as in the prio qdisc would make it
more manageable; it does not necessarily need to be classful if you
use implicit skbprio classification, i.e. on enqueue use a priority
map to select a queue; on dequeue always dequeue from the highest prio
until it has no more packets to send.
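
A rough sketch of that shape in plain C (not a real qdisc; the list
handling is deliberately simplified):

#include <stddef.h>

struct pkt  { struct pkt *next; };
struct band { struct pkt *head; };

/* Dequeue from the highest-priority non-empty band (band 0 = highest);
 * lower bands are only served once all higher ones are empty. */
static struct pkt *strict_prio_dequeue(struct band *bands, int nbands)
{
        int b;

        for (b = 0; b < nbands; b++) {
                struct pkt *p = bands[b].head;

                if (p) {
                        bands[b].head = p->next;
                        return p;
                }
        }
        return NULL;
}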


In my reply to Cong, I point out that there is a technical limitation in
the interface of queue disciplines that prevents GKprio from having
explicit sub-queues:


https://www.mail-archive.com/netdev@vger.kernel.org/msg234201.html


2) Dropping already enqueued packets will not work well for
local feedback (the __NET_XMIT_BYPASS return code is about a
packet that was dropped from an earlier enqueue because
it had lower priority - it does not signify anything about the
current skb, which actually just got enqueued).
Perhaps (off the top of my head) one could always enqueue packets of
high priority when their limit is exceeded, as long as a lower prio has
some space. That means you'd have to increment the low prio accounting
if its space is used.


I don't understand the point you are making here. Could you develop it 
further?




Sorry - I meant NET_XMIT_CN.
If you drop an already enqueued packet, it makes sense to signify as
such using NET_XMIT_CN.
This does not make sense for forwarded packets, but it does
for locally sourced packets.

cheers,
jamal


Re: [PATCH net-next 3/8] sctp: move the flush of ctrl chunks into its own function

2018-05-12 Thread Marcelo Ricardo Leitner
On Fri, May 11, 2018 at 08:28:45PM -0300, Marcelo Ricardo Leitner wrote:
> Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.

The kbuild bot spotted some issues with this patch. They were corrected
later on in the series, but I should fix them here.

Will post a v2 later today.



[PATCH] ipvlan: flush arp table when mac address changed

2018-05-12 Thread liuqifa
From: Keefe Liu 

When the master device's MAC address is changed, commit 32c10bbfe914
("ipvlan: always use the current L2 addr of the master") changes the
IPVlan devices' MAC addresses as well, but it does not flush the
IPVlan devices' ARP tables.

Signed-off-by: Keefe Liu 
---
 drivers/net/ipvlan/ipvlan_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 450eec2..a1edfe1 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -7,6 +7,8 @@
  *
  */
 
+#include 
+#include 
 #include "ipvlan.h"
 
 static unsigned int ipvlan_netid __read_mostly;
@@ -792,8 +794,10 @@ static int ipvlan_device_event(struct notifier_block 
*unused,
break;
 
case NETDEV_CHANGEADDR:
-   list_for_each_entry(ipvlan, &port->ipvlans, pnode)
+   list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
ether_addr_copy(ipvlan->dev->dev_addr, dev->dev_addr);
+   neigh_changeaddr(&arp_tbl, ipvlan->dev);
+   }
break;
 
case NETDEV_PRE_TYPE_CHANGE:
-- 
1.8.3.1




[PATCH] 3c59x: convert to generic DMA API

2018-05-12 Thread Christoph Hellwig
This driver supports EISA devices in addition to PCI devices, and relied
on the legacy behavior of the pci_dma* shims to pass a NULL pointer
on to the DMA API, and on the DMA API being able to handle that.  When the
NULL forwarding broke, the EISA support got broken as well.  Fix this by
converting to the generic DMA API instead of the legacy PCI shims.

Fixes: 4167b2ad ("PCI: Remove NULL device handling from PCI DMA API")
Reported-by: tedheadster 
Tested-by: tedheadster 
Signed-off-by: Christoph Hellwig 
---
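For reference, a sketch of the conversion pattern applied throughout the
diff below; "dev" stands for the generic struct device the driver keeps in
vp->gendev, and the helper name is made up for the example:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* old: dma = pci_map_single(pdev, buf, len, PCI_DMA_TODEVICE);
 *      if (dma_mapping_error(&pdev->dev, dma)) ...
 * new: the same calls, but against the generic struct device, which works
 * for both the PCI and the EISA variants of the hardware. */
static int map_tx_buffer(struct device *dev, void *buf, size_t len,
                         dma_addr_t *dma)
{
        *dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, *dma))
                return -ENOMEM;
        return 0;
}
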
 drivers/net/ethernet/3com/3c59x.c | 104 +++---
 1 file changed, 51 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c 
b/drivers/net/ethernet/3com/3c59x.c
index 36c8950dbd2d..176861bd2252 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -1212,9 +1212,9 @@ static int vortex_probe1(struct device *gendev, void 
__iomem *ioaddr, int irq,
vp->mii.reg_num_mask = 0x1f;
 
/* Makes sure rings are at least 16 byte aligned. */
-   vp->rx_ring = pci_alloc_consistent(pdev, sizeof(struct boom_rx_desc) * 
RX_RING_SIZE
+   vp->rx_ring = dma_alloc_coherent(gendev, sizeof(struct boom_rx_desc) * 
RX_RING_SIZE
   + sizeof(struct boom_tx_desc) * 
TX_RING_SIZE,
-  &vp->rx_ring_dma);
+  &vp->rx_ring_dma, GFP_KERNEL);
retval = -ENOMEM;
if (!vp->rx_ring)
goto free_device;
@@ -1476,11 +1476,10 @@ static int vortex_probe1(struct device *gendev, void 
__iomem *ioaddr, int irq,
return 0;
 
 free_ring:
-   pci_free_consistent(pdev,
-   sizeof(struct boom_rx_desc) * 
RX_RING_SIZE
-   + sizeof(struct 
boom_tx_desc) * TX_RING_SIZE,
-   vp->rx_ring,
-   vp->rx_ring_dma);
+   dma_free_coherent(&pdev->dev,
+   sizeof(struct boom_rx_desc) * RX_RING_SIZE +
+   sizeof(struct boom_tx_desc) * TX_RING_SIZE,
+   vp->rx_ring, vp->rx_ring_dma);
 free_device:
free_netdev(dev);
pr_err(PFX "vortex_probe1 fails.  Returns %d\n", retval);
@@ -1751,9 +1750,9 @@ vortex_open(struct net_device *dev)
break;  /* Bad news!  */
 
skb_reserve(skb, NET_IP_ALIGN); /* Align IP on 16 byte 
boundaries */
-   dma = pci_map_single(VORTEX_PCI(vp), skb->data,
-PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
-   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma))
+   dma = dma_map_single(vp->gendev, skb->data,
+PKT_BUF_SZ, DMA_FROM_DEVICE);
+   if (dma_mapping_error(vp->gendev, dma))
break;
vp->rx_ring[i].addr = cpu_to_le32(dma);
}
@@ -2067,9 +2066,9 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device 
*dev)
if (vp->bus_master) {
/* Set the bus-master controller to transfer the packet. */
int len = (skb->len + 3) & ~3;
-   vp->tx_skb_dma = pci_map_single(VORTEX_PCI(vp), skb->data, len,
-   PCI_DMA_TODEVICE);
-   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, vp->tx_skb_dma)) {
+   vp->tx_skb_dma = dma_map_single(vp->gendev, skb->data, len,
+   DMA_TO_DEVICE);
+   if (dma_mapping_error(vp->gendev, vp->tx_skb_dma)) {
dev_kfree_skb_any(skb);
dev->stats.tx_dropped++;
return NETDEV_TX_OK;
@@ -2168,9 +2167,9 @@ boomerang_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
vp->tx_ring[entry].status = cpu_to_le32(skb->len | 
TxIntrUploaded | AddTCPChksum | AddUDPChksum);
 
if (!skb_shinfo(skb)->nr_frags) {
-   dma_addr = pci_map_single(VORTEX_PCI(vp), skb->data, skb->len,
- PCI_DMA_TODEVICE);
-   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma_addr))
+   dma_addr = dma_map_single(vp->gendev, skb->data, skb->len,
+ DMA_TO_DEVICE);
+   if (dma_mapping_error(vp->gendev, dma_addr))
goto out_dma_err;
 
vp->tx_ring[entry].frag[0].addr = cpu_to_le32(dma_addr);
@@ -2178,9 +2177,9 @@ boomerang_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
} else {
int i;
 
-   dma_addr = pci_map_single(VORTEX_PCI(vp), skb->data,
- skb_headlen(skb), 

Re: [PATCH 14/32] net/tcp: convert to ->poll_mask

2018-05-12 Thread Christoph Hellwig
On Fri, May 11, 2018 at 06:13:11AM -0700, Eric Dumazet wrote:
> > +struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t 
> > events)
> > +{
> > +   sock_poll_busy_loop(sock, events);
> > +   sock_rps_record_flow(sock->sk);
> 
> Why are you adding sock_rps_record_flow() ?

Because I mismerged the removal of it from tcp_poll in
'net: revert "Update RFS target at poll for tcp/udp"'

Thanks for the heads-up, this will be removed in the next version.


[PATCH net] xfrm6: avoid potential infinite loop in _decode_session6()

2018-05-12 Thread Eric Dumazet
syzbot found a way to trigger an infinite loop by overflowing the
@offset variable, which has been forced to u16 for some very
obscure reason in the past.
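
As a userspace illustration of the failure mode (not kernel code; the
numbers are arbitrary), an offset stored in a u16 that keeps advancing by a
header length wraps around instead of ever reaching the end of the packet:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint16_t offset = 65528;        /* close to the u16 limit */
        unsigned int hdrlen = 16;       /* pretend extension header length */
        int i;

        for (i = 0; i < 4; i++) {
                printf("offset = %u\n", offset);
                offset += hdrlen;       /* 65528 + 16 wraps to 8 */
        }
        return 0;
}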

We probably want to look at the NEXTHDR_FRAGMENT handling, which looks
wrong, in a separate patch.

In net-next, we shall try to use skb_header_pointer() instead of
pskb_may_pull().

watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last  enabled at (13885652): [] 
restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [] 
interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last  enabled at (13614028): [] tun_napi_alloc_frags 
drivers/net/tun.c:1478 [inline]
softirqs last  enabled at (13614028): [] 
tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [] 
tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:8801d8cfe250 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 8801d88a8080 RBX: 8801d7389e40 RCX: 0006
RDX:  RSI: 868da4ad RDI: 8801c8a53277
RBP: 8801d8cfe250 R08: 8801d88a8080 R09: 8801d8cfe3e8
R10: ed003b19fc87 R11: 8801d8cfe43f R12: 8801c8a5327f
R13:  R14: 8801c8a4e5fe R15: 8801d8cfe3e8
FS:  00d88940() GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff600400 CR3: 0001acab3000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 _decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
 icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
 __netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
 netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
 napi_frags_finish net/core/dev.c:5226 [inline]
 napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
 tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
 call_write_iter include/linux/fs.h:1784 [inline]
 do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
 do_iter_write+0x185/0x5f0 fs/read_write.c:959
 vfs_writev+0x1c7/0x330 fs/read_write.c:1004
 do_writev+0x112/0x2f0 fs/read_write.c:1039
 __do_sys_writev fs/read_write.c:1112 [inline]
 __se_sys_writev fs/read_write.c:1109 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Cc: Steffen Klassert 
Cc: Nicolas Dichtel 
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
---
 net/ipv6/xfrm6_policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 
416fe67271a920f5a86dd3007c03e3113f857f8a..86dba282a147ce6ad4b3e4e2f3b5c81962493130
 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -126,7 +126,7 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
struct flowi6 *fl6 = &fl->u.ip6;
int onlyproto = 0;
const struct ipv6hdr *hdr = ipv6_hdr(skb);
-   u16 offset = sizeof(*hdr);
+   u32 offset = sizeof(*hdr);
struct ipv6_opt_hdr *exthdr;
const unsigned char *nh = skb_network_header(skb);
u16 nhoff = IP6CB(skb)->nhoff;
-- 
2.17.0.441.gb46fe60e1d-goog



Re: rsi: fix spelling mistake: "thead" -> "thread"

2018-05-12 Thread Kalle Valo
Colin Ian King  wrote:

> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in rsi_dbg debug message text
> 
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

b41c39367669 rsi: fix spelling mistake: "thead" -> "thread"

-- 
https://patchwork.kernel.org/patch/10391879/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: KASAN: use-after-free Write in xt_rateest_put

2018-05-12 Thread Dmitry Vyukov
On Mon, Jan 29, 2018 at 3:58 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 24b1cccf922914f3d6eeb84036dde8338bc03abb (Sun Jan 28 20:24:36 2018 +)
> Merge branch 'x86-pti-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+551ff4604e8325884...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.

This was bisected as fixed by:

#syz fix: netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insert

https://gist.githubusercontent.com/dvyukov/9d5b710cf4f429969b93aa90ec217c29/raw/68c1fee7f7e133574a0787c9e46d97a6cf521759/gistfile1.txt

> ==
> BUG: KASAN: use-after-free in __hlist_del include/linux/list.h:651 [inline]
> BUG: KASAN: use-after-free in hlist_del include/linux/list.h:656 [inline]
> BUG: KASAN: use-after-free in xt_rateest_put+0x2e3/0x300
> net/netfilter/xt_RATEEST.c:65
> Write of size 8 at addr 8801d4d40e58 by task syzkaller770396/3682
>
> CPU: 1 PID: 3682 Comm: syzkaller770396 Not tainted 4.15.0-rc9+ #284
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_store8_noabort+0x17/0x20 mm/kasan/report.c:435
>  __hlist_del include/linux/list.h:651 [inline]
>  hlist_del include/linux/list.h:656 [inline]
>  xt_rateest_put+0x2e3/0x300 net/netfilter/xt_RATEEST.c:65
>  xt_rateest_tg_destroy+0x50/0x70 net/netfilter/xt_RATEEST.c:154
>  cleanup_entry+0x242/0x380 net/ipv6/netfilter/ip6_tables.c:678
>  __do_replace+0x7e6/0xab0 net/ipv6/netfilter/ip6_tables.c:1115
>  do_replace net/ipv6/netfilter/ip6_tables.c:1171 [inline]
>  do_ip6t_set_ctl+0x40f/0x5f0 net/ipv6/netfilter/ip6_tables.c:1693
>  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>  nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>  ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
>  udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1452
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2968
>  SYSC_setsockopt net/socket.c:1831 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1810
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x4412d9
> RSP: 002b:7fffb8cbf5f8 EFLAGS: 0203 ORIG_RAX: 0036
> RAX: ffda RBX:  RCX: 004412d9
> RDX: 0040 RSI: 0029 RDI: 0326
> RBP: f6fcce9cd855ec40 R08: 03b8 R09: 
> R10: 20019c48 R11: 0203 R12: fbfe5b6031634428
> R13: 5826b2d59f7fe9a1 R14: fd7217c033abf8b5 R15: 
>
> Allocated by task 3687:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3610
>  kmalloc include/linux/slab.h:499 [inline]
>  kzalloc include/linux/slab.h:688 [inline]
>  xt_rateest_tg_checkentry+0x25a/0xaa0 net/netfilter/xt_RATEEST.c:120
>  xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
>  check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
>  find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:580
>  translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
>  do_replace net/ipv6/netfilter/ip6_tables.c:1167 [inline]
>  do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1693
>  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>  nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>  ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
>  udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1452
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2968
>  SYSC_setsockopt net/socket.c:1831 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1810
>  entry_SYSCALL_64_fastpath+0x29/0xa0
>
> Freed by task 3682:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3488 [inline]
>  kfree+0xd6/0x260 mm/slab.c:3803
>  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
>  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
>  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
>  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
>  rcu_process_callbacks+0xe94/0x17f0 kernel/rcu/tree.c:2996
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>
> The 

Re: [PATCH bpf-next 0/4] samples: bpf: fix build after move to full libbpf

2018-05-12 Thread Jesper Dangaard Brouer
On Fri, 11 May 2018 17:17:25 -0700
Jakub Kicinski  wrote:

> Following patches address build issues after recent move to libbpf.
> For out-of-tree builds we would see the following error:
> 
> gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or 
> directory
> 
> The mini-library called libbpf.h in samples is renamed to bpf_insn.h;
> using linux/filter.h seems not completely trivial, since some samples
> get upset when the order of the include search path is changed.  We do have
> to rename libbpf.h, however, because otherwise it's hard to reliably
> get to libbpf's header in out-of-tree builds.


Acked-by: Jesper Dangaard Brouer 

Thank you for doing this... this mini-library, also called libbpf.h, has
confused me before, and I bet it will/would confuse others as well.
Glad to see it being renamed :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: possible deadlock in sk_diag_fill

2018-05-12 Thread Dmitry Vyukov
On Fri, May 11, 2018 at 8:33 PM, Andrei Vagin  wrote:
> On Sat, May 05, 2018 at 10:59:02AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:c1c07416cdd4 Merge tag 'kbuild-fixes-v4.17' of git://git.k..
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12164c9780
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=5a1dc06635c10d27
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c1872be62e587eae9669
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> userspace arch: i386
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+c1872be62e587eae9...@syzkaller.appspotmail.com
>>
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.17.0-rc3+ #59 Not tainted
>> --
>> syz-executor1/25282 is trying to acquire lock:
>> 4fddf743 (&(&u->lock)->rlock/1){+.+.}, at: sk_diag_dump_icons
>> net/unix/diag.c:82 [inline]
>> 4fddf743 (&(&u->lock)->rlock/1){+.+.}, at:
>> sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
>>
>> but task is already holding lock:
>> b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock
>> include/linux/spinlock.h:310 [inline]
>> b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons
>> net/unix/diag.c:64 [inline]
>> b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_fill.isra.5+0x94e/0x10d0
>> net/unix/diag.c:144
>>
>> which lock already depends on the new lock.
>
> In the code, we have a comment which explains why it is safe to take this lock
>
> /*
>  * The state lock is outer for the same sk's
>  * queue lock. With the other's queue locked it's
>  * OK to lock the state.
>  */
> unix_state_lock_nested(req);
>
> It is a question of how to explain this to lockdep.

Do I understand it correctly that (&u->lock)->rlock associated with
AF_UNIX is locked under rlock-AF_UNIX, and then rlock-AF_UNIX is
locked under (&u->lock)->rlock associated with AF_NETLINK? If so, I
think we need to split (&u->lock)->rlock by family too, so that we
have u->lock-AF_UNIX and u->lock-AF_NETLINK.
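
A sketch of the kind of per-family split being suggested (hypothetical
names, modelled on how lockdep classes are usually assigned per family;
not a tested patch):

#include <linux/spinlock.h>
#include <net/af_unix.h>

/* Give the AF_UNIX u->lock its own lockdep class/name, so lockdep no longer
 * lumps it together with the same field used under other families. */
static struct lock_class_key af_unix_u_lock_key;

static void unix_sk_lock_init(struct unix_sock *u)
{
        spin_lock_init(&u->lock);
        lockdep_set_class_and_name(&u->lock, &af_unix_u_lock_key,
                                   "u->lock-AF_UNIX");
}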



>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (rlock-AF_UNIX){+.+.}:
>>__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>>_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>>skb_queue_tail+0x26/0x150 net/core/skbuff.c:2900
>>unix_dgram_sendmsg+0xf77/0x1730 net/unix/af_unix.c:1797
>>sock_sendmsg_nosec net/socket.c:629 [inline]
>>sock_sendmsg+0xd5/0x120 net/socket.c:639
>>___sys_sendmsg+0x525/0x940 net/socket.c:2117
>>__sys_sendmmsg+0x3bb/0x6f0 net/socket.c:2205
>>__compat_sys_sendmmsg net/compat.c:770 [inline]
>>__do_compat_sys_sendmmsg net/compat.c:777 [inline]
>>__se_compat_sys_sendmmsg net/compat.c:774 [inline]
>>__ia32_compat_sys_sendmmsg+0x9f/0x100 net/compat.c:774
>>do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
>>do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
>>entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
>>
>> -> #0 (&(&u->lock)->rlock/1){+.+.}:
>>lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
>>_raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354
>>sk_diag_dump_icons net/unix/diag.c:82 [inline]
>>sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
>>sk_diag_dump net/unix/diag.c:178 [inline]
>>unix_diag_dump+0x35f/0x550 net/unix/diag.c:206
>>netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226
>>__netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323
>>netlink_dump_start include/linux/netlink.h:214 [inline]
>>unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307
>>__sock_diag_cmd net/core/sock_diag.c:230 [inline]
>>sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261
>>netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
>>sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272
>>netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
>>netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
>>netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
>>sock_sendmsg_nosec net/socket.c:629 [inline]
>>sock_sendmsg+0xd5/0x120 net/socket.c:639
>>sock_write_iter+0x35a/0x5a0 net/socket.c:908
>>call_write_iter include/linux/fs.h:1784 [inline]
>>new_sync_write fs/read_write.c:474 [inline]
>>__vfs_write+0x64d/0x960 fs/read_write.c:487
>>vfs_write+0x1f8/0x560 fs/read_write.c:549
>>ksys_write+0xf9/0x250 fs/read_write.c:598
>>__do_sys_write fs/read_write.c:610 [inline]
>>__se_sys_write