[PATCH v3] ipv6: Do not probe neighbourless routes

2019-08-29 Thread Cheng Lin
From: Cheng Lin 

Originally, Router Reachability Probing required that a neighbour
entry exist. Commit 2152caea7196 ("ipv6: Do not depend on rt->n in
rt6_probe().") removed that requirement, and commit f547fac624be
("ipv6: rate-limit probes for neighbourless routes") added
rate-limiting for neighbourless routes.
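
For reference, the rate-limiting mentioned above compares the current
tick count against the time of the last probe plus an interval. The
following is a minimal user-space sketch of that comparison, not the
kernel code: tick_after() and probe_allowed() are hypothetical names,
and tick_after() only mirrors the idea behind the kernel's
time_after().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical helper: may a new probe be sent at tick 'now'? */
static bool probe_allowed(unsigned long now, unsigned long last_probe,
			  unsigned long interval)
{
	/* The signed-difference trick only holds for spans below half the range. */
	assert(interval <= LONG_MAX);
	return tick_after(now, last_probe + interval);
}
```

Because the comparison is done on the signed difference, it keeps
working when the counter wraps around between two probes.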

However, Neighbor Discovery for IP version 6 (IPv6) (RFC 4861) says:
"
7.2.5.  Receipt of Neighbor Advertisements

When a valid Neighbor Advertisement is received (either solicited or
unsolicited), the Neighbor Cache is searched for the target's entry.
If no entry exists, the advertisement SHOULD be silently discarded.
There is no need to create an entry if none exists, since the
recipient has apparently not initiated any communication with the
target.
".

rt6_probe() only transmits a Neighbor Solicitation message. When the
Neighbor Advertisement arrives and no neighbour entry exists, the
node does nothing with it.
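
The receive-side rule quoted from RFC 4861 can be illustrated with a
small user-space model. This is only a sketch under simplifying
assumptions: struct neigh_cache, cache_lookup(), cache_create() and
receive_na() are hypothetical names, not kernel APIs, and every
advertisement is treated as solicited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum nud_state { NUD_INCOMPLETE, NUD_REACHABLE };

struct neigh_entry {
	unsigned char addr[16];	/* IPv6 address of the neighbour */
	enum nud_state state;
	bool in_use;
};

#define CACHE_SIZE 8

struct neigh_cache {
	struct neigh_entry slots[CACHE_SIZE];
};

static struct neigh_entry *cache_lookup(struct neigh_cache *c,
					const unsigned char *addr)
{
	for (size_t i = 0; i < CACHE_SIZE; i++)
		if (c->slots[i].in_use &&
		    memcmp(c->slots[i].addr, addr, 16) == 0)
			return &c->slots[i];
	return NULL;
}

/* Create an INCOMPLETE entry, as a node does when it starts resolution. */
static struct neigh_entry *cache_create(struct neigh_cache *c,
					const unsigned char *addr)
{
	for (size_t i = 0; i < CACHE_SIZE; i++) {
		if (!c->slots[i].in_use) {
			memcpy(c->slots[i].addr, addr, 16);
			c->slots[i].state = NUD_INCOMPLETE;
			c->slots[i].in_use = true;
			return &c->slots[i];
		}
	}
	return NULL;	/* cache full */
}

/* Returns true if the advertisement updated an entry, false if discarded. */
static bool receive_na(struct neigh_cache *c, const unsigned char *target)
{
	struct neigh_entry *n;

	assert(c != NULL && target != NULL);
	n = cache_lookup(c, target);
	if (!n)
		return false;	/* RFC 4861 7.2.5: no entry, silently discard */
	n->state = NUD_REACHABLE;
	return true;
}
```

In this model a solicitation sent without first calling cache_create()
gains nothing: the reply is dropped by receive_na() and the cache stays
empty.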

It is not clear that Router Reachability Probing needs to create a
neighbour entry, so the original behaviour may be the right one.

This patch restores the requirement for a neighbour entry.

Signed-off-by: Cheng Lin 
---
 include/net/ip6_fib.h | 5 -
 net/ipv6/route.c  | 6 +-
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 4b5656c..8c2e022 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -124,11 +124,6 @@ struct rt6_exception {
 
 struct fib6_nh {
struct fib_nh_common nh_common;
-
-#ifdef CONFIG_IPV6_ROUTER_PREF
-   unsigned long   last_probe;
-#endif
-
struct rt6_info * __percpu *rt6i_pcpu;
struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
 };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fd059e0..1839dd7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -639,12 +639,12 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
nh_gw = &fib6_nh->fib_nh_gw6;
dev = fib6_nh->fib_nh_dev;
rcu_read_lock_bh();
-   idev = __in6_dev_get(dev);
neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
if (neigh) {
if (neigh->nud_state & NUD_VALID)
goto out;
 
+   idev = __in6_dev_get(dev);
write_lock(&neigh->lock);
if (!(neigh->nud_state & NUD_VALID) &&
time_after(jiffies,
@@ -654,13 +654,9 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
__neigh_set_probe_once(neigh);
}
write_unlock(&neigh->lock);
-   } else if (time_after(jiffies, fib6_nh->last_probe +
-  idev->cnf.rtr_probe_interval)) {
-   work = kmalloc(sizeof(*work), GFP_ATOMIC);
}
 
if (work) {
-   fib6_nh->last_probe = jiffies;
INIT_WORK(&work->work, rt6_probe_deferred);
work->target = *nh_gw;
dev_hold(dev);
-- 
1.8.3.1



[PATCH] proc/sysctl: fix return error for proc_doulongvec_minmax

2018-12-05 Thread Cheng Lin
If fewer values are written than the vector holds, an EINVAL error
is returned.

e.g.
We use proc_doulongvec_minmax to pass up to two parameters
with kern_table.

{
.procname   = "monitor_signals",
.data   = &monitor_sigs,
.maxlen = 2*sizeof(unsigned long),
.mode   = 0644,
.proc_handler   = proc_doulongvec_minmax,
},

Reproduce:
Passing two parameters works normally, but passing only one
parameter returns the error "Invalid argument" (EINVAL).

[root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
1   2
[root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
-bash: echo: write error: Invalid argument
[root@cl150 ~]# echo $?
1
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
3   2
[root@cl150 ~]#

The following is the result after applying this patch. No error
is returned when the number of input parameters is less than
the total parameters.

[root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
1   2
[root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# echo $?
0
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
3   2
[root@cl150 ~]#

Three functions parse numeric parameters:
__do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.

This patch makes __do_proc_doulongvec_minmax check the remaining
input length 'left', just as __do_proc_dointvec does. The
__do_proc_douintvec implementation explicitly does not support
multiple inputs.

static int __do_proc_douintvec(...){
 ...
 /*
  * Arrays are not supported, keep this simple. *Do not* add
  * support for them.
  */
 if (vleft != 1) {
 *lenp = 0;
 return -EINVAL;
 }
 ...
}

So only __do_proc_doulongvec_minmax has the problem, and most users
of proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax take just
one parameter.
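
The effect of the added check can be seen in a user-space model of the
write path. This is a simplified sketch, not the kernel implementation:
parse_ulong_vec() and its stop_when_exhausted flag are hypothetical,
the input is assumed to be a NUL-terminated string, and the min/max
handling is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store whitespace-separated values from 'input' into 'vec', which has
 * room for 'vleft' elements. Returns 0 on success, -EINVAL otherwise.
 * 'stop_when_exhausted' selects the behaviour with the added check.
 */
static int parse_ulong_vec(const char *input, unsigned long *vec,
			   size_t vleft, bool stop_when_exhausted)
{
	const char *p = input;
	size_t left;

	assert(input != NULL && vec != NULL);
	left = strlen(input);

	for (; left && vleft; vec++, vleft--) {
		char *end;
		unsigned long val;

		/* Skip leading whitespace, including a trailing newline. */
		while (left && isspace((unsigned char)*p)) {
			p++;
			left--;
		}
		/* Modelled check: running out of input is not an error. */
		if (stop_when_exhausted && !left)
			break;

		if (!isdigit((unsigned char)*p))
			return -EINVAL;
		errno = 0;
		val = strtoul(p, &end, 10);
		if (errno == ERANGE)
			return -EINVAL;
		if (*end != '\0' && !isspace((unsigned char)*end))
			return -EINVAL;

		left -= (size_t)(end - p);
		p = end;
		*vec = val;	/* already stored before any later failure */
	}
	return 0;
}
```

With stop_when_exhausted set to false, writing "3\n" into a two-element
vector stores 3 in the first element and then fails on the empty
remainder, which matches the "3   2" plus "Invalid argument" result in
the reproduction above. With it set to true, the same write succeeds.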

Signed-off-by: Cheng Lin 
---
 kernel/sysctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5fc724e..9ee261f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2779,6 +2779,8 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
bool neg;
 
left -= proc_skip_spaces(&p);
+   if (!left)
+   break;
 
err = proc_get_long(&p, &left, &val, &neg,
 proc_wspace_sep,
-- 
1.8.3.1



[PATCH] proc/sysctl: fix return error for proc_doulongvec_minmax

2018-11-29 Thread Cheng Lin
If the number of input parameters is less than the total
parameters, an EINVAL error will be returned.

This patch ensures no error is returned in this condition, just
as other interfaces do.

Signed-off-by: Cheng Lin 
---
 kernel/sysctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5fc724e..9ee261f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2779,6 +2779,8 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
bool neg;
 
left -= proc_skip_spaces(&p);
+   if (!left)
+   break;
 
err = proc_get_long(&p, &left, &val, &neg,
 proc_wspace_sep,
-- 
1.8.3.1



[PATCH v2] sched/numa: do not balance tasks onto isolated cpus

2018-07-26 Thread Cheng Lin
By default, there is one sched domain covering all CPUs, including
those isolated with the "isolcpus=" boot parameter. However, the
isolated CPUs do not participate in load balancing, and have no
tasks running on them unless tasks are explicitly assigned by CPU
affinity.

But NUMA balancing does not take *isolcpus* (isolated CPUs) into
consideration. It may migrate tasks onto isolated CPUs, and the
migrated tasks will never escape from them, which breaks the
isolation provided by the *isolcpus* boot parameter and
introduces various problems. A typical scenario:

When we want to use the isolated CPUs in a cgroup (e.g. in a
container), the cpuset must include them. In that case, the CPU
affinity of tasks in the cgroup includes the isolated CPUs by
default. If we pin a task onto an isolated CPU, or onto a CPU on the
same NUMA node as an isolated CPU, and another task shares memory
with the pinned task, NUMA balancing will migrate that other task to
the same NUMA node for better performance. In this case, the
isolated CPU may be chosen as the target CPU.

Although load balancing never migrates a task onto an isolated CPU,
NUMA balancing currently does not consider isolated CPUs. This patch
ensures that NUMA balancing does not balance tasks onto isolated
CPUs.
Signed-off-by: Cheng Lin 
Reviewed-by: Tan Hu 
Reviewed-by: Jiang Biao 
---
v2: 
* rework and retest on latest kernel
* detail the scenario in the commit log
* fix the SoB chain

 kernel/sched/core.c | 9 ++---
 kernel/sched/fair.c | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe365c9..170a673 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1302,10 +1302,12 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
goto out;
 
-   if (!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+   if ((!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+   || !housekeeping_test_cpu(arg.dst_cpu, HK_FLAG_DOMAIN))
goto out;
 
-   if (!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+   if ((!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+   || !housekeeping_test_cpu(arg.src_cpu, HK_FLAG_DOMAIN))
goto out;
 
trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
@@ -5508,7 +5510,8 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
if (curr_cpu == target_cpu)
return 0;
 
-   if (!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+   if ((!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+   || !housekeeping_test_cpu(target_cpu, HK_FLAG_DOMAIN))
return -EINVAL;
 
/* TODO: This is not properly updating schedstats */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..1ea2953 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1724,7 +1724,8 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 
for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
/* Skip this CPU if the source task cannot migrate */
-   if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+   if ((!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+   || !housekeeping_test_cpu(cpu, HK_FLAG_DOMAIN))
continue;
 
env->dst_cpu = cpu;
-- 
1.8.3.1