Re: [PATCH v3 1/1] sched: watchdog: Touch kernel watchdog with sched count

2020-10-27 Thread Xi Wang
On Mon, Oct 26, 2020 at 1:32 AM Vincent Guittot wrote:
>
> On Sun, 25 Oct 2020 at 00:57, Xi Wang  wrote:
> >
> > Instead of periodically resetting the watchdog from thread context,
> > this patch simply forces a resched and checks rq->sched_count. The
> > watchdog is reset if the sched count has increased. If the same
> > thread is picked by pick_next_task during the resched, there is no
> > context switch.
> >
> > With the new method we lose coverage on two steps: that a
> > migration/n thread actually gets picked, and that we actually
> > context switch to the migration/n thread. These steps are unlikely
> > to silently fail. The change provides nearly the same level of
> > protection with less latency / jitter.
>
> When a patch provides an improvement, it's usually good to give
> figures that show the improvement

This change would reduce jitter for a continuously running thread.
The difference is likely too small to show up in sched latency
benchmarks.

will-it-scale mmap1 reported a 15.8% improvement for the v1 patch:
https://lkml.org/lkml/2020/3/10/129

The performance change is actually unexpected. If it's not noise,
there may be contention related to watchdog thread wakeup or context
switching.

-Xi


>
> >
> > Suggested-by: Paul Turner 
> > Suggested-by: Peter Zijlstra 
> > Signed-off-by: Xi Wang 
> > ---
> >  include/linux/sched.h |  4 
> >  kernel/sched/core.c   | 23 +++--
> >  kernel/sched/sched.h  |  6 +-
> >  kernel/watchdog.c | 47 +--
> >  4 files changed, 44 insertions(+), 36 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d383cf09e78f..1e3bef9a9cdb 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1662,6 +1662,10 @@ extern int sched_setattr(struct task_struct *, const struct sched_attr *);
> >  extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
> >  extern struct task_struct *idle_task(int cpu);
> >
> > +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> > +extern unsigned int sched_get_count(int cpu);
> > +#endif
> > +
> >  /**
> >   * is_idle_task - is the specified task an idle task?
> >   * @p: the task in question.
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 8160ab5263f8..378f0f36c402 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4293,8 +4293,6 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
> > rcu_sleep_check();
> >
> > profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > -
> > -   schedstat_inc(this_rq()->sched_count);
> >  }
> >
> >  static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
> > @@ -4492,6 +4490,12 @@ static void __sched notrace __schedule(bool preempt)
> > clear_tsk_need_resched(prev);
> > clear_preempt_need_resched();
> >
> > +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> > +   this_rq()->sched_count++; /* sched count is also used by watchdog */
> > +#else
> > +   schedstat_inc(this_rq()->sched_count);
> > +#endif
> > +
> > if (likely(prev != next)) {
> > rq->nr_switches++;
> > /*
> > @@ -5117,6 +5121,21 @@ struct task_struct *idle_task(int cpu)
> > return cpu_rq(cpu)->idle;
> >  }
> >
> > +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> > +
> > +/**
> > + * sched_get_count - get the sched count of a CPU.
> > + * @cpu: the CPU in question.
> > + *
> > + * Return: sched count.
> > + */
> > +unsigned int sched_get_count(int cpu)
> > +{
> > +   return cpu_rq(cpu)->sched_count;
> > +}
> > +
> > +#endif
> > +
> >  /**
> >   * find_process_by_pid - find a process with a matching PID value.
> >   * @pid: the pid in question.
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 28709f6b0975..f23255981d52 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -959,6 +959,11 @@ struct rq {
> > u64 clock_pelt;
> > unsigned long   lost_idle_time;
> >
> > +#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_SOFTLOCKUP_DETECTOR)
> > +   /* Also used by watchdog - no longer grouping with other sched stats */
> > +   unsigned int sched_count;
> > +#endif
> > +
> > atomic_t nr_iowait;
> 

[PATCH v3 1/1] sched: watchdog: Touch kernel watchdog with sched count

2020-10-24 Thread Xi Wang
Instead of periodically resetting the watchdog from thread context,
this patch simply forces a resched and checks rq->sched_count. The
watchdog is reset if the sched count has increased. If the same
thread is picked by pick_next_task during the resched, there is no
context switch.

With the new method we lose coverage on two steps: that a migration/n
thread actually gets picked, and that we actually context switch to
the migration/n thread. These steps are unlikely to silently fail.
The change provides nearly the same level of protection with less
latency / jitter.

Suggested-by: Paul Turner 
Suggested-by: Peter Zijlstra 
Signed-off-by: Xi Wang 
---
 include/linux/sched.h |  4 
 kernel/sched/core.c   | 23 +++--
 kernel/sched/sched.h  |  6 +-
 kernel/watchdog.c | 47 +--
 4 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d383cf09e78f..1e3bef9a9cdb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1662,6 +1662,10 @@ extern int sched_setattr(struct task_struct *, const struct sched_attr *);
 extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
 extern struct task_struct *idle_task(int cpu);
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+extern unsigned int sched_get_count(int cpu);
+#endif
+
 /**
  * is_idle_task - is the specified task an idle task?
  * @p: the task in question.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8160ab5263f8..378f0f36c402 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4293,8 +4293,6 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
rcu_sleep_check();
 
profile_hit(SCHED_PROFILING, __builtin_return_address(0));
-
-   schedstat_inc(this_rq()->sched_count);
 }
 
 static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
@@ -4492,6 +4490,12 @@ static void __sched notrace __schedule(bool preempt)
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+   this_rq()->sched_count++; /* sched count is also used by watchdog */
+#else
+   schedstat_inc(this_rq()->sched_count);
+#endif
+
if (likely(prev != next)) {
rq->nr_switches++;
/*
@@ -5117,6 +5121,21 @@ struct task_struct *idle_task(int cpu)
return cpu_rq(cpu)->idle;
 }
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+
+/**
+ * sched_get_count - get the sched count of a CPU.
+ * @cpu: the CPU in question.
+ *
+ * Return: sched count.
+ */
+unsigned int sched_get_count(int cpu)
+{
+   return cpu_rq(cpu)->sched_count;
+}
+
+#endif
+
 /**
  * find_process_by_pid - find a process with a matching PID value.
  * @pid: the pid in question.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6b0975..f23255981d52 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -959,6 +959,11 @@ struct rq {
u64 clock_pelt;
unsigned long   lost_idle_time;
 
+#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_SOFTLOCKUP_DETECTOR)
+   /* Also used by watchdog - no longer grouping with other sched stats */
+   unsigned int sched_count;
+#endif
+
atomic_t nr_iowait;
 
 #ifdef CONFIG_MEMBARRIER
@@ -1036,7 +1041,6 @@ struct rq {
unsigned int yld_count;
 
/* schedule() stats */
-   unsigned int sched_count;
unsigned int sched_goidle;
 
/* try_to_wake_up() stats */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5abb5b22ad13..22f87aded95a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -170,6 +170,7 @@ static bool softlockup_initialized __read_mostly;
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
+static DEFINE_PER_CPU(unsigned int, watchdog_sched_prev);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
 static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
@@ -239,6 +240,7 @@ static void set_sample_period(void)
 static void __touch_watchdog(void)
 {
__this_cpu_write(watchdog_touch_ts, get_timestamp());
+   __this_cpu_write(watchdog_sched_prev, sched_get_count(smp_processor_id()));
 }
 
 /**
@@ -318,25 +320,6 @@ static void watchdog_interrupt_count(void)
__this_cpu_inc(hrtimer_interrupts);
 }
 
-static DEFINE_PER_CPU(struct completion, softlockup_completion);
-static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
-
-/*
- * The watchdog thread function - touches the timestamp.
- *
- * It only runs once every sample_period seconds (4 seconds by
- * default) to reset the softlockup timestamp. If this gets delayed
- * for more than 2*watchdog_thresh seconds then the debug-printout
- * triggers in watchdog_timer_fn().
- */
-static int softlockup_fn(void *data)
-{
-  

From 9667d5ddbb1dc230653e5f8cedb778e9c562d46c Mon Sep 17 00:00:00 2001

2020-10-24 Thread Xi Wang
Instead of periodically resetting the watchdog from thread context,
this patch simply forces a resched and checks rq->sched_count. The
watchdog is reset if the sched count has increased. If the same
thread is picked by pick_next_task during the resched, there is no
context switch.

With the new method we lose coverage on two steps: that a migration/n
thread actually gets picked, and that we actually context switch to
the migration/n thread. These steps are unlikely to silently fail.
The change provides nearly the same level of protection with less
latency / jitter.

v3:
 - Removed the old method and boot option
 - Still need to check resched

v2:
 - Use sched_count instead of having sched calling into watchdog code
 - Remove the sysctl and add a boot option, which can be removed later
 - Changed the subject line

Xi Wang (1):
  sched: watchdog: Touch kernel watchdog with sched count

 include/linux/sched.h |  4 
 kernel/sched/core.c   | 23 +++--
 kernel/sched/sched.h  |  6 +-
 kernel/watchdog.c | 47 +--
 4 files changed, 44 insertions(+), 36 deletions(-)

-- 
2.29.0.rc2.309.g374f81d7ae-goog
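
For illustration, a minimal sketch (not the exact patch; the function
name is invented, the helpers are the ones visible in the diffs in
this thread) of how the watchdog hrtimer callback can consume the
sched count:

static DEFINE_PER_CPU(unsigned int, watchdog_sched_prev);

static void watchdog_check_sched_count(void)
{
	unsigned int cnt = sched_get_count(smp_processor_id());

	if (cnt != __this_cpu_read(watchdog_sched_prev)) {
		/* __schedule() ran since the last sample: not locked up */
		__this_cpu_write(watchdog_sched_prev, cnt);
		__touch_watchdog();
	}

	/* Force a resched so the count advances before the next sample */
	set_tsk_need_resched(current);
	set_preempt_need_resched();
}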



Re: [PATCH v2 1/1] sched: watchdog: Touch kernel watchdog with sched count

2020-10-21 Thread Xi Wang
On Wed, Oct 21, 2020 at 3:13 AM Peter Zijlstra  wrote:
>
> On Tue, Oct 20, 2020 at 01:57:04PM -0700, Xi Wang wrote:
>
> > + if (watchdog_touch_with_sched) {
> > + /* Trigger reschedule for the next round */
> > + set_tsk_need_resched(current);
> > + set_preempt_need_resched();
>
> Blergh.. that's gross. This relies on this being in IRQ context and
> either: PREEMPT=y *OR* this always being from userspace. Otherwise
> there's no guarantee the return-from-interrupt will actually schedule.
>

Maybe I missed something, but I think an immediate reschedule is not
required? E.g. the software watchdog should still fire if there is a
kernel busy loop and kernel preemption is not enabled. The current
method ends up with a thread wakeup, so there is no guaranteed
reschedule either?

-Xi


[PATCH v2 0/1] Touch kernel watchdog with sched count

2020-10-20 Thread Xi Wang
The main purpose of the kernel watchdog is to test whether the
scheduler can still schedule tasks on a cpu. In order to reduce the
latency / jitter from periodically invoking watchdog reset in thread
context, we can simply test if pick_next_task can run. This is done
by forcing a resched and checking rq->sched_count. Compared to
actually resetting the watchdog from cpu stop / migration threads, we
lose coverage on two steps: a migration thread actually getting
picked, and an actual context switch to the migration thread. These
steps are unlikely to silently fail. The change provides nearly the
same level of protection with less overhead.

With this patch we can still switch back to the old method with the
boot option watchdog_touch_with_thread. However, the code for the old
method can be completely removed in the future.


v2:
 - Use sched_count instead of having sched calling into watchdog code
 - Remove the sysctl and add a boot option, which can be removed later
 - Changed the subject line
  


Xi Wang (1):
  sched: watchdog: Touch kernel watchdog with sched count

 include/linux/sched.h |  4 
 kernel/sched/core.c   | 23 --
 kernel/sched/sched.h  |  6 +-
 kernel/watchdog.c | 44 +--
 4 files changed, 68 insertions(+), 9 deletions(-)

-- 
2.29.0.rc1.297.gfa9743e501-goog



[PATCH v2 1/1] sched: watchdog: Touch kernel watchdog with sched count

2020-10-20 Thread Xi Wang
The main purpose of the kernel watchdog is to test whether the
scheduler can still schedule tasks on a cpu. In order to reduce the
latency / jitter from periodically invoking watchdog reset in thread
context, we can simply test if pick_next_task can run. This is done
by forcing a resched and checking rq->sched_count. Compared to
actually resetting the watchdog from cpu stop / migration threads, we
lose coverage on two steps: a migration thread actually getting
picked, and an actual context switch to the migration thread. These
steps are unlikely to silently fail. The change provides nearly the
same level of protection with less overhead.

With this patch we can still switch back to the old method with the
boot option watchdog_touch_with_thread. However, the code for the old
method can be completely removed in the future.

Suggested-by: Paul Turner 
Suggested-by: Peter Zijlstra 
Signed-off-by: Xi Wang 
---
 include/linux/sched.h |  4 
 kernel/sched/core.c   | 23 --
 kernel/sched/sched.h  |  6 +-
 kernel/watchdog.c | 44 +--
 4 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d383cf09e78f..1e3bef9a9cdb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1662,6 +1662,10 @@ extern int sched_setattr(struct task_struct *, const struct sched_attr *);
 extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
 extern struct task_struct *idle_task(int cpu);
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+extern unsigned int sched_get_count(int cpu);
+#endif
+
 /**
  * is_idle_task - is the specified task an idle task?
  * @p: the task in question.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8160ab5263f8..378f0f36c402 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4293,8 +4293,6 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
rcu_sleep_check();
 
profile_hit(SCHED_PROFILING, __builtin_return_address(0));
-
-   schedstat_inc(this_rq()->sched_count);
 }
 
 static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
@@ -4492,6 +4490,12 @@ static void __sched notrace __schedule(bool preempt)
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+   this_rq()->sched_count++; /* sched count is also used by watchdog */
+#else
+   schedstat_inc(this_rq()->sched_count);
+#endif
+
if (likely(prev != next)) {
rq->nr_switches++;
/*
@@ -5117,6 +5121,21 @@ struct task_struct *idle_task(int cpu)
return cpu_rq(cpu)->idle;
 }
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+
+/**
+ * sched_get_count - get the sched count of a CPU.
+ * @cpu: the CPU in question.
+ *
+ * Return: sched count.
+ */
+unsigned int sched_get_count(int cpu)
+{
+   return cpu_rq(cpu)->sched_count;
+}
+
+#endif
+
 /**
  * find_process_by_pid - find a process with a matching PID value.
  * @pid: the pid in question.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6b0975..f23255981d52 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -959,6 +959,11 @@ struct rq {
u64 clock_pelt;
unsigned long   lost_idle_time;
 
+#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_SOFTLOCKUP_DETECTOR)
+   /* Also used by watchdog - no longer grouping with other sched stats */
+   unsigned int sched_count;
+#endif
+
atomic_t nr_iowait;
 
 #ifdef CONFIG_MEMBARRIER
@@ -1036,7 +1041,6 @@ struct rq {
unsigned int yld_count;
 
/* schedule() stats */
-   unsigned int sched_count;
unsigned int sched_goidle;
 
/* try_to_wake_up() stats */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5abb5b22ad13..df7f7e585502 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -170,6 +170,7 @@ static bool softlockup_initialized __read_mostly;
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
+static DEFINE_PER_CPU(unsigned int, watchdog_sched_prev);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
 static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
@@ -177,6 +178,12 @@ static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static unsigned long soft_lockup_nmi_warn;
 
+/*
+ * Touch watchdog if __schedule and pick_next_task can run - avoid actual
+ * context switch and associated latency for most cases
+ */
+int __read_mostly watchdog_touch_with_sched = 1;
+
 static int __init nowatchdog_setup(char *str)
 {
watchdog_user_enabled = 0;
@@ -198,6 +205,13 @@ static int __init watchdog_thresh_setup(char *str)
 }
 __setup("watchdog_thresh=", watchdog_thre

Re: [PATCH] sched: watchdog: Touch kernel watchdog in sched code

2020-10-05 Thread Xi Wang
On Mon, Oct 5, 2020 at 4:19 AM Peter Zijlstra  wrote:
>
> On Fri, Mar 06, 2020 at 02:34:20PM -0800, Xi Wang wrote:
> > On Fri, Mar 6, 2020 at 12:40 AM Peter Zijlstra  wrote:
> > >
> > > On Thu, Mar 05, 2020 at 02:11:49PM -0800, Paul Turner wrote:
> > > > The goal is to improve jitter since we're constantly periodically
> > > > preempting other classes to run the watchdog.   Even on a single CPU
> > > > this is measurable as jitter in the us range.  But, what increases the
> > > > motivation is this disruption has been recently magnified by CPU
> > > > "gifts" which require evicting the whole core when one of the siblings
> > > > schedules one of these watchdog threads.
> > > >
> > > > The majority outcome being asserted here is that we could actually
> > > > exercise pick_next_task if required -- there are other potential
> > > > things this will catch, but they are much more braindead generally
> > > > speaking (e.g. a bug in pick_next_task itself).
> > >
> > > I still utterly hate what the patch does though; there is no way I'll
> > > have watchdog code hook in the scheduler like this. That's just asking
> > > for trouble.
> > >
> > > Why isn't it sufficient to sample the existing context switch counters
> > > from the watchdog? And why can't we fix that?
> >
We could go through pick_next_task and repick the same task. There
won't be a context switch, but we still want to touch the watchdog. I
assume such a counter also needs to be per cpu and inside the rq lock.
There doesn't seem to be an existing one that fits this purpose.
>
> Sorry, your reply got lost, but I just ran into something that reminded
> me of this.
>
> There's sched_count. That's currently schedstat, but if you can find a
> spot in a hot cacheline (from schedule()'s perspective) then it
> should be cheap to increment unconditionally.
>
> If only someone were to write a useful cacheline perf tool (and no,
> that c2c trainwreck doesn't count).
>

Thanks, I'll try the alternative implementation.

-Xi


Re: [PATCH] arm64: bpf: Fix branch offset in JIT

2020-09-14 Thread Xi Wang
On Mon, Sep 14, 2020 at 11:28 AM Ilias Apalodimas wrote:
> Even if that's true, is there any reason at all why we should skip the
> first element of the array, which is now needed since 7c2e988f400 to
> jump back to the first instruction?
> Introducing 2 extra if conditions and hotfixing the array on the fly
> (and for every future invocation of that) seems better to you?

My point was that there's no inherently correct/wrong way to construct
offsets.  As Luke explained in his email, 1) there are two different
strategies used by the JITs and 2) there are likely similar bugs
beyond arm64.

Each strategy has pros and cons, and I'm fine with either.  I like the
strategy used in your patch because it's more intuitive (offset[i] is
the start of the emitted instructions for BPF instruction i, rather
than the end), though the changes to the construction process are
trickier.

If we decide to patch the arm64 JIT the way you proposed, we should
consider whether to change other JITs consistently.


Re: [PATCH] arm64: bpf: Fix branch offset in JIT

2020-09-14 Thread Xi Wang
On Mon, Sep 14, 2020 at 10:55 AM Ilias Apalodimas wrote:
> We've briefly discussed this approach with Yauheni while coming up with
> the posted patch.
> I think that constructing the array correctly in the first place is
> better.
> Right now it might only be used in bpf2a64_offset() and
> bpf_prog_fill_jited_linfo(), but if we fix up the values on the fly in
> there, everyone that intends to use the offset for any reason will have
> to account for the missing instruction.

I don't understand what you mean by "correctly."  What's your correctness spec?

I don't think there's some consistent semantics of "offsets" across
the JITs of different architectures (maybe it's good to clean that
up).  RV64 and RV32 JITs are doing something similar to arm64 with
respect to offsets.  CCing Björn and Luke.


Re: [PATCH] arm64: bpf: Fix branch offset in JIT

2020-09-14 Thread Xi Wang
On Mon, Sep 14, 2020 at 10:03 AM Ilias Apalodimas wrote:
> Naresh from Linaro reported it during his tests on 5.8-rc1 as well [1].
> I've included both Jiri and him on the v2 as reporters.
>
> [1] https://lkml.org/lkml/2020/8/11/58

I'm curious what you think of Luke's earlier patch to this bug:

https://lore.kernel.org/bpf/canowswkaj1hysw3bxbmg9_nd48fm0mxm5egdtmhu6ysec_g...@mail.gmail.com/T/#m4335b4005da0d60059ba96920fcaaecf2637042a


Re: [PATCH] sched: Make select_idle_sibling search domain configurable

2020-08-03 Thread Xi Wang
On Tue, Jul 28, 2020 at 10:54 AM Xi Wang  wrote:
>
> On Tue, Jul 28, 2020 at 3:39 AM  wrote:
> >
> > On Tue, Jul 28, 2020 at 12:01:31AM -0700, Xi Wang wrote:
> > > The scope of select_idle_sibling idle cpu search is LLC. This
> > > becomes a problem for the AMD CCX architecture, as the sd_llc is only
> > > 4 cores. On a many core machine, the range of search is too small to
> > > reach a satisfactory level of statistical multiplexing / efficient
> > > utilization of short idle time slices.
> > >
> > > With this patch idle sibling search is detached from LLC and it
> > > becomes run time configurable. To reduce search and migration
> > > overheads, a presearch domain is added. The presearch domain will be
> > > searched first before the "main search" domain, e.g.:
> > >
> > > sysctl_sched_wake_idle_domain == 2 ("MC" domain)
> > > sysctl_sched_wake_idle_presearch_domain == 1 ("DIE" domain)
> > >
> > > Presearch will go through 4 cores of a CCX. If no idle cpu is found
> > > during presearch, full search will go through the remaining cores of
> > > a cpu socket.
> >
> > *groan*, this is horrific :-(
> >
> > It is also in direct conflict with people wanting to make it smaller.
> >
> > On top of that, a domain number is a terrible terrible interface. They
> > aren't even available without SCHED_DEBUG on.
> >
> > What is the inter-L3 latency? Going by this, it had better be awesome.
> > And if this Infinity Fabric stuff is highly effective in effectively
> > merging L3s -- analogous to what Intel does with its cache slices, then
> > should we not change the AMD topology setup instead of this 'thing'?
> >
> > Also, this commit:
> >
> >   051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")
> >
> > seems to suggest L3 is actually bigger. Suravee, can you please comment?
>
> I think 051f3ca02e46 was still saying 4 cores sharing an L3 but there
> is another numa layer which is 8 cores or 2 * 4 core L3 groups. This
> should be the chiplet layer.
>
> I don't have precise data but some anecdotes are: The latency
> difference between inter 4 core group access and inter 8 core group
> access is not huge. Also my experience from Intel machines was that
> accessing L3 data across numa domains (also across cpu socket) was not
> too bad until the link bandwidth was saturated. I am hoping the
> bandwidth situation is better for AMD as L3 groups are smaller.
> Another factor is that sometimes the trade-off is between spending tens
> of microseconds of sched overhead and time slicing at ~12.5ms latency.
>
> What makes the decision tricky is that the configuration can depend on
> applications and the scale of the system. For a system with 8 cores,
> running it the old way with 2 * 4 core LLCs might be the best
> decision. For a system with a lot more cores, the number of threads on
> the machine would also scale up, which means more potential to create
> a dynamic imbalance. I have another (even more horrific) patch for
> auto configuring the sysctls, which has (nnext is the size of the next
> higher sched domain):
>
> /*
> * Widen the range of idle core search if llc domain is too small, both in
> * absolute sense and when compared to the next higher level domain.
> */
> if (nllc < min(24, nnext / 4)) {
> sysctl_sched_wake_idle_domain = next_level;
> sysctl_sched_wake_idle_presearch_domain = llc_level;
> /* Also make new idle search domain params more like default llc */
> sysctl_sched_wake_idle_domain_tune_flags = 1;
> }
>
> -Xi


Overall I think the current numa defaults should be considered broken
for large AMD machines due to latency and core utilization problems.
We need to have some solution for it. Simply moving up the llc domain
should work, but we likely also want to avoid some of the obvious
overheads. Maybe switching to a CCX local first search order inside
select_idle_core would simplify it? It appears the configuration part
is still difficult to clean up.

-Xi


Re: [PATCH] sched: Make select_idle_sibling search domain configurable

2020-07-28 Thread Xi Wang
On Tue, Jul 28, 2020 at 3:39 AM  wrote:
>
> On Tue, Jul 28, 2020 at 12:01:31AM -0700, Xi Wang wrote:
> > The scope of select_idle_sibling idle cpu search is LLC. This
> > becomes a problem for the AMD CCX architecture, as the sd_llc is only
> > 4 cores. On a many core machine, the range of search is too small to
> > reach a satisfactory level of statistical multiplexing / efficient
> > utilization of short idle time slices.
> >
> > With this patch idle sibling search is detached from LLC and it
> > becomes run time configurable. To reduce search and migration
> > overheads, a presearch domain is added. The presearch domain will be
> > searched first before the "main search" domain, e.g.:
> >
> > sysctl_sched_wake_idle_domain == 2 ("MC" domain)
> > sysctl_sched_wake_idle_presearch_domain == 1 ("DIE" domain)
> >
> > Presearch will go through 4 cores of a CCX. If no idle cpu is found
> > during presearch, full search will go through the remaining cores of
> > a cpu socket.
>
> *groan*, this is horrific :-(
>
> It is also in direct conflict with people wanting to make it smaller.
>
> On top of that, a domain number is a terrible terrible interface. They
> aren't even available without SCHED_DEBUG on.
>
> What is the inter-L3 latency? Going by this, it had better be awesome.
> And if this Infinity Fabric stuff is highly effective in effectively
> merging L3s -- analogous to what Intel does with its cache slices, then
> should we not change the AMD topology setup instead of this 'thing'?
>
> Also, this commit:
>
>   051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")
>
> seems to suggest L3 is actually bigger. Suravee, can you please comment?

I think 051f3ca02e46 was still saying 4 cores sharing an L3 but there
is another numa layer which is 8 cores or 2 * 4 core L3 groups. This
should be the chiplet layer.

I don't have precise data but some anecdotes are: The latency
difference between inter 4 core group access and inter 8 core group
access is not huge. Also my experience from Intel machines was that
accessing L3 data across numa domains (also across cpu socket) was not
too bad until the link bandwidth was saturated. I am hoping the
bandwidth situation is better for AMD as L3 groups are smaller.
Another factor is that sometimes the trade-off is between spending tens
of microseconds of sched overhead and time slicing at ~12.5ms latency.

What makes the decision tricky is that the configuration can depend on
applications and the scale of the system. For a system with 8 cores,
running it the old way with 2 * 4 core LLCs might be the best
decision. For a system with a lot more cores, the number of threads on
the machine would also scale up, which means more potential to create
a dynamic imbalance. I have another (even more horrific) patch for
auto configuring the sysctls, which has (nnext is the size of the next
higher sched domain):

/*
* Widen the range of idle core search if llc domain is too small, both in
* absolute sense and when compared to the next higher level domain.
*/
if (nllc < min(24, nnext / 4)) {
sysctl_sched_wake_idle_domain = next_level;
sysctl_sched_wake_idle_presearch_domain = llc_level;
/* Also make new idle search domain params more like default llc */
sysctl_sched_wake_idle_domain_tune_flags = 1;
}

-Xi


[PATCH] sched: Make select_idle_sibling search domain configurable

2020-07-28 Thread Xi Wang
The scope of the select_idle_sibling idle cpu search is the LLC. This
becomes a problem for the AMD CCX architecture, as the sd_llc is only
4 cores. On a many-core machine, the range of search is too small to
reach a satisfactory level of statistical multiplexing / efficient
utilization of short idle time slices.

With this patch, idle sibling search is detached from the LLC and
becomes run-time configurable. To reduce search and migration
overheads, a presearch domain is added. The presearch domain will be
searched first before the "main search" domain, e.g.:

sysctl_sched_wake_idle_domain == 2 ("MC" domain)
sysctl_sched_wake_idle_presearch_domain == 1 ("DIE" domain)

Presearch will go through 4 cores of a CCX. If no idle cpu is found
during presearch, full search will go through the remaining cores of
a cpu socket.

Heuristics including sd->avg_scan_cost and sds->have_idle_cores
are only active for the main search.

On a 128 core (2 socket * 64 core, 256 hw threads) AMD machine,
hackbench was run as

hackbench -g 20 -f 20 --loops 1

A snapshot of run time was

Baseline: 11.8
With the patch: 7.6 (configured as in the example above)
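
A rough sketch of the two-stage search described above (the per-cpu
domain pointers and scan_for_idle_cpu() are invented placeholders, not
this patch's actual symbols):

static int select_idle_cpu_two_stage(struct task_struct *p, int target)
{
	struct sched_domain *pre, *full;
	int cpu;

	pre  = rcu_dereference(per_cpu(sd_wake_idle_presearch, target));
	full = rcu_dereference(per_cpu(sd_wake_idle, target));

	/* First scan the small presearch domain, e.g. one 4-core CCX */
	cpu = scan_for_idle_cpu(p, pre, target);
	if (cpu >= 0)
		return cpu;

	/* Fall back to the wider domain, e.g. the rest of the socket */
	return scan_for_idle_cpu(p, full, target);
}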

Signed-off-by: Xi Wang 
---
 block/blk-mq.c |   2 +-
 block/blk-softirq.c|   2 +-
 include/linux/cpuset.h |  10 +-
 include/linux/sched/topology.h |  11 +-
 kernel/cgroup/cpuset.c |  32 --
 kernel/sched/core.c|  10 +-
 kernel/sched/fair.c| 191 +
 kernel/sched/sched.h   |   9 +-
 kernel/sched/topology.c|  87 ++-
 kernel/sysctl.c|  25 +
 10 files changed, 256 insertions(+), 123 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e0d173beaa3..20aee9f047e2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -626,7 +626,7 @@ void blk_mq_force_complete_rq(struct request *rq)
 
cpu = get_cpu();
	if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
-   shared = cpus_share_cache(cpu, ctx->cpu);
+   shared = cpus_share_sis(cpu, ctx->cpu);
 
if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
rq->csd.func = __blk_mq_complete_request_remote;
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 6e7ec87d49fa..dd38ac0e1f2e 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -108,7 +108,7 @@ void __blk_complete_request(struct request *req)
 */
	if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags) && ccpu != -1) {
		if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
-   shared = cpus_share_cache(cpu, ccpu);
+   shared = cpus_share_sis(cpu, ccpu);
} else
ccpu = cpu;
 
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de66afc..8b243aa8462e 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -117,6 +117,7 @@ static inline int cpuset_do_slab_mem_spread(void)
 extern bool current_cpuset_is_being_rebound(void);
 
 extern void rebuild_sched_domains(void);
+extern void rebuild_sched_domains_force(void);
 
 extern void cpuset_print_current_mems_allowed(void);
 
@@ -173,7 +174,7 @@ static inline void cpuset_force_rebuild(void) { }
 
 static inline void cpuset_update_active_cpus(void)
 {
-   partition_sched_domains(1, NULL, NULL);
+   partition_sched_domains(1, NULL, NULL, 0);
 }
 
 static inline void cpuset_wait_for_hotplug(void) { }
@@ -259,7 +260,12 @@ static inline bool current_cpuset_is_being_rebound(void)
 
 static inline void rebuild_sched_domains(void)
 {
-   partition_sched_domains(1, NULL, NULL);
+   partition_sched_domains(1, NULL, NULL, 0);
+}
+
+static inline void rebuild_sched_domains_force(void)
+{
+   partition_sched_domains(1, NULL, NULL, 1);
 }
 
 static inline void cpuset_print_current_mems_allowed(void)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index fb11091129b3..aff9739cf516 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -151,16 +151,17 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
 
 extern void partition_sched_domains_locked(int ndoms_new,
   cpumask_var_t doms_new[],
-  struct sched_domain_attr *dattr_new);
+  struct sched_domain_attr *dattr_new,
+  int force_update);
 
 extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
-   struct sched_domain_attr *dattr_new);
+   struct sched_domain_attr *dattr_new, int force_update);
 
 /* Allocate an array of sched domains, for partition_sched_domains(). */
 cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
 void free_s

Re: [PATCH bpf-next] bpf, riscv: Fix stack layout of JITed code on RV32

2020-04-29 Thread Xi Wang
On Wed, Apr 29, 2020 at 5:51 PM Luke Nelson  wrote:
>
> This patch fixes issues with stackframe unwinding and alignment in the
> current stack layout for BPF programs on RV32.
>
> In the current layout, RV32 fp points to the JIT scratch registers, rather
> than to the callee-saved registers. This breaks stackframe unwinding,
> which expects fp to point just above the saved ra and fp registers.
>
> This patch fixes the issue by moving the callee-saved registers to be
> stored on the top of the stack, pointed to by fp. This satisfies the
> assumptions of stackframe unwinding.
>
> This patch also fixes an issue with the old layout that the stack was
> not aligned to 16 bytes.
>
> Stacktrace from JITed code using the old stack layout:
>
>   [   12.196249 ] [] walk_stackframe+0x0/0x96
>
> Stacktrace using the new stack layout:
>
>   [   13.062888 ] [] walk_stackframe+0x0/0x96
>   [   13.063028 ] [] show_stack+0x28/0x32
>   [   13.063253 ] [] bpf_prog_82b916b2dfa00464+0x80/0x908
>   [   13.063417 ] [] bpf_test_run+0x124/0x39a
>   [   13.063553 ] [] bpf_prog_test_run_skb+0x234/0x448
>   [   13.063704 ] [] __do_sys_bpf+0x766/0x13b4
>   [   13.063840 ] [] sys_bpf+0xc/0x14
>   [   13.063961 ] [] ret_from_syscall+0x0/0x2
>
> The new code is also simpler to understand and includes an ASCII diagram
> of the stack layout.
>
> Tested on riscv32 QEMU virt machine.
>
> Signed-off-by: Luke Nelson 

Thanks for the fix!

Acked-by: Xi Wang 


Re: [PATCH] bpf: add mod default A and X test cases

2015-11-04 Thread Xi Wang
On Wed, Nov 4, 2015 at 11:36 AM, Yang Shi  wrote:
> When running a "mod X" operation, if X is 0 the filter has to halt.
> Add new test cases to cover A = A mod X if X is 0, and A = A mod 1.
>
> CC: Xi Wang 
> CC: Zi Shen Lim 
> Signed-off-by: Yang Shi 

Acked-by: Xi Wang 


Re: [PATCH] arm64: bpf: fix div-by-zero case

2015-11-03 Thread Xi Wang
On Tue, Nov 3, 2015 at 10:56 PM, Zi Shen Lim  wrote:
> case BPF_ALU | BPF_DIV | BPF_X:
> case BPF_ALU64 | BPF_DIV | BPF_X:
> +   {
> +   const u8 r0 = bpf2a64[BPF_REG_0];
> +
> +   /* if (src == 0) return 0 */
> +   jmp_offset = 3; /* skip ahead to else path */
> +   check_imm19(jmp_offset);
> +   emit(A64_CBNZ(is64, src, jmp_offset), ctx);
> +   emit(A64_MOVZ(1, r0, 0, 0), ctx);
> +   jmp_offset = epilogue_offset(ctx);
> +   check_imm26(jmp_offset);
> +   emit(A64_B(jmp_offset), ctx);
> +   /* else */
> emit(A64_UDIV(is64, dst, dst, src), ctx);
> break;
> +   }
> case BPF_ALU | BPF_MOD | BPF_X:
> case BPF_ALU64 | BPF_MOD | BPF_X:

BPF_MOD might need the same fix.
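
For reference, a sketch of the analogous guard on the MOD path, reusing
the helpers from the quoted DIV fix (the udiv/mul/sub tail is the
existing arm64 MOD lowering, with tmp being the JIT's scratch register;
an illustration, not a tested patch):

	case BPF_ALU | BPF_MOD | BPF_X:
	case BPF_ALU64 | BPF_MOD | BPF_X:
	{
		const u8 r0 = bpf2a64[BPF_REG_0];

		/* if (src == 0) return 0, as in the DIV case */
		jmp_offset = 3; /* skip ahead to else path */
		check_imm19(jmp_offset);
		emit(A64_CBNZ(is64, src, jmp_offset), ctx);
		emit(A64_MOVZ(1, r0, 0, 0), ctx);
		jmp_offset = epilogue_offset(ctx);
		check_imm26(jmp_offset);
		emit(A64_B(jmp_offset), ctx);
		/* else: dst = dst % src */
		emit(A64_UDIV(is64, tmp, dst, src), ctx);
		emit(A64_MUL(is64, tmp, tmp, src), ctx);
		emit(A64_SUB(is64, dst, dst, tmp), ctx);
		break;
	}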


[PATCH net-next] test_bpf: extend tests for 32-bit endianness conversion

2015-06-26 Thread Xi Wang
Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if
the upper bits of the result are zeros (the arm64 JIT had such bugs).
Extend the two tests to catch this.

Cc: Alexei Starovoitov 
Signed-off-by: Xi Wang 
---
See the arm64 JIT bugs:

https://lkml.org/lkml/2015/6/25/721
---
 lib/test_bpf.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 7f58c73..9198f28 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -3674,6 +3674,9 @@ static struct bpf_test tests[] = {
.u.insns_int = {
BPF_LD_IMM64(R0, 0x0123456789abcdefLL),
BPF_ENDIAN(BPF_FROM_BE, R0, 32),
+   BPF_ALU64_REG(BPF_MOV, R1, R0),
+   BPF_ALU64_IMM(BPF_RSH, R1, 32),
+   BPF_ALU32_REG(BPF_ADD, R0, R1), /* R1 = 0 */
BPF_EXIT_INSN(),
},
INTERNAL,
@@ -3708,6 +3711,9 @@ static struct bpf_test tests[] = {
.u.insns_int = {
BPF_LD_IMM64(R0, 0x0123456789abcdefLL),
BPF_ENDIAN(BPF_FROM_LE, R0, 32),
+   BPF_ALU64_REG(BPF_MOV, R1, R0),
+   BPF_ALU64_IMM(BPF_RSH, R1, 32),
+   BPF_ALU32_REG(BPF_ADD, R0, R1), /* R1 = 0 */
BPF_EXIT_INSN(),
},
INTERNAL,
-- 
2.1.4



[RFC PATCH] arm64: bpf: fix endianness conversion bugs

2015-06-25 Thread Xi Wang
Upper bits should be zeroed in endianness conversion:

- even when there's no need to change endianness (i.e., BPF_FROM_BE
  on big endian or BPF_FROM_LE on little endian);

- after rev16.

This patch fixes such bugs by emitting extra instructions to clear
upper bits.
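
A userspace sketch (plain C, modeling a little-endian host; not the
JIT code) of the semantics the fix enforces - a 16- or 32-bit
conversion must leave the upper bits of the 64-bit register zeroed:

#include <stdint.h>
#include <stdio.h>

static uint64_t bpf_end16(uint64_t r) /* BPF "endian 16" result */
{
	return (uint16_t)__builtin_bswap16((uint16_t)r);
}

static uint64_t bpf_end32(uint64_t r) /* BPF "endian 32" result */
{
	return (uint32_t)__builtin_bswap32((uint32_t)r);
}

int main(void)
{
	uint64_t r0 = 0x0123456789abcdefULL;

	/* upper bits must be gone: prints 0xefcd 0xefcdab89 */
	printf("%#llx %#llx\n",
	       (unsigned long long)bpf_end16(r0),
	       (unsigned long long)bpf_end32(r0));
	return 0;
}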

Cc: Zi Shen Lim 
Cc: Alexei Starovoitov 
Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Xi Wang 
---
The current testsuite catches the 16-bit bugs.  I'll send a separate
patch that extends test_bpf to catch the 32-bit ones.
---
 arch/arm64/net/bpf_jit.h  |  4 
 arch/arm64/net/bpf_jit_comp.c | 22 --
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index de0a81a..98a26ce 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -110,6 +110,10 @@
 /* Rd = Rn >> shift; signed */
 #define A64_ASR(sf, Rd, Rn, shift) A64_SBFM(sf, Rd, Rn, shift, (sf) ? 63 : 31)
 
+/* Zero extend */
+#define A64_UXTH(sf, Rd, Rn) A64_UBFM(sf, Rd, Rn, 0, 15)
+#define A64_UXTW(sf, Rd, Rn) A64_UBFM(sf, Rd, Rn, 0, 31)
+
 /* Move wide (immediate) */
 #define A64_MOVEW(sf, Rd, imm16, shift, type) \
aarch64_insn_gen_movewide(Rd, imm16, shift, \
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c81ddd4..c047598 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -289,23 +289,41 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_ALU | BPF_END | BPF_FROM_BE:
 #ifdef CONFIG_CPU_BIG_ENDIAN
if (BPF_SRC(code) == BPF_FROM_BE)
-   break;
+   goto emit_bswap_uxt;
 #else /* !CONFIG_CPU_BIG_ENDIAN */
if (BPF_SRC(code) == BPF_FROM_LE)
-   break;
+   goto emit_bswap_uxt;
 #endif
switch (imm) {
case 16:
emit(A64_REV16(is64, dst, dst), ctx);
+   /* zero-extend 16 bits into 64 bits */
+   emit(A64_UXTH(is64, dst, dst), ctx);
break;
case 32:
emit(A64_REV32(is64, dst, dst), ctx);
+   /* upper 32 bits already cleared */
break;
case 64:
emit(A64_REV64(dst, dst), ctx);
break;
}
break;
+emit_bswap_uxt:
+   switch (imm) {
+   case 16:
+   /* zero-extend 16 bits into 64 bits */
+   emit(A64_UXTH(is64, dst, dst), ctx);
+   break;
+   case 32:
+   /* zero-extend 32 bits into 64 bits */
+   emit(A64_UXTW(is64, dst, dst), ctx);
+   break;
+   case 64:
+   /* nop */
+   break;
+   }
+   break;
/* dst = imm */
case BPF_ALU | BPF_MOV | BPF_K:
case BPF_ALU64 | BPF_MOV | BPF_K:
-- 
2.1.4



[PATCH] arm64: bpf: fix out-of-bounds read in bpf2a64_offset()

2015-06-25 Thread Xi Wang
Problems occur when bpf_to or bpf_from has value prog->len - 1 (e.g.,
"Very long jump backwards" in test_bpf where the last instruction is a
jump): since ctx->offset has length prog->len, ctx->offset[bpf_to + 1]
or ctx->offset[bpf_from + 1] will cause an out-of-bounds read, leading
to a bogus jump offset and kernel panic.

This patch moves updating ctx->offset to after calling build_insn(),
and changes indexing to use bpf_to and bpf_from without + 1.

Cc: Zi Shen Lim 
Cc: Alexei Starovoitov 
Cc: Will Deacon 
Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Xi Wang 
---
 arch/arm64/net/bpf_jit_comp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index dc6a484..c81ddd4 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -113,9 +113,9 @@ static inline void emit_a64_mov_i(const int is64, const int reg,
 static inline int bpf2a64_offset(int bpf_to, int bpf_from,
 const struct jit_ctx *ctx)
 {
-   int to = ctx->offset[bpf_to + 1];
+   int to = ctx->offset[bpf_to];
/* -1 to account for the Branch instruction */
-   int from = ctx->offset[bpf_from + 1] - 1;
+   int from = ctx->offset[bpf_from] - 1;
 
return to - from;
 }
@@ -640,10 +640,11 @@ static int build_body(struct jit_ctx *ctx)
	const struct bpf_insn *insn = &prog->insnsi[i];
int ret;
 
+   ret = build_insn(insn, ctx);
+
if (ctx->image == NULL)
ctx->offset[i] = ctx->idx;
 
-   ret = build_insn(insn, ctx);
if (ret > 0) {
i++;
continue;
-- 
2.1.4



[PATCH net-next] test: bpf: extend "load 64-bit immediate" testcase

2015-05-09 Thread Xi Wang
Extend the testcase to catch a signedness bug in the arm64 JIT:

test_bpf: #58 load 64-bit immediate jited:1 ret -1 != 1 FAIL (1 times)

This is useful to ensure other JITs won't have a similar bug.

Link: https://lkml.org/lkml/2015/5/8/458
Cc: Alexei Starovoitov 
Cc: Will Deacon 
Signed-off-by: Xi Wang 
---
 lib/test_bpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index f2c23ffaa6d7..3c41049d72d8 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -1755,7 +1755,8 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
BPF_JMP_IMM(BPF_JEQ, R3, 0x1234, 1),
BPF_EXIT_INSN(),
-   BPF_ALU64_IMM(BPF_MOV, R0, 1),
+   BPF_LD_IMM64(R0, 0x1LL),
+   BPF_ALU64_IMM(BPF_RSH, R0, 32), /* R0 = 1 */
BPF_EXIT_INSN(),
},
INTERNAL,
-- 
1.9.1



Re: [PATCH] arm64: bpf: fix signedness bug in loading 64-bit immediate

2015-05-08 Thread Xi Wang
On Fri, May 8, 2015 at 1:38 AM, Will Deacon  wrote:
>> - imm64 = (u64)insn1.imm << 32 | imm;
>> + imm64 = ((u64)(u32)insn1.imm) << 32 | (u64)(u32)imm;
>
> This seems a bit convoluted to me. Don't you just need to add a (u32)
> cast to imm and that's it? The (u64)(u32) looks redundant.

You're right - the second (u64) is redundant; the hope was to make
the code easier to understand. It's from the interpreter code in
kernel/bpf/core.c, which uses (u64)(u32) as well.

>> - BPF_ALU64_IMM(BPF_MOV, R0, 1),
>> + BPF_LD_IMM64(R0, 0x1LL),
>> + BPF_ALU64_IMM(BPF_RSH, R0, 32), /* R0 = 1 */
>>   BPF_EXIT_INSN(),
>
> This hunk should probably be a separate patch, unless you get Alexei's ack
> for me to take it via the arm64 tree too.

I would be happy to split this into a separate patch if that works
better, or simply drop this part.

- xi


[PATCH] arm64: bpf: fix signedness bug in loading 64-bit immediate

2015-05-07 Thread Xi Wang
Consider "(u64)insn1.imm << 32 | imm" in the arm64 JIT.  Since imm is
signed 32-bit, it is sign-extended to 64-bit, losing the high 32 bits.
The fix is to convert imm to u32 first and zero-extend it to u64.
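
A standalone demonstration of the pitfall (userspace C, not the JIT
itself; the variables mirror the roles of insn1.imm and imm):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int32_t hi = 0x1;                 /* plays the role of insn1.imm */
	int32_t lo = (int32_t)0x80000000; /* plays the role of imm; sign bit set */

	uint64_t bad  = (uint64_t)hi << 32 | lo;           /* lo sign-extends */
	uint64_t good = (uint64_t)hi << 32 | (uint32_t)lo; /* lo zero-extends */

	printf("bad  = %016llx\n", (unsigned long long)bad);  /* ffffffff80000000 */
	printf("good = %016llx\n", (unsigned long long)good); /* 0000000180000000 */
	return 0;
}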

Also extend test_bpf to catch this JIT bug; the interpreter is correct.

Before:
test_bpf: #58 load 64-bit immediate ret -1 != 1 FAIL (1 times)

After:
test_bpf: #58 load 64-bit immediate 74 PASS

Fixes: 30d3d94cc3d5 ("arm64: bpf: add 'load 64-bit immediate' instruction")
Cc: Zi Shen Lim 
Cc: Alexei Starovoitov 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Xi Wang 
---
 arch/arm64/net/bpf_jit_comp.c | 2 +-
 lib/test_bpf.c| 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index edba042b2325..14cdc099fda0 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -487,7 +487,7 @@ emit_cond_jmp:
return -EINVAL;
}
 
-   imm64 = (u64)insn1.imm << 32 | imm;
+   imm64 = ((u64)(u32)insn1.imm) << 32 | (u64)(u32)imm;
emit_a64_mov_i64(dst, imm64, ctx);
 
return 1;
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 80d78c51f65f..9f6849891b5f 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -1755,7 +1755,8 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
BPF_JMP_IMM(BPF_JEQ, R3, 0x1234, 1),
BPF_EXIT_INSN(),
-   BPF_ALU64_IMM(BPF_MOV, R0, 1),
+   BPF_LD_IMM64(R0, 0x1LL),
+   BPF_ALU64_IMM(BPF_RSH, R0, 32), /* R0 = 1 */
BPF_EXIT_INSN(),
},
INTERNAL,
-- 
1.9.1



[PATCH v3 -next 2/2] x86: bpf_jit_comp: optimize BPF_S_ANC_SECCOMP_LD_W

2013-05-02 Thread Xi Wang
This patch further optimizes JIT for seccomp filters.  It removes the
call to seccomp_bpf_load() and directly emits instructions instead.

Signed-off-by: Xi Wang 
Cc: Daniel Borkmann 
Cc: Heiko Carstens 
Cc: Will Drewry 
Cc: Eric Dumazet 
Cc: Russell King 
Cc: David Laight 
Cc: "David S. Miller" 
Cc: Andrew Morton 
Cc: Nicolas Schichan 
---
 arch/x86/net/bpf_jit_comp.c | 126 
 1 file changed, 116 insertions(+), 10 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 64c72aa..08b024b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -8,10 +8,11 @@
  * of the License.
  */
 #include <linux/moduleloader.h>
-#include <asm/cacheflush.h>
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <asm/cacheflush.h>
+#include <asm/syscall.h>
 
 /*
  * Conventions :
@@ -113,7 +114,7 @@ do {
\
 #define SEEN_SKBREF  (1 << 3) /* use pointer to skb */
 #define SEEN_SECCOMP (1 << 4) /* seccomp filters */
 
-#define NEED_PERILOGUE(_seen) ((_seen) & (SEEN_XREG | SEEN_MEM | SEEN_DATAREF 
| SEEN_SECCOMP))
+#define NEED_PERILOGUE(_seen) ((_seen) & (SEEN_XREG | SEEN_MEM | SEEN_DATAREF))
 
 static inline void bpf_flush_icache(void *start, void *end)
 {
@@ -148,6 +149,25 @@ static int pkt_type_offset(void)
return -1;
 }
 
+/* helper to find the offset in struct seccomp_data */
+#define BPF_DATA(_name) offsetof(struct seccomp_data, _name)
+
+/* helper to find the negative offset from the end of struct pt_regs */
+#define roffsetof(_type, _member) ((int)(offsetof(_type, _member) - 
sizeof(_type)))
+#define PT_REGS(_name)  roffsetof(struct pt_regs, _name)
+
+#define EMIT_REGS_LOAD(offset) \
+do {   \
+   if (is_imm8(offset)) {  \
+   /* mov off8(%r8),%eax */\
+   EMIT4(0x41, 0x8b, 0x40, offset);\
+   } else {\
+   /* mov off32(%r8),%eax */   \
+   EMIT3(0x41, 0x8b, 0x80);\
+   EMIT(offset, 4);\
+   }   \
+} while (0)
+
 static void *__bpf_jit_compile(struct sock_filter *filter, unsigned int flen, 
u8 seen_all)
 {
u8 temp[64];
@@ -229,12 +249,44 @@ static void *__bpf_jit_compile(struct sock_filter 
*filter, unsigned int flen, u8
}
 
 #ifdef CONFIG_SECCOMP_FILTER_JIT
+   /* For seccomp filters, load :
+*  r9  = current
+*  r8  = current->thread.sp0
+*  edi = task_thread_info(current)->status & TS_COMPAT
+*
+* r8 points to the end of struct pt_regs, 
task_pt_regs(current) + 1
+*/
if (seen_or_pass0 & SEEN_SECCOMP) {
/* seccomp filters: skb must be NULL */
if (seen_or_pass0 & (SEEN_SKBREF | SEEN_DATAREF)) {
pr_err_once("seccomp filters shouldn't use 
skb");
goto out;
}
+
+   /* r9 = current */
+   EMIT1(0x65);EMIT4(0x4c, 0x8b, 0x0c, 0x25); /* mov 
%gs:imm32,%r9 */
+   EMIT((u32)(unsigned long)&current_task, 4);
+
+   /* r8 = current->thread.sp0 */
+   EMIT3(0x4d, 0x8b, 0x81); /* mov off32(%r9),%r8 */
+   EMIT(offsetof(struct task_struct, thread.sp0), 4);
+
+   /* edi = task_thread_info(current)->status & TS_COMPAT 
*/
+#ifdef CONFIG_IA32_EMULATION
+   /* task_thread_info(current): current->stack */
+   BUILD_BUG_ON(!is_imm8(offsetof(struct task_struct, 
stack)));
+   /* mov off8(%r9),%rdi */
+   EMIT4(0x49, 0x8b, 0x79, offsetof(struct task_struct, 
stack));
+   /* task_thread_info(current)->status */
+   BUILD_BUG_ON(!is_imm8(offsetof(struct thread_info, 
status)));
+   BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info, status) 
!= 4);
+   /* mov off8(%rdi),%edi */
+   EMIT3(0x8b, 0x7f, offsetof(struct thread_info, status));
+   /* task_thread_info(current)->status & TS_COMPAT */
+   BUILD_BUG_ON(!is_imm8(TS_COMPAT));
+   /* and imm8,%edi */
+   EMIT3(0x83, 0xe7, TS_COMPAT);
+#endif /* CONFIG_IA32_EMULATION */
}
 #endif /* CONFIG_SECCOMP_FILTER_JIT */
 
@@ -709,14 +761,68 @@ cond_branch:  f_offset = addrs[i + 
filter[i].jf] - addrs[i];
 #ifdef CONFIG_SECCOMP_FILTER_JIT
case BPF_S_ANC_SECCOMP_LD_W:
   

[PATCH v3 -next 1/2] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W

2013-05-02 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in x86 JIT, by simply calling seccomp_bpf_load().

SEEN_SKBREF was suggested by Eric Dumazet.  SEEN_SKBREF shouldn't be
set in seccomp filters.

Signed-off-by: Xi Wang 
Cc: Daniel Borkmann 
Cc: Heiko Carstens 
Cc: Will Drewry 
Cc: Eric Dumazet 
Cc: Russell King 
Cc: David Laight 
Cc: "David S. Miller" 
Cc: Andrew Morton 
Cc: Nicolas Schichan 
---
 arch/x86/Kconfig|   1 +
 arch/x86/net/bpf_jit_comp.c | 112 +++-
 2 files changed, 91 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e8fff2f4..f7e1848 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
select IRQ_FORCED_THREADING
select USE_GENERIC_SMP_HELPERS if SMP
select HAVE_BPF_JIT if X86_64
+   select HAVE_SECCOMP_FILTER_JIT if X86_64
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select CLKEVT_I8253
select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9659817..64c72aa 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -107,9 +107,13 @@ do {   
\
goto cond_branch
 
 
-#define SEEN_DATAREF 1 /* might call external helpers */
-#define SEEN_XREG2 /* ebx is used */
-#define SEEN_MEM 4 /* use mem[] for temporary storage */
+#define SEEN_DATAREF (1 << 0) /* might call external skb helpers */
+#define SEEN_XREG(1 << 1) /* ebx is used */
+#define SEEN_MEM (1 << 2) /* use mem[] for temporary storage */
+#define SEEN_SKBREF  (1 << 3) /* use pointer to skb */
+#define SEEN_SECCOMP (1 << 4) /* seccomp filters */
+
+#define NEED_PERILOGUE(_seen) ((_seen) & (SEEN_XREG | SEEN_MEM | SEEN_DATAREF 
| SEEN_SECCOMP))
 
 static inline void bpf_flush_icache(void *start, void *end)
 {
@@ -144,7 +148,7 @@ static int pkt_type_offset(void)
return -1;
 }
 
-void bpf_jit_compile(struct sk_filter *fp)
+static void *__bpf_jit_compile(struct sock_filter *filter, unsigned int flen, 
u8 seen_all)
 {
u8 temp[64];
u8 *prog;
@@ -157,15 +161,14 @@ void bpf_jit_compile(struct sk_filter *fp)
int pc_ret0 = -1; /* bpf index of first RET #0 instruction (if any) */
unsigned int cleanup_addr; /* epilogue code offset */
unsigned int *addrs;
-   const struct sock_filter *filter = fp->insns;
-   int flen = fp->len;
+   void *bpf_func = NULL;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
addrs = kmalloc(flen * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
+   return bpf_func;
 
/* Before first pass, make a rough estimation of addrs[]
 * each bpf instruction is translated to less than 64 bytes
@@ -177,12 +180,12 @@ void bpf_jit_compile(struct sk_filter *fp)
cleanup_addr = proglen; /* epilogue address */
 
for (pass = 0; pass < 10; pass++) {
-   u8 seen_or_pass0 = (pass == 0) ? (SEEN_XREG | SEEN_DATAREF | 
SEEN_MEM) : seen;
+   u8 seen_or_pass0 = (pass == 0) ? seen_all : seen;
/* no prologue/epilogue for trivial filters (RET something) */
proglen = 0;
prog = temp;
 
-   if (seen_or_pass0) {
+   if (NEED_PERILOGUE(seen_or_pass0)) {
EMIT4(0x55, 0x48, 0x89, 0xe5); /* push %rbp; mov 
%rsp,%rbp */
EMIT4(0x48, 0x83, 0xec, 96);/* subq  $96,%rsp   
*/
/* note : must save %rbx in case bpf_error is hit */
@@ -225,6 +228,16 @@ void bpf_jit_compile(struct sk_filter *fp)
}
}
 
+#ifdef CONFIG_SECCOMP_FILTER_JIT
+   if (seen_or_pass0 & SEEN_SECCOMP) {
+   /* seccomp filters: skb must be NULL */
+   if (seen_or_pass0 & (SEEN_SKBREF | SEEN_DATAREF)) {
+   pr_err_once("seccomp filters shouldn't use 
skb");
+   goto out;
+   }
+   }
+#endif /* CONFIG_SECCOMP_FILTER_JIT */
+
switch (filter[0].code) {
case BPF_S_RET_K:
case BPF_S_LD_W_LEN:
@@ -237,6 +250,7 @@ void bpf_jit_compile(struct sk_filter *fp)
case BPF_S_ANC_VLAN_TAG_PRESENT:
case BPF_S_ANC_QUEUE:
case BPF_S_ANC_PKTTYPE:
+   case BPF_S_ANC_SECCOMP_LD_W:
case BPF_S_LD_W_ABS:
case BPF_S_LD_H_ABS:
case BPF_S_LD_B_ABS:
@@ -408,7 +422,7 @@ void bpf_jit_compile(struct sk_filter *fp)
}
/* fallinto */

[PATCH v3 -next 0/2] seccomp filter JIT for x86

2013-05-02 Thread Xi Wang
This patchset is rebased against linux-next, on top of the seccomp JIT
interface from Nicolas Schichan:

  00afda30 "bpf: add comments explaining the schedule_work() operation"
  c2cf3192 "ARM: net: bpf_jit: add support for jitted seccomp filters."
  8ef6cd9d "ARM: net: bpf_jit: make code generation less dependent on
struct sk_filter."
  747416f4 "seccomp: add generic code for jitted seccomp filters."

Patch 1/2 implements JIT by calling seccomp_bpf_load().  It introduces
SEEN_SKBREF, as suggested by Eric Dumazet, to ensure that seccomp
filters don't use skb (%rdi).

Patch 2/2 eliminates the call to seccomp_bpf_load() and instead emits
instructions directly.  It doesn't have to emit prologue/epilogue either.

Both patches have been tested using vsftpd, openssh, and the seccomp
filter examples.

My main worry about 2/2 is that it's way harder to maintain compared
to 1/2.  I'm posting it here for completeness, but I'd suggest
considering patch 1/2 only for now.

Xi Wang (2):
  x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W
  x86: bpf_jit_comp: optimize BPF_S_ANC_SECCOMP_LD_W

 arch/x86/Kconfig|   1 +
 arch/x86/net/bpf_jit_comp.c | 220 +++-
 2 files changed, 198 insertions(+), 23 deletions(-)

-- 
1.8.1.2


Re: [PATCH v2 net-next 2/3] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-29 Thread Xi Wang
On Sat, Apr 27, 2013 at 9:21 PM, Eric Dumazet  wrote:
> I would feel more comfortable if you added :
>
> if (seen & SEEN_DATAREF) {
> pr_err_once("SECCOMP_LD_W assertion failed\n"):
> goto out;
> }
>
> This way, if BPF is changed in the future, but not the x86 JIT, we
> can have a working kernel.
>
> Ideally, we should add a SEEN_SKBREF to make sure rdi value can be
> scratched, or you just push %rdi/pop %rdi, its only one byte
> instructions.

Adding SEEN_SKBREF sounds like a good idea. :)

> Or completely optimize the thing and not call seccomp_bpf_load() at all.

This would be cool.

> (current would be loaded once in r9, task_pt_regs() would be loaded once
> in r8)

Both syscall_get_arch() and syscall_get_arguments() need to test for
the TS_COMPAT bit (in task_thread_info(current)->status); we should
load that once, too.
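
For reference, the check being discussed looks roughly like this (a
sketch of x86-64's syscall_get_arch() under CONFIG_IA32_EMULATION,
per your earlier mail; simplified, not the verbatim kernel code):

  static inline int syscall_get_arch(struct task_struct *task,
                                     struct pt_regs *regs)
  {
          if (task_thread_info(task)->status & TS_COMPAT)
                  return AUDIT_ARCH_I386;
          return AUDIT_ARCH_X86_64;
  }

so the result depends on the task's state at filter run time, not at
JIT compile time.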

Thanks for the comments.  Will try a v3.

- xi


Re: [PATCH v2 net-next 3/3] ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-27 Thread Xi Wang
On Sat, Apr 27, 2013 at 2:27 AM, Daniel Borkmann  wrote:
> Aren't you doing here a similar thing in terms of getting arch as Eric
> criticized (Nicolas' implementation does not use that part btw.)? Also,
> even if it would be possible here, now your 2 JIT implementations differ
> in behaviour. I think this is unintended.

Eric's comment was about x86, where the audit arch could change on the
fly.  For ARM, the audit arch doesn't change---syscall_get_arch()
always returns AUDIT_ARCH_ARM.

> Besides all that, I think I also pointed you to a patch that already made
> it in for ARM, not sure why you keep posting the ARM JIT implementation?

That's why I asked in the other post if you wanted me to rebase
against linux-next or net-next.  The ARM part 3/3 is not needed if
rebased against linux-next with Nicolas's patches.

- xi


[PATCH v2 net-next 2/3] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in x86 JIT.

Signed-off-by: Xi Wang 
Cc: Daniel Borkmann 
Cc: Heiko Carstens 
Cc: Will Drewry 
Cc: Eric Dumazet 
Cc: Russell King 
Cc: David Laight 
Cc: "David S. Miller" 
Cc: Andrew Morton 
Cc: Nicolas Schichan 
---
 arch/x86/net/bpf_jit_comp.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8898680..5f1dafb 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -683,6 +683,17 @@ cond_branch:   f_offset = addrs[i + 
filter[i].jf] - addrs[i];
}
EMIT_COND_JMP(f_op, f_offset);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   func = (u8 *)seccomp_bpf_load;
+   t_offset = func - (image + addrs[i]);
+   /* seccomp filters don't use %rdi, %r8, %r9
+* it is safe to not save these registers
+*/
+   EMIT1_off32(0xbf, K); /* mov imm32,%edi */
+   EMIT1_off32(0xe8, t_offset); /* call 
seccomp_bpf_load */
+   break;
+#endif
default:
/* hmm, too complex filter, give up with jit 
compiler */
goto out;
-- 
1.8.1.2


[PATCH v2 net-next 3/3] ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in ARM JIT.

Signed-off-by: Xi Wang 
Cc: Daniel Borkmann 
Cc: Heiko Carstens 
Cc: Will Drewry 
Cc: Eric Dumazet 
Cc: Russell King 
Cc: David Laight 
Cc: "David S. Miller" 
Cc: Andrew Morton 
Cc: Nicolas Schichan 
---
 arch/arm/net/bpf_jit_32.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 073b085..9bfce464 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bpf_jit_32.h"
 
@@ -845,6 +846,19 @@ b_epilogue:
off = offsetof(struct sk_buff, queue_mapping);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   if (k == offsetof(struct seccomp_data, arch)) {
+   emit_mov_i(r_A, AUDIT_ARCH_ARM, ctx);
+   break;
+   }
+   ctx->seen |= SEEN_CALL;
+   emit_mov_i(ARM_R3, (u32)seccomp_bpf_load, ctx);
+   emit_mov_i(ARM_R0, k, ctx);
+   emit_blx_r(ARM_R3, ctx);
+   emit(ARM_MOV_R(r_A, ARM_R0), ctx);
+   break;
+#endif
default:
return -1;
}
-- 
1.8.1.2


[PATCH v2 net-next 1/3] filter: refactor BPF JIT for seccomp filters

2013-04-26 Thread Xi Wang
Currently, bpf_jit_compile() and bpf_jit_free() take an sk_filter,
which seccomp filters cannot reuse.

Change bpf_jit_compile() to take a pointer to BPF instructions and
an instruction length, and to return a JITted function.

Change bpf_jit_free() to take a JITted function.

Add JIT calls for seccomp filters.

Signed-off-by: Xi Wang 
Cc: Daniel Borkmann 
Cc: Heiko Carstens 
Cc: Will Drewry 
Cc: Eric Dumazet 
Cc: Russell King 
Cc: David Laight 
Cc: "David S. Miller" 
Cc: Andrew Morton 
Cc: Nicolas Schichan 
---
 arch/arm/net/bpf_jit_32.c   | 50 -
 arch/powerpc/net/bpf_jit_comp.c | 36 ++---
 arch/s390/net/bpf_jit_comp.c| 31 -
 arch/sparc/net/bpf_jit_comp.c   | 22 +-
 arch/x86/net/bpf_jit_comp.c | 21 +
 include/linux/filter.h  | 16 -
 kernel/seccomp.c|  6 -
 net/core/filter.c   |  6 ++---
 8 files changed, 97 insertions(+), 91 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 1a643ee..073b085 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -55,7 +55,8 @@
 #define FLAG_NEED_X_RESET  (1 << 0)
 
 struct jit_ctx {
-   const struct sk_filter *skf;
+   struct sock_filter *insns;
+   unsigned len;
unsigned idx;
unsigned prologue_bytes;
int ret0_fp_idx;
@@ -131,8 +132,8 @@ static u16 saved_regs(struct jit_ctx *ctx)
 {
u16 ret = 0;
 
-   if ((ctx->skf->len > 1) ||
-   (ctx->skf->insns[0].code == BPF_S_RET_A))
+   if ((ctx->len > 1) ||
+   (ctx->insns[0].code == BPF_S_RET_A))
ret |= 1 << r_A;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -181,7 +182,7 @@ static inline bool is_load_to_a(u16 inst)
 static void build_prologue(struct jit_ctx *ctx)
 {
u16 reg_set = saved_regs(ctx);
-   u16 first_inst = ctx->skf->insns[0].code;
+   u16 first_inst = ctx->insns[0].code;
u16 off;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -279,7 +280,7 @@ static u16 imm_offset(u32 k, struct jit_ctx *ctx)
ctx->imms[i] = k;
 
/* constants go just after the epilogue */
-   offset =  ctx->offsets[ctx->skf->len];
+   offset =  ctx->offsets[ctx->len];
offset += ctx->prologue_bytes;
offset += ctx->epilogue_bytes;
offset += i * 4;
@@ -419,7 +420,7 @@ static inline void emit_err_ret(u8 cond, struct jit_ctx 
*ctx)
emit(ARM_MOV_R(ARM_R0, ARM_R0), ctx);
} else {
_emit(cond, ARM_MOV_I(ARM_R0, 0), ctx);
-   _emit(cond, ARM_B(b_imm(ctx->skf->len, ctx)), ctx);
+   _emit(cond, ARM_B(b_imm(ctx->len, ctx)), ctx);
}
 }
 
@@ -469,14 +470,13 @@ static inline void update_on_xread(struct jit_ctx *ctx)
 static int build_body(struct jit_ctx *ctx)
 {
void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
-   const struct sk_filter *prog = ctx->skf;
const struct sock_filter *inst;
unsigned i, load_order, off, condt;
int imm12;
u32 k;
 
-   for (i = 0; i < prog->len; i++) {
-   inst = &(prog->insns[i]);
+   for (i = 0; i < ctx->len; i++) {
+   inst = &(ctx->insns[i]);
/* K as an immediate value operand */
k = inst->k;
 
@@ -769,8 +769,8 @@ cmp_x:
ctx->ret0_fp_idx = i;
emit_mov_i(ARM_R0, k, ctx);
 b_epilogue:
-   if (i != ctx->skf->len - 1)
-   emit(ARM_B(b_imm(prog->len, ctx)), ctx);
+   if (i != ctx->len - 1)
+   emit(ARM_B(b_imm(ctx->len, ctx)), ctx);
break;
case BPF_S_MISC_TAX:
/* X = A */
@@ -858,22 +858,24 @@ b_epilogue:
 }
 
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
struct jit_ctx ctx;
unsigned tmp_idx;
unsigned alloc_size;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
	memset(&ctx, 0, sizeof(ctx));
-   ctx.skf = fp;
-   ctx.ret0_fp_idx = -1;
+   ctx.insns   = filter;
+   ctx.len = flen;
+   ctx.ret0_fp_idx = -1;
 
-   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
+   ctx.offsets = kzalloc(4 * (ctx.len + 1), GFP_KERNEL);
if (ctx.offsets == NULL)
-   return;
+   return bpf_func;
 
/* fake pass to fill in the ctx->seen */
if (unlikely(build_body()))
@@ -919,12 +921,12 @@ void bpf_jit_compile(struct sk_filter *fp)

[PATCH v2 net-next 0/3] seccomp filter JIT

2013-04-26 Thread Xi Wang
The first patch refactors bpf_jit_compile()/bpf_jit_free() to provide
a unified interface for both packet and seccomp filters.

The next two patches implement JIT for seccomp filters on x86 and ARM,
respectively.

Thanks to Heiko Carstens for testing the build on s390 and Eric Dumazet
for the comments on x86 JIT.

Changes:
  make better patch splitting
  remove arch at jit time in x86
  add more comments in x86

Xi Wang (3):
  filter: refactor BPF JIT for seccomp filters
  x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction
  ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

 arch/arm/net/bpf_jit_32.c   | 64 +
 arch/powerpc/net/bpf_jit_comp.c | 36 +++
 arch/s390/net/bpf_jit_comp.c| 31 ++--
 arch/sparc/net/bpf_jit_comp.c   | 22 +++---
 arch/x86/net/bpf_jit_comp.c | 32 ++---
 include/linux/filter.h  | 16 +++
 kernel/seccomp.c|  6 +++-
 net/core/filter.c   |  6 ++--
 8 files changed, 122 insertions(+), 91 deletions(-)

-- 
1.8.1.2


Re: [PATCH V3 2/3] ARM: net: bpf_jit: make code generation less dependent on struct sk_filter.

2013-04-26 Thread Xi Wang
Thanks for CCing.  One way to clean this up would be to refactor the
bpf jit interface as:

  bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen);
  void bpf_jit_free(bpf_func_t bpf_func);

Then both packet and seccomp filters can share the unified interface.
Also, we don't need seccomp_filter_get_len() and other helpers.
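
With that, the seccomp side reduces to something like (sketch; this
is what the kernel/seccomp.c hunks in my earlier RFC series do):

  /* attach: JIT the filter; falls back to sk_run_filter internally */
  filter->bpf_func = bpf_jit_compile(filter->insns, filter->len);

  /* release */
  bpf_jit_free(freeme->bpf_func);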

Do you want me to rebase my patch against linux-next and see how that goes?

- xi

On Fri, Apr 26, 2013 at 6:01 PM, Daniel Borkmann  wrote:
> On 04/26/2013 10:09 PM, Andrew Morton wrote:
>>
>> On Fri, 26 Apr 2013 21:47:46 +0200 Daniel Borkmann 
>> wrote:
>>>
>>> On 04/26/2013 09:26 PM, Andrew Morton wrote:

 On Fri, 26 Apr 2013 16:04:44 +0200 Arnd Bergmann  wrote:
>
> On Wednesday 24 April 2013 19:27:08 Nicolas Schichan wrote:
>>
>> @@ -858,7 +858,7 @@ b_epilogue:
>>}
>>
>>
>> -void bpf_jit_compile(struct sk_filter *fp)
>> +static void __bpf_jit_compile(struct jit_ctx *out_ctx)
>>{
>>   struct jit_ctx ctx;
>>   unsigned tmp_idx;
>> @@ -867,11 +867,10 @@ void bpf_jit_compile(struct sk_filter *fp)
>>   if (!bpf_jit_enable)
>>   return;
>>
>> -   memset(&ctx, 0, sizeof(ctx));
>> -   ctx.skf = fp;
>> +   ctx = *out_ctx;
>>   ctx.ret0_fp_idx = -1;
>>
>> -   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
>> +   ctx.offsets = kzalloc(4 * (ctx.prog_len + 1), GFP_KERNEL);
>>   if (ctx.offsets == NULL)
>>   return;
>>
>> @@ -921,13 +920,26 @@ void bpf_jit_compile(struct sk_filter *fp)
>>   print_hex_dump(KERN_INFO, "BPF JIT code: ",
>>  DUMP_PREFIX_ADDRESS, 16, 4,
>> ctx.target,
>>  alloc_size, false);
>> -
>> -   fp->bpf_func = (void *)ctx.target;
>>out:
>>   kfree(ctx.offsets);
>> +
>> +   *out_ctx = ctx;
>>   return;
>
>
> This part of the patch, in combination with 79617801e "filter:
> bpf_jit_comp:
> refactor and unify BPF JIT image dump output" is now causing build
> errors
> in linux-next:
>
> arch/arm/net/bpf_jit_32.c: In function '__bpf_jit_compile':
> arch/arm/net/bpf_jit_32.c:930:16: error: 'fp' undeclared (first use in
> this function)
>  bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);


 Thanks, I did this.  There may be a smarter way...
>>>
>>>
>>> I think also seccomp_jit_compile() would need this change then, otherwise
>>> the build
>>> with CONFIG_SECCOMP_FILTER_JIT might break.
>>
>>
>> urgh, that tears it.
>>
>>> I can fix this up for you if not already applied. I presume it's against
>>> linux-next tree?
>>
>>
>> Yup, please send something.
>
>
> Patch is attached. However, I currently don't have an ARM toolchain at hand,
> so
> uncompiled, untested.
>
> @Nicolas, Xi (cc, ref: http://thread.gmane.org/gmane.linux.kernel/1481464):
>
> If there is someday support for other archs as well, it would be nice if we
> do not have each time duplicated seccomp_jit_compile() etc functions in each
> JIT implementation, i.e. because they do basically the same. So follow-up
> {fix,clean}up is appreciated.
>
> Also, I find it a bit weird that seccomp_filter_get_len() and some other
> _one-line_ functions from kernel/seccomp.c are not placed into the
> corresponding header file as inlines.


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
Not sure how many you are speaking for when you say "We are very dumb".  :)

Thanks for catching this.  I'll remove this arch thing in v2.

To address your other concern about registers, I'll add some comments
to the code, something like:

"%rdi,%r8,%r9 are not used by seccomp filters; it's safe to not save them."

- xi

On Fri, Apr 26, 2013 at 12:14 PM, Eric Dumazet  wrote:
> On Fri, 2013-04-26 at 12:02 -0400, Xi Wang wrote:
>> On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet  
>> wrote:
>> > 1) 'current' at the time the code is jitted (compiled) is not the
>> > 'current' at the time the filter will be evaluated.
>> >
>> > On x86_64, if CONFIG_IA32_EMULATION=y, syscall_get_arch() evaluates to :
>> >
>> > if (task_thread_info(task)->status & TS_COMPAT)
>> > return AUDIT_ARCH_I386;
>> > return AUDIT_ARCH_X86_64;
>> >
>> > So your code is completely wrong.
>>
>> Just to be clear, are you worrying about a process changing its
>> personality after installing seccomp filters?
>
> You didn't explained how things worked.
>
> Are you assuming we network guys know everything ?
>
> Just to make it very clear :
>
> We are very dumb and you must explain us everything.
>
> If process would not change personality, why do we have get_arch() at
> all ? Why isn't it optimized outside of the JIT itself, in the generic
> seccomp checker, its a single "A = K" instruction after all.
>
> Why this part is even in the x86 BPF JIT ?
>
> To me it looks like _if_ get_arch() is provided in BPF, its for a
> reason, and your implementation looks very suspicious, if not buggy.
>
>
>


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet  wrote:
> 1) 'current' at the time the code is jitted (compiled) is not the
> 'current' at the time the filter will be evaluated.
>
> On x86_64, if CONFIG_IA32_EMULATION=y, syscall_get_arch() evaluates to :
>
> if (task_thread_info(task)->status & TS_COMPAT)
> return AUDIT_ARCH_I386;
> return AUDIT_ARCH_X86_64;
>
> So your code is completely wrong.

Just to be clear, are you worrying about a process changing its
personality after installing seccomp filters?

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:43 AM, Eric Dumazet  wrote:
> I do not know.
>
> This is not explained in your changelog or in any comment.
>
> You have to make the full analysis yourself and make us comfortable with
> the results.
>
> You send patches and ask us to spend hours on it, this is not how it
> works.

"do we really need that for seccomp?" is not asking.  I just tried to
explain in a gentle way.

%rdi,%r8,%r9 are not used by seccomp filters so I removed the push/pop
part.  I agree that I should explain the details in the code comments
or logs.  Thanks for catching that.

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet  wrote:
> 2) Calling a function potentially destroys some registers.
>%rdi,%r8,%r9 for instance, so we are going to crash very easily.
>
> I dont know, I feel a bit uncomfortable having to explain this to
> someone sending security related patches...

My old code did save these registers.  But, do we really need that for
seccomp?  For example, %rdi (skb) is always NULL and never used by
seccomp filters.  Did I miss anything?

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 10:18 AM, Eric Dumazet  wrote:
> On Fri, 2013-04-26 at 03:51 -0400, Xi Wang wrote:
>
>> +#ifdef CONFIG_SECCOMP_FILTER
>> + case BPF_S_ANC_SECCOMP_LD_W:
>> + if (K == offsetof(struct seccomp_data, arch)) {
>> + int arch = syscall_get_arch(current, 
>> NULL);
>> +
>> + EMIT1_off32(0xb8, arch); /* mov 
>> arch,%eax */
>> + break;
>> + }
>> + func = (u8 *)seccomp_bpf_load;
>> + t_offset = func - (image + addrs[i]);
>> + EMIT1_off32(0xbf, K); /* mov imm32,%edi */
>> + EMIT1_off32(0xe8, t_offset); /* call 
>> seccomp_bpf_load */
>> + break;
>> +#endif
>
> This seems seriously wrong to me.

Can you elaborate?

> This cannot have been tested at all.

Thanks to QEMU for hiding bugs then. :)

- xi


Re: [RFC PATCH net-next 0/6] seccomp filter JIT

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 7:46 AM, Daniel Borkmann  wrote:
> I think BPF JIT for seccomp on ARM recently got applied to -mm tree
> if I'm not mistaken. It was from Nicolas Schichan (cc):
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/233416/

Thanks for the pointer.

For the ARM part, it looks like Nicolas's patch requires implementing
two wrappers for each arch:

  void seccomp_jit_compile(struct seccomp_filter *fp);
  void seccomp_jit_free(struct seccomp_filter *fp);

The implementation of these wrappers is almost identical to:

  void bpf_jit_compile(struct sk_filter *fp);
  void bpf_jit_free(struct sk_filter *fp);

While this patch uses a unified interface for both packet & seccomp filters.

  bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen);
  void bpf_jit_free(bpf_func_t bpf_func);

Shouldn't be hard to merge though.

- xi


Re: [RFC PATCH net-next 0/6] seccomp filter JIT

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 7:46 AM, Heiko Carstens
 wrote:
> And build fine on s390.

Thanks!

>> Btw. are there any test cases around for BPF JIT?
>> Not only for the new seccomp but also netfilter?
>
> This however is still a valid question.

Not sure about test cases for BPF JIT in general.  I used the examples
under "samples/seccomp/" for testing; they are seccomp filters though.
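
For packet filters, one ad-hoc way to exercise the JIT is to attach a
trivial classic BPF program to a socket; a rough user-space sketch
(with /proc/sys/net/core/bpf_jit_enable set to 2, the JITted image is
also dumped to the kernel log for inspection):

  #include <linux/filter.h>
  #include <sys/socket.h>
  #include <stdio.h>

  int main(void)
  {
          /* BPF_RET | BPF_K, accept everything */
          struct sock_filter insns[] = {
                  { 0x06, 0, 0, 0xffffffff },
          };
          struct sock_fprog prog = { .len = 1, .filter = insns };
          int fd = socket(AF_INET, SOCK_DGRAM, 0);

          if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                                   &prog, sizeof(prog)))
                  perror("attach");
          return 0;
  }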

- xi


[RFC PATCH net-next 1/6] filter: refactor BPF JIT for seccomp filters

2013-04-26 Thread Xi Wang
Currently, bpf_jit_compile() and bpf_jit_free() take an sk_filter,
which seccomp filters cannot reuse.

Change bpf_jit_compile() to take a pointer to BPF instructions and
the length, and to return a JITted function.

Change bpf_jit_free() to take a JITted function.

Add JIT calls for seccomp filters.

Signed-off-by: Xi Wang 
---
 include/linux/filter.h | 16 ++--
 kernel/seccomp.c   |  6 +-
 net/core/filter.c  |  6 ++
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index d1248f4..8743093 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -21,12 +21,14 @@ struct compat_sock_fprog {
 struct sk_buff;
 struct sock;
 
+typedef unsigned int (*bpf_func_t)(const struct sk_buff *skb,
+  const struct sock_filter *filter);
+
 struct sk_filter
 {
atomic_trefcnt;
unsigned intlen;/* Number of filter blocks */
-   unsigned int(*bpf_func)(const struct sk_buff *skb,
-   const struct sock_filter *filter);
+   bpf_func_t  bpf_func;
struct rcu_head rcu;
struct sock_filter  insns[0];
 };
@@ -48,11 +50,12 @@ extern int sk_chk_filter(struct sock_filter *filter, 
unsigned int flen);
 extern int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, 
unsigned len);
 
 #ifdef CONFIG_BPF_JIT
+#include 
 #include 
 #include 
 
-extern void bpf_jit_compile(struct sk_filter *fp);
-extern void bpf_jit_free(struct sk_filter *fp);
+extern bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int 
flen);
+extern void bpf_jit_free(bpf_func_t bpf_func);
 
 static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
u32 pass, void *image)
@@ -65,10 +68,11 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned 
int proglen,
 }
 #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
-static inline void bpf_jit_compile(struct sk_filter *fp)
+static inline bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned 
int flen)
 {
+   return sk_run_filter;
 }
-static inline void bpf_jit_free(struct sk_filter *fp)
+static inline void bpf_jit_free(bpf_func_t bpf_func)
 {
 }
 #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5af44b5..f784feb 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -55,6 +55,7 @@ struct seccomp_filter {
atomic_t usage;
struct seccomp_filter *prev;
unsigned short len;  /* Instruction count */
+   bpf_func_t bpf_func;
struct sock_filter insns[];
 };
 
@@ -211,7 +212,7 @@ static u32 seccomp_run_filters(int syscall)
 * value always takes priority (ignoring the DATA).
 */
for (f = current->seccomp.filter; f; f = f->prev) {
-   u32 cur_ret = sk_run_filter(NULL, f->insns);
+   u32 cur_ret = SK_RUN_FILTER(f, NULL);
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
@@ -273,6 +274,8 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
if (ret)
goto fail;
 
+   filter->bpf_func = bpf_jit_compile(filter->insns, filter->len);
+
/*
 * If there is an existing filter, make it the prev and don't drop its
 * task reference.
@@ -330,6 +333,7 @@ void put_seccomp_filter(struct task_struct *tsk)
	while (orig && atomic_dec_and_test(&orig->usage)) {
struct seccomp_filter *freeme = orig;
orig = orig->prev;
+   bpf_jit_free(freeme->bpf_func);
kfree(freeme);
}
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index dad2a17..0a7900b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -643,7 +643,7 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 {
struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
-   bpf_jit_free(fp);
+   bpf_jit_free(fp->bpf_func);
kfree(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
@@ -652,13 +652,11 @@ static int __sk_prepare_filter(struct sk_filter *fp)
 {
int err;
 
-   fp->bpf_func = sk_run_filter;
-
err = sk_chk_filter(fp->insns, fp->len);
if (err)
return err;
 
-   bpf_jit_compile(fp);
+   fp->bpf_func = bpf_jit_compile(fp->insns, fp->len);
return 0;
 }
 
-- 
1.8.1.2


[RFC PATCH net-next 4/6] PPC: net: bpf_jit_comp: refactor the BPF JIT interface

2013-04-26 Thread Xi Wang
Implement the refactored bpf_jit_compile() and bpf_jit_free().

Signed-off-by: Xi Wang 
---
 arch/powerpc/net/bpf_jit_comp.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c427ae3..a82e400 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -31,11 +31,11 @@ static inline void bpf_flush_icache(void *start, void *end)
flush_icache_range((unsigned long)start, (unsigned long)end);
 }
 
-static void bpf_jit_build_prologue(struct sk_filter *fp, u32 *image,
+static void bpf_jit_build_prologue(struct sock_filter *filter,
+  u32 *image,
   struct codegen_context *ctx)
 {
int i;
-   const struct sock_filter *filter = fp->insns;
 
if (ctx->seen & (SEEN_MEM | SEEN_DATAREF)) {
/* Make stackframe */
@@ -135,12 +135,12 @@ static void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : 
func##_positive_offset)
 
 /* Assemble the body code between the prologue & epilogue. */
-static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
+static int bpf_jit_build_body(struct sock_filter *filter,
+ unsigned int flen,
+ u32 *image,
  struct codegen_context *ctx,
  unsigned int *addrs)
 {
-   const struct sock_filter *filter = fp->insns;
-   int flen = fp->len;
u8 *func;
unsigned int true_cond;
int i;
@@ -564,7 +564,7 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 
*image,
return 0;
 }
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
unsigned int proglen;
unsigned int alloclen;
@@ -573,14 +573,14 @@ void bpf_jit_compile(struct sk_filter *fp)
unsigned int *addrs;
struct codegen_context cgctx;
int pass;
-   int flen = fp->len;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
+   return bpf_func;
 
/*
 * There are multiple assembly passes as the generated code will change
@@ -636,7 +636,7 @@ void bpf_jit_compile(struct sk_filter *fp)
cgctx.seen = 0;
cgctx.pc_ret0 = -1;
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs))
+   if (bpf_jit_build_body(filter, flen, 0, &cgctx, addrs))
/* We hit something illegal or unsupported. */
goto out;
 
@@ -645,7 +645,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 * update ctgtx.idx as it pretends to output instructions, then we can
 * calculate total size from idx.
 */
-   bpf_jit_build_prologue(fp, 0, &cgctx);
+   bpf_jit_build_prologue(filter, 0, &cgctx);
	bpf_jit_build_epilogue(0, &cgctx);
 
proglen = cgctx.idx * 4;
@@ -661,8 +661,8 @@ void bpf_jit_compile(struct sk_filter *fp)
for (pass = 1; pass < 3; pass++) {
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
-   bpf_jit_build_prologue(fp, code_base, &cgctx);
+   bpf_jit_build_prologue(filter, code_base, &cgctx);
-   bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+   bpf_jit_build_body(filter, flen, code_base, &cgctx, addrs);
	bpf_jit_build_epilogue(code_base, &cgctx);
 
if (bpf_jit_enable > 1)
@@ -681,11 +681,11 @@ void bpf_jit_compile(struct sk_filter *fp)
/* Function descriptor nastiness: Address + TOC */
((u64 *)image)[0] = (u64)code_base;
((u64 *)image)[1] = local_paca->kernel_toc;
-   fp->bpf_func = (void *)image;
+   bpf_func = (void *)image;
}
 out:
kfree(addrs);
-   return;
+   return bpf_func;
 }
 
 static void jit_free_defer(struct work_struct *arg)
@@ -696,10 +696,10 @@ static void jit_free_defer(struct work_struct *arg)
 /* run from softirq, we must use a work_struct to call
  * module_free() from process context
  */
-void bpf_jit_free(struct sk_filter *fp)
+void bpf_jit_free(bpf_func_t bpf_func)
 {
-   if (fp->bpf_func != sk_run_filter) {
-   struct work_struct *work = (struct work_struct *)fp->bpf_func;
+   if (bpf_func != sk_run_filter) {
+   struct work_struct *work = (struct work_struct *)bpf_func;
 
INIT_WORK(work, jit_free_defer);
schedule_work(work);
-- 
1.8.1.2


[RFC PATCH net-next 5/6] sparc: bpf_jit_comp: refactor the BPF JIT interface

2013-04-26 Thread Xi Wang
Implement the refactored bpf_jit_compile() and bpf_jit_free().

Signed-off-by: Xi Wang 
---
 arch/sparc/net/bpf_jit_comp.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index d36a85e..15e6513 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -354,21 +354,21 @@ do {  *prog++ = BR_OPC | WDISP22(OFF);
\
  * emit_jump() calls with adjusted offsets.
  */
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
unsigned int cleanup_addr, proglen, oldproglen = 0;
u32 temp[8], *prog, *func, seen = 0, pass;
-   const struct sock_filter *filter = fp->insns;
-   int i, flen = fp->len, pc_ret0 = -1;
+   int i, pc_ret0 = -1;
unsigned int *addrs;
void *image;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
addrs = kmalloc(flen * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
+   return bpf_func;
 
/* Before first pass, make a rough estimation of addrs[]
 * each bpf instruction is translated to less than 64 bytes
@@ -763,7 +763,7 @@ cond_branch:f_offset = addrs[i + 
filter[i].jf];
pr_err("bpb_jit_compile fatal error\n");
kfree(addrs);
module_free(NULL, image);
-   return;
+   return bpf_func;
}
memcpy(image + proglen, temp, ilen);
}
@@ -799,11 +799,11 @@ cond_branch:  f_offset = addrs[i + 
filter[i].jf];
 
if (image) {
bpf_flush_icache(image, image + proglen);
-   fp->bpf_func = (void *)image;
+   bpf_func = (void *)image;
}
 out:
kfree(addrs);
-   return;
+   return bpf_func;
 }
 
 static void jit_free_defer(struct work_struct *arg)
@@ -814,10 +814,10 @@ static void jit_free_defer(struct work_struct *arg)
 /* run from softirq, we must use a work_struct to call
  * module_free() from process context
  */
-void bpf_jit_free(struct sk_filter *fp)
+void bpf_jit_free(bpf_func_t bpf_func)
 {
-   if (fp->bpf_func != sk_run_filter) {
-   struct work_struct *work = (struct work_struct *)fp->bpf_func;
+   if (bpf_func != sk_run_filter) {
+   struct work_struct *work = (struct work_struct *)bpf_func;
 
INIT_WORK(work, jit_free_defer);
schedule_work(work);
-- 
1.8.1.2


[RFC PATCH net-next 3/6] ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in ARM JIT.

Signed-off-by: Xi Wang 
---
 arch/arm/net/bpf_jit_32.c | 64 +--
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 1a643ee..9bfce464 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -19,6 +19,7 @@
 #include <linux/if_vlan.h>
 #include <asm/cacheflush.h>
 #include <asm/hwcap.h>
+#include <asm/syscall.h>
 
 #include "bpf_jit_32.h"
 
@@ -55,7 +56,8 @@
 #define FLAG_NEED_X_RESET  (1 << 0)
 
 struct jit_ctx {
-   const struct sk_filter *skf;
+   struct sock_filter *insns;
+   unsigned len;
unsigned idx;
unsigned prologue_bytes;
int ret0_fp_idx;
@@ -131,8 +133,8 @@ static u16 saved_regs(struct jit_ctx *ctx)
 {
u16 ret = 0;
 
-   if ((ctx->skf->len > 1) ||
-   (ctx->skf->insns[0].code == BPF_S_RET_A))
+   if ((ctx->len > 1) ||
+   (ctx->insns[0].code == BPF_S_RET_A))
ret |= 1 << r_A;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -181,7 +183,7 @@ static inline bool is_load_to_a(u16 inst)
 static void build_prologue(struct jit_ctx *ctx)
 {
u16 reg_set = saved_regs(ctx);
-   u16 first_inst = ctx->skf->insns[0].code;
+   u16 first_inst = ctx->insns[0].code;
u16 off;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -279,7 +281,7 @@ static u16 imm_offset(u32 k, struct jit_ctx *ctx)
ctx->imms[i] = k;
 
/* constants go just after the epilogue */
-   offset =  ctx->offsets[ctx->skf->len];
+   offset =  ctx->offsets[ctx->len];
offset += ctx->prologue_bytes;
offset += ctx->epilogue_bytes;
offset += i * 4;
@@ -419,7 +421,7 @@ static inline void emit_err_ret(u8 cond, struct jit_ctx 
*ctx)
emit(ARM_MOV_R(ARM_R0, ARM_R0), ctx);
} else {
_emit(cond, ARM_MOV_I(ARM_R0, 0), ctx);
-   _emit(cond, ARM_B(b_imm(ctx->skf->len, ctx)), ctx);
+   _emit(cond, ARM_B(b_imm(ctx->len, ctx)), ctx);
}
 }
 
@@ -469,14 +471,13 @@ static inline void update_on_xread(struct jit_ctx *ctx)
 static int build_body(struct jit_ctx *ctx)
 {
void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
-   const struct sk_filter *prog = ctx->skf;
const struct sock_filter *inst;
unsigned i, load_order, off, condt;
int imm12;
u32 k;
 
-   for (i = 0; i < prog->len; i++) {
-   inst = &(prog->insns[i]);
+   for (i = 0; i < ctx->len; i++) {
+   inst = &(ctx->insns[i]);
/* K as an immediate value operand */
k = inst->k;
 
@@ -769,8 +770,8 @@ cmp_x:
ctx->ret0_fp_idx = i;
emit_mov_i(ARM_R0, k, ctx);
 b_epilogue:
-   if (i != ctx->skf->len - 1)
-   emit(ARM_B(b_imm(prog->len, ctx)), ctx);
+   if (i != ctx->len - 1)
+   emit(ARM_B(b_imm(ctx->len, ctx)), ctx);
break;
case BPF_S_MISC_TAX:
/* X = A */
@@ -845,6 +846,19 @@ b_epilogue:
off = offsetof(struct sk_buff, queue_mapping);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   if (k == offsetof(struct seccomp_data, arch)) {
+   emit_mov_i(r_A, AUDIT_ARCH_ARM, ctx);
+   break;
+   }
+   ctx->seen |= SEEN_CALL;
+   emit_mov_i(ARM_R3, (u32)seccomp_bpf_load, ctx);
+   emit_mov_i(ARM_R0, k, ctx);
+   emit_blx_r(ARM_R3, ctx);
+   emit(ARM_MOV_R(r_A, ARM_R0), ctx);
+   break;
+#endif
default:
return -1;
}
@@ -858,22 +872,24 @@ b_epilogue:
 }
 
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
struct jit_ctx ctx;
unsigned tmp_idx;
unsigned alloc_size;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
memset(&ctx, 0, sizeof(ctx));
-   ctx.skf = fp;
-   ctx.ret0_fp_idx = -1;
+   ctx.insns   = filter;
+   ctx.len = flen;
+   ctx.ret0_fp_idx = -1;
 
-   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
+   ctx.offsets = kzalloc(4 * (ctx.len + 1), GFP_KERNEL);
if (ctx.offsets == NULL)
-   return;
+   return bpf_func;
 
	/* fake pass to fill in the ctx->seen */
	if (unlikely(build_body(&ctx)))
@@ -919,12 +935,12 @@ void

[RFC PATCH net-next 6/6] s390/bpf,jit: refactor the BPF JIT interface

2013-04-26 Thread Xi Wang
Implement the refactored bpf_jit_compile() and bpf_jit_free().

Signed-off-by: Xi Wang 
---
 arch/s390/net/bpf_jit_comp.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 0972e91..7966e0c 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -738,19 +738,19 @@ out:
return -1;
 }
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
unsigned long size, prg_len, lit_len;
struct bpf_jit jit, cjit;
unsigned int *addrs;
int pass, i;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
-   addrs = kmalloc(fp->len * sizeof(*addrs), GFP_KERNEL);
+   return bpf_func;
+   addrs = kzalloc(flen * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
-   memset(addrs, 0, fp->len * sizeof(*addrs));
+   return bpf_func;
memset(&jit, 0, sizeof(cjit));
memset(&cjit, 0, sizeof(cjit));
 
@@ -759,10 +759,10 @@ void bpf_jit_compile(struct sk_filter *fp)
jit.lit = jit.mid;
 
bpf_jit_prologue(&jit);
-   bpf_jit_noleaks(&jit, fp->insns);
-   for (i = 0; i < fp->len; i++) {
-   if (bpf_jit_insn(&jit, fp->insns + i, addrs, i,
-i == fp->len - 1))
+   bpf_jit_noleaks(&jit, filter);
+   for (i = 0; i < flen; i++) {
+   if (bpf_jit_insn(&jit, filter + i, addrs, i,
+i == flen - 1))
goto out;
}
bpf_jit_epilogue(&jit);
@@ -789,8 +789,8 @@ void bpf_jit_compile(struct sk_filter *fp)
cjit = jit;
}
if (bpf_jit_enable > 1) {
-   pr_err("flen=%d proglen=%lu pass=%d image=%p\n",
-  fp->len, jit.end - jit.start, pass, jit.start);
+   pr_err("flen=%u proglen=%lu pass=%d image=%p\n",
+  flen, jit.end - jit.start, pass, jit.start);
if (jit.start) {
printk(KERN_ERR "JIT code:\n");
print_fn_code(jit.start, jit.mid - jit.start);
@@ -800,9 +800,10 @@ void bpf_jit_compile(struct sk_filter *fp)
}
}
if (jit.start)
-   fp->bpf_func = (void *) jit.start;
+   bpf_func = (void *) jit.start;
 out:
kfree(addrs);
+   return bpf_func;
 }
 
 static void jit_free_defer(struct work_struct *arg)
@@ -813,13 +814,13 @@ static void jit_free_defer(struct work_struct *arg)
 /* run from softirq, we must use a work_struct to call
  * module_free() from process context
  */
-void bpf_jit_free(struct sk_filter *fp)
+void bpf_jit_free(bpf_func_t bpf_func)
 {
struct work_struct *work;
 
-   if (fp->bpf_func == sk_run_filter)
+   if (bpf_func == sk_run_filter)
return;
-   work = (struct work_struct *)fp->bpf_func;
+   work = (struct work_struct *)bpf_func;
INIT_WORK(work, jit_free_defer);
schedule_work(work);
 }
-- 
1.8.1.2



[RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in x86 JIT.

Signed-off-by: Xi Wang 
---
 arch/x86/net/bpf_jit_comp.c | 38 ++
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index f66b540..03c9c81 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -8,10 +8,11 @@
  * of the License.
  */
 #include <linux/moduleloader.h>
-#include <asm/cacheflush.h>
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <asm/cacheflush.h>
+#include <asm/syscall.h>
 
 /*
  * Conventions :
@@ -144,7 +145,7 @@ static int pkt_type_offset(void)
return -1;
 }
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
u8 temp[64];
u8 *prog;
@@ -157,15 +158,14 @@ void bpf_jit_compile(struct sk_filter *fp)
int pc_ret0 = -1; /* bpf index of first RET #0 instruction (if any) */
unsigned int cleanup_addr; /* epilogue code offset */
unsigned int *addrs;
-   const struct sock_filter *filter = fp->insns;
-   int flen = fp->len;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
addrs = kmalloc(flen * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
+   return bpf_func;
 
/* Before first pass, make a rough estimation of addrs[]
 * each bpf instruction is translated to less than 64 bytes
@@ -684,6 +684,20 @@ cond_branch:   f_offset = addrs[i + 
filter[i].jf] - addrs[i];
}
EMIT_COND_JMP(f_op, f_offset);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   if (K == offsetof(struct seccomp_data, arch)) {
+   int arch = syscall_get_arch(current, 
NULL);
+
+   EMIT1_off32(0xb8, arch); /* mov 
arch,%eax */
+   break;
+   }
+   func = (u8 *)seccomp_bpf_load;
+   t_offset = func - (image + addrs[i]);
+   EMIT1_off32(0xbf, K); /* mov imm32,%edi */
+   EMIT1_off32(0xe8, t_offset); /* call 
seccomp_bpf_load */
+   break;
+#endif
default:
/* hmm, too complex filter, give up with jit 
compiler */
goto out;
@@ -694,7 +708,7 @@ cond_branch:f_offset = addrs[i + 
filter[i].jf] - addrs[i];
pr_err("bpb_jit_compile fatal error\n");
kfree(addrs);
module_free(NULL, image);
-   return;
+   return bpf_func;
}
memcpy(image + proglen, temp, ilen);
}
@@ -731,11 +745,11 @@ cond_branch:  f_offset = addrs[i + 
filter[i].jf] - addrs[i];
 
if (image) {
bpf_flush_icache(image, image + proglen);
-   fp->bpf_func = (void *)image;
+   bpf_func = (void *)image;
}
 out:
kfree(addrs);
-   return;
+   return bpf_func;
 }
 
 static void jit_free_defer(struct work_struct *arg)
@@ -746,10 +760,10 @@ static void jit_free_defer(struct work_struct *arg)
 /* run from softirq, we must use a work_struct to call
  * module_free() from process context
  */
-void bpf_jit_free(struct sk_filter *fp)
+void bpf_jit_free(bpf_func_t bpf_func)
 {
-   if (fp->bpf_func != sk_run_filter) {
-   struct work_struct *work = (struct work_struct *)fp->bpf_func;
+   if (bpf_func != sk_run_filter) {
+   struct work_struct *work = (struct work_struct *)bpf_func;
 
INIT_WORK(work, jit_free_defer);
schedule_work(work);
-- 
1.8.1.2



[RFC PATCH net-next 0/6] seccomp filter JIT

2013-04-26 Thread Xi Wang
This patchset brings JIT support to seccomp filters for x86_64 and ARM.
It is against the net-next tree.

The current BPF JIT interface only accepts sk_filter, not seccomp_filter.
Patch 1/6 refactors the interface to make it more general.

With the refactored interface, patches 2/6 and 3/6 implement the seccomp
BPF_S_ANC_SECCOMP_LD_W instruction in x86 & ARM JIT.
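
For reference, this is the interface change at the core of the series
(taken from patch 1/6), with the old sk_filter-based prototypes shown
for comparison:

  /* old interface: tied to packet filters */
  void bpf_jit_compile(struct sk_filter *fp);
  void bpf_jit_free(struct sk_filter *fp);

  /* new interface: shared by packet and seccomp filters */
  typedef unsigned int (*bpf_func_t)(const struct sk_buff *skb,
                                     const struct sock_filter *filter);
  bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen);
  void bpf_jit_free(bpf_func_t bpf_func);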

Status:

* x86_64 & ARM: JIT tested with seccomp examples.

* powerpc [4/6]: no seccomp change - compile checked.

* sparc [5/6] & s390 [6/6]: no seccomp change - untested.

Sorry I have no sparc or s390 build environment here.  Can someone help
check 5/6 and 6/6?  Thanks.

Xi Wang (6):
  filter: refactor BPF JIT for seccomp filters
  x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction
  ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction
  PPC: net: bpf_jit_comp: refactor the BPF JIT interface
  sparc: bpf_jit_comp: refactor the BPF JIT interface
  s390/bpf,jit: refactor the BPF JIT interface

 arch/arm/net/bpf_jit_32.c   | 64 +
 arch/powerpc/net/bpf_jit_comp.c | 36 +++
 arch/s390/net/bpf_jit_comp.c| 31 ++--
 arch/sparc/net/bpf_jit_comp.c   | 22 +++---
 arch/x86/net/bpf_jit_comp.c | 38 
 include/linux/filter.h  | 16 +++
 kernel/seccomp.c|  6 +++-
 net/core/filter.c   |  6 ++--
 8 files changed, 127 insertions(+), 92 deletions(-)

-- 
1.8.1.2





[RFC PATCH net-next 4/6] PPC: net: bpf_jit_comp: refactor the BPF JIT interface

2013-04-26 Thread Xi Wang
Implement the refactored bpf_jit_compile() and bpf_jit_free().

Signed-off-by: Xi Wang <xi.w...@gmail.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c427ae3..a82e400 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -31,11 +31,11 @@ static inline void bpf_flush_icache(void *start, void *end)
flush_icache_range((unsigned long)start, (unsigned long)end);
 }
 
-static void bpf_jit_build_prologue(struct sk_filter *fp, u32 *image,
+static void bpf_jit_build_prologue(struct sock_filter *filter,
+  u32 *image,
   struct codegen_context *ctx)
 {
int i;
-   const struct sock_filter *filter = fp->insns;
 
if (ctx->seen & (SEEN_MEM | SEEN_DATAREF)) {
/* Make stackframe */
@@ -135,12 +135,12 @@ static void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : 
func##_positive_offset)
 
 /* Assemble the body code between the prologue  epilogue. */
-static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
+static int bpf_jit_build_body(struct sock_filter *filter,
+ unsigned int flen,
+ u32 *image,
  struct codegen_context *ctx,
  unsigned int *addrs)
 {
-   const struct sock_filter *filter = fp->insns;
-   int flen = fp->len;
u8 *func;
unsigned int true_cond;
int i;
@@ -564,7 +564,7 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 
*image,
return 0;
 }
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
unsigned int proglen;
unsigned int alloclen;
@@ -573,14 +573,14 @@ void bpf_jit_compile(struct sk_filter *fp)
unsigned int *addrs;
struct codegen_context cgctx;
int pass;
-   int flen = fp->len;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL)
-   return;
+   return bpf_func;
 
/*
 * There are multiple assembly passes as the generated code will change
@@ -636,7 +636,7 @@ void bpf_jit_compile(struct sk_filter *fp)
cgctx.seen = 0;
cgctx.pc_ret0 = -1;
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs))
+   if (bpf_jit_build_body(filter, flen, 0, &cgctx, addrs))
/* We hit something illegal or unsupported. */
goto out;
 
@@ -645,7 +645,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 * update ctgtx.idx as it pretends to output instructions, then we can
 * calculate total size from idx.
 */
-   bpf_jit_build_prologue(fp, 0, &cgctx);
+   bpf_jit_build_prologue(filter, 0, &cgctx);
bpf_jit_build_epilogue(0, cgctx);
 
proglen = cgctx.idx * 4;
@@ -661,8 +661,8 @@ void bpf_jit_compile(struct sk_filter *fp)
for (pass = 1; pass < 3; pass++) {
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
-   bpf_jit_build_prologue(fp, code_base, &cgctx);
-   bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+   bpf_jit_build_prologue(filter, code_base, &cgctx);
+   bpf_jit_build_body(filter, flen, code_base, &cgctx, addrs);
bpf_jit_build_epilogue(code_base, cgctx);
 
if (bpf_jit_enable > 1)
@@ -681,11 +681,11 @@ void bpf_jit_compile(struct sk_filter *fp)
/* Function descriptor nastiness: Address + TOC */
((u64 *)image)[0] = (u64)code_base;
((u64 *)image)[1] = local_paca->kernel_toc;
-   fp->bpf_func = (void *)image;
+   bpf_func = (void *)image;
}
 out:
kfree(addrs);
-   return;
+   return bpf_func;
 }
 
 static void jit_free_defer(struct work_struct *arg)
@@ -696,10 +696,10 @@ static void jit_free_defer(struct work_struct *arg)
 /* run from softirq, we must use a work_struct to call
  * module_free() from process context
  */
-void bpf_jit_free(struct sk_filter *fp)
+void bpf_jit_free(bpf_func_t bpf_func)
 {
-   if (fp->bpf_func != sk_run_filter) {
-   struct work_struct *work = (struct work_struct *)fp->bpf_func;
+   if (bpf_func != sk_run_filter) {
+   struct work_struct *work = (struct work_struct *)bpf_func;
 
INIT_WORK(work, jit_free_defer);
schedule_work(work);
-- 
1.8.1.2


[RFC PATCH net-next 1/6] filter: refactor BPF JIT for seccomp filters

2013-04-26 Thread Xi Wang
Currently, bpf_jit_compile() and bpf_jit_free() takes an sk_filter,
which seccomp filters cannot reuse.

Change bpf_jit_compile() to take a pointer to BPF instructions and
the length, and to return a JITted function.

Change bpf_jit_free() to take a JITted function.

Add JIT calls for seccomp filters.

Signed-off-by: Xi Wang <xi.w...@gmail.com>
---
 include/linux/filter.h | 16 ++--
 kernel/seccomp.c   |  6 +-
 net/core/filter.c  |  6 ++
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index d1248f4..8743093 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -21,12 +21,14 @@ struct compat_sock_fprog {
 struct sk_buff;
 struct sock;
 
+typedef unsigned int (*bpf_func_t)(const struct sk_buff *skb,
+  const struct sock_filter *filter);
+
 struct sk_filter
 {
atomic_trefcnt;
unsigned intlen;/* Number of filter blocks */
-   unsigned int(*bpf_func)(const struct sk_buff *skb,
-   const struct sock_filter *filter);
+   bpf_func_t  bpf_func;
struct rcu_head rcu;
struct sock_filter  insns[0];
 };
@@ -48,11 +50,12 @@ extern int sk_chk_filter(struct sock_filter *filter, 
unsigned int flen);
 extern int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, 
unsigned len);
 
 #ifdef CONFIG_BPF_JIT
+#include <stdarg.h>
 #include <linux/linkage.h>
 #include <linux/printk.h>
 
-extern void bpf_jit_compile(struct sk_filter *fp);
-extern void bpf_jit_free(struct sk_filter *fp);
+extern bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int 
flen);
+extern void bpf_jit_free(bpf_func_t bpf_func);
 
 static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
u32 pass, void *image)
@@ -65,10 +68,11 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned 
int proglen,
 }
 #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
-static inline void bpf_jit_compile(struct sk_filter *fp)
+static inline bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned 
int flen)
 {
+   return sk_run_filter;
 }
-static inline void bpf_jit_free(struct sk_filter *fp)
+static inline void bpf_jit_free(bpf_func_t bpf_func)
 {
 }
 #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5af44b5..f784feb 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -55,6 +55,7 @@ struct seccomp_filter {
atomic_t usage;
struct seccomp_filter *prev;
unsigned short len;  /* Instruction count */
+   bpf_func_t bpf_func;
struct sock_filter insns[];
 };
 
@@ -211,7 +212,7 @@ static u32 seccomp_run_filters(int syscall)
 * value always takes priority (ignoring the DATA).
 */
for (f = current->seccomp.filter; f; f = f->prev) {
-   u32 cur_ret = sk_run_filter(NULL, f->insns);
+   u32 cur_ret = SK_RUN_FILTER(f, NULL);
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
@@ -273,6 +274,8 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
if (ret)
goto fail;
 
+   filter->bpf_func = bpf_jit_compile(filter->insns, filter->len);
+
/*
 * If there is an existing filter, make it the prev and don't drop its
 * task reference.
@@ -330,6 +333,7 @@ void put_seccomp_filter(struct task_struct *tsk)
while (orig && atomic_dec_and_test(&orig->usage)) {
struct seccomp_filter *freeme = orig;
orig = orig->prev;
+   bpf_jit_free(freeme->bpf_func);
kfree(freeme);
}
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index dad2a17..0a7900b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -643,7 +643,7 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 {
struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
-   bpf_jit_free(fp);
+   bpf_jit_free(fp->bpf_func);
kfree(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
@@ -652,13 +652,11 @@ static int __sk_prepare_filter(struct sk_filter *fp)
 {
int err;
 
-   fp->bpf_func = sk_run_filter;
-
err = sk_chk_filter(fp->insns, fp->len);
if (err)
return err;
 
-   bpf_jit_compile(fp);
+   fp->bpf_func = bpf_jit_compile(fp->insns, fp->len);
return 0;
 }
 
-- 
1.8.1.2



Re: [RFC PATCH net-next 0/6] seccomp filter JIT

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 7:46 AM, Heiko Carstens
<heiko.carst...@de.ibm.com> wrote:
> And build fine on s390.

Thanks!

> Btw. are there any test cases around for BPF JIT?
> Not only for the new seccomp but also netfilter?
>
> This however is still a valid question.

Not sure about test cases for BPF JIT in general.  I used the examples
under samples/seccomp/ for testing; they are seccomp filters though.
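
For reference, a minimal standalone filter in the same spirit as the
samples (a from-scratch sketch, not one of the samples verbatim; error
handling trimmed, and it assumes headers that provide seccomp filter
mode):

  #include <stdio.h>
  #include <stddef.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <linux/filter.h>
  #include <linux/seccomp.h>

  int main(void)
  {
          struct sock_filter insns[] = {
                  /* A = seccomp_data->nr */
                  BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                           offsetof(struct seccomp_data, nr)),
                  /* kill on getpid(), allow everything else */
                  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
                  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
                  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
          };
          struct sock_fprog prog = {
                  .len = sizeof(insns) / sizeof(insns[0]),
                  .filter = insns,
          };

          prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
          if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
                  perror("prctl");
          return 0;
  }

With bpf_jit_enable set, a filter like the above goes through the new
bpf_jit_compile() path just as a socket filter would.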

- xi


Re: [RFC PATCH net-next 0/6] seccomp filter JIT

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 7:46 AM, Daniel Borkmann <dbork...@redhat.com> wrote:
> I think BPF JIT for seccomp on ARM recently got applied to -mm tree
> if I'm not mistaken. It was from Nicolas Schichan (cc):
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/233416/

Thanks for the pointer.

For the ARM part, looks like Nicolas's patch requires to implement two
wrappers for each arch:

  void seccomp_jit_compile(struct seccomp_filter *fp);
  void seccomp_jit_free(struct seccomp_filter *fp);

The implementation of these wrappers is almost identical to:

  void bpf_jit_compile(struct sk_filter *fp);
  void bpf_jit_free(struct sk_filter *fp);

While this patch uses a unified interface for both packet & seccomp filters.

  bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen);
  void bpf_jit_free(bpf_func_t bpf_func);

Shouldn't be hard to merge though.

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 10:18 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2013-04-26 at 03:51 -0400, Xi Wang wrote:
>
> > +#ifdef CONFIG_SECCOMP_FILTER
> > +	case BPF_S_ANC_SECCOMP_LD_W:
> > +		if (K == offsetof(struct seccomp_data, arch)) {
> > +			int arch = syscall_get_arch(current, NULL);
> > +
> > +			EMIT1_off32(0xb8, arch); /* mov arch,%eax */
> > +			break;
> > +		}
> > +		func = (u8 *)seccomp_bpf_load;
> > +		t_offset = func - (image + addrs[i]);
> > +		EMIT1_off32(0xbf, K); /* mov imm32,%edi */
> > +		EMIT1_off32(0xe8, t_offset); /* call seccomp_bpf_load */
> > +		break;
> > +#endif

> This seems seriously wrong to me.

Can you elaborate?

> This cannot have been tested at all.

Thanks to QEMU for hiding bugs then. :)

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> 2) Calling a function potentially destroys some registers.
>    %rdi,%r8,%r9 for instance, so we are going to crash very easily.
>
> I dont know, I feel a bit uncomfortable having to explain this to
> someone sending security related patches...

My old code did save these registers.  But, do we really need that for
seccomp?  For example, %rdi (skb) is always NULL and never used by
seccomp filters.  Did I miss anything?
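
For context, the seccomp call site from patch 1/6 always passes a NULL
skb, which is where that assumption comes from:

  /* kernel/seccomp.c, seccomp_run_filters() */
  u32 cur_ret = SK_RUN_FILTER(f, NULL);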

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:43 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> I do not know.
>
> This is not explained in your changelog or in any comment.
>
> You have to make the full analysis yourself and make us comfortable with
> the results.
>
> You send patches and ask us to spend hours on it, this is not how it
> works.

"do we really need that for seccomp?" is not asking.  I just tried to
explain in a gentle way.

%rdi,%r8,%r9 are not used by seccomp filters so I removed the push/pop
part.  I agree that I should explain the details in the code comments
or logs.  Thanks for catching that.

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> 1) 'current' at the time the code is jitted (compiled) is not the
> 'current' at the time the filter will be evaluated.
>
> On x86_64, if CONFIG_IA32_EMULATION=y, syscall_get_arch() evaluates to :
>
> if (task_thread_info(task)->status & TS_COMPAT)
>         return AUDIT_ARCH_I386;
> return AUDIT_ARCH_X86_64;
>
> So your code is completely wrong.

Just to be clear, are you worrying about a process changing its
personality after installing seccomp filters?
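
For background, the reason a filter loads arch at all is to re-check
the ABI on every syscall; a typical prologue (illustrative, not taken
from this series) looks like:

  /* reject syscalls made under an unexpected ABI */
  BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),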

- xi


Re: [RFC PATCH net-next 2/6] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
Not sure how many you are speaking for when you say "We are very dumb."  :)

Thanks for catching this.  I'll remove this arch thing in v2.

To address your other concern about registers, I'll add some comments
to the code, something like:

%rdi,%r8,%r9 are not used by seccomp filters; it's safe to not save them.

- xi

On Fri, Apr 26, 2013 at 12:14 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2013-04-26 at 12:02 -0400, Xi Wang wrote:
> > On Fri, Apr 26, 2013 at 11:11 AM, Eric Dumazet <eric.duma...@gmail.com>
> > wrote:
> > > 1) 'current' at the time the code is jitted (compiled) is not the
> > > 'current' at the time the filter will be evaluated.
> > >
> > > On x86_64, if CONFIG_IA32_EMULATION=y, syscall_get_arch() evaluates to :
> > >
> > > if (task_thread_info(task)->status & TS_COMPAT)
> > >         return AUDIT_ARCH_I386;
> > > return AUDIT_ARCH_X86_64;
> > >
> > > So your code is completely wrong.
>
> > Just to be clear, are you worrying about a process changing its
> > personality after installing seccomp filters?
>
> You didn't explained how things worked.
>
> Are you assuming we network guys know everything ?
>
> Just to make it very clear :
>
> We are very dumb and you must explain us everything.
>
> If process would not change personality, why do we have get_arch() at
> all ? Why isn't it optimized outside of the JIT itself, in the generic
> seccomp checker, its a single A = K instruction after all.
>
> Why this part is even in the x86 BPF JIT ?
>
> To me it looks like _if_ get_arch() is provided in BPF, its for a
> reason, and your implementation looks very suspicious, if not buggy.





Re: [PATCH V3 2/3] ARM: net: bpf_jit: make code generation less dependent on struct sk_filter.

2013-04-26 Thread Xi Wang
Thanks for CCing.  One way to clean up this would be to refactor the
bpf jit interface as:

  bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen);
  void bpf_jit_free(bpf_func_t bpf_func);

Then both packet and seccomp filters can share the unified interface.
Also, we don't need seccomp_filter_get_len() and other helpers.
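
To illustrate, with the unified interface both call sites reduce to the
same pattern (as in the RFC diffs):

  /* net/core/filter.c, packet filters */
  fp->bpf_func = bpf_jit_compile(fp->insns, fp->len);

  /* kernel/seccomp.c, seccomp filters */
  filter->bpf_func = bpf_jit_compile(filter->insns, filter->len);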

Do you want me to rebase my patch against linux-next and see how that goes?

- xi

On Fri, Apr 26, 2013 at 6:01 PM, Daniel Borkmann <dbork...@redhat.com> wrote:
 On 04/26/2013 10:09 PM, Andrew Morton wrote:

 On Fri, 26 Apr 2013 21:47:46 +0200 Daniel Borkmann <dbork...@redhat.com>
 wrote:

 On 04/26/2013 09:26 PM, Andrew Morton wrote:

 On Fri, 26 Apr 2013 16:04:44 +0200 Arnd Bergmann <a...@arndb.de> wrote:

 On Wednesday 24 April 2013 19:27:08 Nicolas Schichan wrote:

 @@ -858,7 +858,7 @@ b_epilogue:
}


 -void bpf_jit_compile(struct sk_filter *fp)
 +static void __bpf_jit_compile(struct jit_ctx *out_ctx)
{
   struct jit_ctx ctx;
   unsigned tmp_idx;
 @@ -867,11 +867,10 @@ void bpf_jit_compile(struct sk_filter *fp)
   if (!bpf_jit_enable)
   return;

 -   memset(&ctx, 0, sizeof(ctx));
 -   ctx.skf = fp;
 +   ctx = *out_ctx;
   ctx.ret0_fp_idx = -1;

 -   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
 +   ctx.offsets = kzalloc(4 * (ctx.prog_len + 1), GFP_KERNEL);
   if (ctx.offsets == NULL)
   return;

 @@ -921,13 +920,26 @@ void bpf_jit_compile(struct sk_filter *fp)
 print_hex_dump(KERN_INFO, "BPF JIT code: ",
  DUMP_PREFIX_ADDRESS, 16, 4,
 ctx.target,
  alloc_size, false);
 -
 -   fp->bpf_func = (void *)ctx.target;
out:
   kfree(ctx.offsets);
 +
 +   *out_ctx = ctx;
   return;


 This part of the patch, in combination with 79617801e ("filter:
 bpf_jit_comp: refactor and unify BPF JIT image dump output") is now
 causing build errors in linux-next:

 arch/arm/net/bpf_jit_32.c: In function '__bpf_jit_compile':
 arch/arm/net/bpf_jit_32.c:930:16: error: 'fp' undeclared (first use in
 this function)
 bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);


 Thanks, I did this.  There may be a smarter way...


 I think also seccomp_jit_compile() would need this change then, otherwise
 the build
 with CONFIG_SECCOMP_FILTER_JIT might break.


 urgh, that tears it.

 I can fix this up for you if not already applied. I presume it's against
 linux-next tree?


 Yup, please send something.


 Patch is attached. However, I currently don't have an ARM toolchain at hand,
 so
 uncompiled, untested.

 @Nicolas, Xi (cc, ref: http://thread.gmane.org/gmane.linux.kernel/1481464):

 If there is someday support for other archs as well, it would be nice if we
 do not have each time duplicated seccomp_jit_compile() etc functions in each
 JIT implementation, i.e. because they do basically the same. So follow-up
 {fix,clean}up is appreciated.

 Also, I find it a bit weird that seccomp_filter_get_len() and some other
 _one-line_ functions from kernel/seccomp.c are not placed into the
 corresponding header file as inlines.


[PATCH v2 net-next 0/3] seccomp filter JIT

2013-04-26 Thread Xi Wang
The first patch refactors bpf_jit_compile()/bpf_jit_free() to provide
a unified interface for both packet and seccomp filters.

The next two patches implement JIT for seccomp filters on x86 and ARM,
respectively.

Thanks to Heiko Carstens for testing the build on s390 and Eric Dumazet
for the comments on x86 JIT.

Changes:
  make better patch splitting
  remove arch at jit time in x86
  add more comments in x86

Xi Wang (3):
  filter: refactor BPF JIT for seccomp filters
  x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction
  ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

 arch/arm/net/bpf_jit_32.c   | 64 +
 arch/powerpc/net/bpf_jit_comp.c | 36 +++
 arch/s390/net/bpf_jit_comp.c| 31 ++--
 arch/sparc/net/bpf_jit_comp.c   | 22 +++---
 arch/x86/net/bpf_jit_comp.c | 32 ++---
 include/linux/filter.h  | 16 +++
 kernel/seccomp.c|  6 +++-
 net/core/filter.c   |  6 ++--
 8 files changed, 122 insertions(+), 91 deletions(-)

-- 
1.8.1.2



[PATCH v2 net-next 1/3] filter: refactor BPF JIT for seccomp filters

2013-04-26 Thread Xi Wang
Currently, bpf_jit_compile() and bpf_jit_free() take an sk_filter,
which seccomp filters cannot reuse.

Change bpf_jit_compile() to take a pointer to BPF instructions and
an instruction length, and to return a JITted function.

Change bpf_jit_free() to take a JITted function.

Add JIT calls for seccomp filters.

Signed-off-by: Xi Wang <xi.w...@gmail.com>
Cc: Daniel Borkmann <dbork...@redhat.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: Will Drewry <w...@chromium.org>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Russell King <li...@arm.linux.org.uk>
Cc: David Laight <david.lai...@aculab.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Nicolas Schichan <nschic...@freebox.fr>
---
 arch/arm/net/bpf_jit_32.c   | 50 -
 arch/powerpc/net/bpf_jit_comp.c | 36 ++---
 arch/s390/net/bpf_jit_comp.c| 31 -
 arch/sparc/net/bpf_jit_comp.c   | 22 +-
 arch/x86/net/bpf_jit_comp.c | 21 +
 include/linux/filter.h  | 16 -
 kernel/seccomp.c|  6 -
 net/core/filter.c   |  6 ++---
 8 files changed, 97 insertions(+), 91 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 1a643ee..073b085 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -55,7 +55,8 @@
 #define FLAG_NEED_X_RESET  (1 << 0)
 
 struct jit_ctx {
-   const struct sk_filter *skf;
+   struct sock_filter *insns;
+   unsigned len;
unsigned idx;
unsigned prologue_bytes;
int ret0_fp_idx;
@@ -131,8 +132,8 @@ static u16 saved_regs(struct jit_ctx *ctx)
 {
u16 ret = 0;
 
-   if ((ctx->skf->len > 1) ||
-   (ctx->skf->insns[0].code == BPF_S_RET_A))
+   if ((ctx->len > 1) ||
+   (ctx->insns[0].code == BPF_S_RET_A))
ret |= 1 << r_A;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -181,7 +182,7 @@ static inline bool is_load_to_a(u16 inst)
 static void build_prologue(struct jit_ctx *ctx)
 {
u16 reg_set = saved_regs(ctx);
-   u16 first_inst = ctx->skf->insns[0].code;
+   u16 first_inst = ctx->insns[0].code;
u16 off;
 
 #ifdef CONFIG_FRAME_POINTER
@@ -279,7 +280,7 @@ static u16 imm_offset(u32 k, struct jit_ctx *ctx)
ctx->imms[i] = k;
 
/* constants go just after the epilogue */
-   offset =  ctx->offsets[ctx->skf->len];
+   offset =  ctx->offsets[ctx->len];
offset += ctx->prologue_bytes;
offset += ctx->epilogue_bytes;
offset += i * 4;
@@ -419,7 +420,7 @@ static inline void emit_err_ret(u8 cond, struct jit_ctx 
*ctx)
emit(ARM_MOV_R(ARM_R0, ARM_R0), ctx);
} else {
_emit(cond, ARM_MOV_I(ARM_R0, 0), ctx);
-   _emit(cond, ARM_B(b_imm(ctx->skf->len, ctx)), ctx);
+   _emit(cond, ARM_B(b_imm(ctx->len, ctx)), ctx);
}
 }
 
@@ -469,14 +470,13 @@ static inline void update_on_xread(struct jit_ctx *ctx)
 static int build_body(struct jit_ctx *ctx)
 {
void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
-   const struct sk_filter *prog = ctx->skf;
const struct sock_filter *inst;
unsigned i, load_order, off, condt;
int imm12;
u32 k;
 
-   for (i = 0; i < prog->len; i++) {
-   inst = &(prog->insns[i]);
+   for (i = 0; i < ctx->len; i++) {
+   inst = &(ctx->insns[i]);
/* K as an immediate value operand */
k = inst->k;
 
@@ -769,8 +769,8 @@ cmp_x:
ctx->ret0_fp_idx = i;
emit_mov_i(ARM_R0, k, ctx);
 b_epilogue:
-   if (i != ctx->skf->len - 1)
-   emit(ARM_B(b_imm(prog->len, ctx)), ctx);
+   if (i != ctx->len - 1)
+   emit(ARM_B(b_imm(ctx->len, ctx)), ctx);
break;
case BPF_S_MISC_TAX:
/* X = A */
@@ -858,22 +858,24 @@ b_epilogue:
 }
 
 
-void bpf_jit_compile(struct sk_filter *fp)
+bpf_func_t bpf_jit_compile(struct sock_filter *filter, unsigned int flen)
 {
struct jit_ctx ctx;
unsigned tmp_idx;
unsigned alloc_size;
+   bpf_func_t bpf_func = sk_run_filter;
 
if (!bpf_jit_enable)
-   return;
+   return bpf_func;
 
memset(&ctx, 0, sizeof(ctx));
-   ctx.skf = fp;
-   ctx.ret0_fp_idx = -1;
+   ctx.insns   = filter;
+   ctx.len = flen;
+   ctx.ret0_fp_idx = -1;
 
-   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
+   ctx.offsets = kzalloc(4 * (ctx.len + 1), GFP_KERNEL);
if (ctx.offsets == NULL)
-   return;
+   return bpf_func;
 
/* fake pass to fill in the ctx->seen */
if (unlikely(build_body(&ctx)))
@@ -919,12 +921,12 @@ void bpf_jit_compile(struct sk_filter *fp

[PATCH v2 net-next 3/3] ARM: net: bpf_jit_32: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in ARM JIT.

Signed-off-by: Xi Wang <xi.w...@gmail.com>
Cc: Daniel Borkmann <dbork...@redhat.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: Will Drewry <w...@chromium.org>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Russell King <li...@arm.linux.org.uk>
Cc: David Laight <david.lai...@aculab.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Nicolas Schichan <nschic...@freebox.fr>
---
 arch/arm/net/bpf_jit_32.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 073b085..9bfce464 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -19,6 +19,7 @@
 #include linux/if_vlan.h
 #include asm/cacheflush.h
 #include asm/hwcap.h
+#include asm/syscall.h
 
 #include bpf_jit_32.h
 
@@ -845,6 +846,19 @@ b_epilogue:
off = offsetof(struct sk_buff, queue_mapping);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   if (k == offsetof(struct seccomp_data, arch)) {
+   emit_mov_i(r_A, AUDIT_ARCH_ARM, ctx);
+   break;
+   }
+   ctx->seen |= SEEN_CALL;
+   emit_mov_i(ARM_R3, (u32)seccomp_bpf_load, ctx);
+   emit_mov_i(ARM_R0, k, ctx);
+   emit_blx_r(ARM_R3, ctx);
+   emit(ARM_MOV_R(r_A, ARM_R0), ctx);
+   break;
+#endif
default:
return -1;
}
-- 
1.8.1.2



[PATCH v2 net-next 2/3] x86: bpf_jit_comp: support BPF_S_ANC_SECCOMP_LD_W instruction

2013-04-26 Thread Xi Wang
This patch implements the seccomp BPF_S_ANC_SECCOMP_LD_W instruction
in x86 JIT.

Signed-off-by: Xi Wang <xi.w...@gmail.com>
Cc: Daniel Borkmann <dbork...@redhat.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: Will Drewry <w...@chromium.org>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Russell King <li...@arm.linux.org.uk>
Cc: David Laight <david.lai...@aculab.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Nicolas Schichan <nschic...@freebox.fr>
---
 arch/x86/net/bpf_jit_comp.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8898680..5f1dafb 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -683,6 +683,17 @@ cond_branch:   f_offset = addrs[i + 
filter[i].jf] - addrs[i];
}
EMIT_COND_JMP(f_op, f_offset);
break;
+#ifdef CONFIG_SECCOMP_FILTER
+   case BPF_S_ANC_SECCOMP_LD_W:
+   func = (u8 *)seccomp_bpf_load;
+   t_offset = func - (image + addrs[i]);
+   /* seccomp filters don't use %rdi, %r8, %r9
+* it is safe to not save these registers
+*/
+   EMIT1_off32(0xbf, K); /* mov imm32,%edi */
+   EMIT1_off32(0xe8, t_offset); /* call 
seccomp_bpf_load */
+   break;
+#endif
default:
/* hmm, too complex filter, give up with jit 
compiler */
goto out;
-- 
1.8.1.2



[PATCH] sysctl: fix null checking in bin_dn_node_address()

2013-01-22 Thread Xi Wang
The null check of `strchr() + 1' is broken: `strchr() + 1' is always
non-null even when strchr() fails, leading to an out-of-bounds read.
Instead, check the result of strchr() itself.
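
A minimal illustration of the broken pattern:

	nodep = strchr(buf, '.') + 1;	/* NULL + 1 on failure */
	if (!nodep)			/* never taken */
		goto out;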

Signed-off-by: Xi Wang 
Cc: sta...@vger.kernel.org
---
 kernel/sysctl_binary.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 5a63844..0ddf3a0 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -1194,9 +1194,10 @@ static ssize_t bin_dn_node_address(struct file *file,
 
/* Convert the decnet address to binary */
result = -EIO;
-   nodep = strchr(buf, '.') + 1;
+   nodep = strchr(buf, '.');
if (!nodep)
goto out;
+   ++nodep;
 
area = simple_strtoul(buf, NULL, 10);
node = simple_strtoul(nodep, NULL, 10);
-- 
1.7.10.4



Re: [PATCH RFC] exec: avoid possible undefined behavior in count()

2013-01-16 Thread Xi Wang
On 1/7/13 4:44 PM, Andrew Morton wrote:
> I have no problem working around a compiler bug when the workaround is
> so small and simple.  For clarity and accuracy I renamed the patch to
> "fs/exec.c: work around icc miscompilation".

Thanks!

> However I'd also like to be able to add "this bug has been reported to
> the icc developers and will be fixed in version X.Y"?

The icc developers have confirmed this bug and filed a defect ticket.
I'll let you know if there's any further update.

- xi


[PATCH RFC] exec: avoid possible undefined behavior in count()

2013-01-05 Thread Xi Wang
The tricky problem is this check:

if (i++ >= max)

icc (mis)optimizes this check as:

if (++i > max)

The check now becomes a no-op: max is MAX_ARG_STRINGS (0x7FFFFFFF, i.e.
INT_MAX), so ++i can never compare greater than max unless i++ overflows,
which the compiler assumes cannot happen.

This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior.  This optimization
effectively reverts the previous commit 362e6663ef ("exec.c, compat.c:
fix count(), compat_count() bounds checking") that tries to fix the check.

This patch simply moves ++ after the check.

Signed-off-by: Xi Wang xi.w...@gmail.com
---
Not sure how many people are using Intel's icc to compile the kernel.
Some projects like LinuxDNA did.

The kernel uses gcc's -fno-strict-overflow to disable this optimization.
icc probably doesn't recognize the option.

To illustrate the problem, try this simple program:

int count(int i, int max)
{
if (i++ >= max) {
__builtin_trap();
return -1;
}
return i;
}

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
int x = atoi(argv[1]);
int max = atoi(argv[2]);
printf("%d %d %d\n", x, max, count(x, max));
}

$ gcc -O2 t.c
$ ./a.out 2147483647 2147483647
Illegal instruction (core dumped)

$ icc -O2 t.c
$ ./a.out 2147483647 2147483647
2147483647 2147483647 -2147483648

There's no difference whether we add -fno-strict-overflow or not.
---
 fs/exec.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 18c45ca..20df02c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -434,8 +434,9 @@ static int count(struct user_arg_ptr argv, int max)
if (IS_ERR(p))
return -EFAULT;
 
-   if (i++ >= max)
+   if (i >= max)
return -E2BIG;
+   ++i;
 
if (fatal_signal_pending(current))
return -ERESTARTNOHAND;
-- 
1.7.10.4



[PATCH] nfs: fix null checking in nfs_get_option_str()

2013-01-04 Thread Xi Wang
The following null pointer check is broken.

*option = match_strdup(args);
return !option;

The pointer `option' must be non-null, and thus `!option' is always false.
Use `!*option' instead.

The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
cut-and-paste code.").
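
A minimal userspace sketch of the two checks (illustrative only; the
helper names are made up): the parameter `option' is the address of the
caller's pointer and can never be NULL, so only `*option' reflects
whether the allocation succeeded.

#include <stdio.h>

static int get_str_broken(char **option)
{
	*option = NULL;   /* pretend the strdup failed */
	return !option;   /* always 0: `option' itself is never NULL */
}

static int get_str_fixed(char **option)
{
	*option = NULL;   /* pretend the strdup failed */
	return !*option;  /* 1: correctly reports the failure */
}

int main(void)
{
	char *s;
	printf("broken=%d fixed=%d\n", get_str_broken(&s), get_str_fixed(&s));
	return 0;
}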

Signed-off-by: Xi Wang xi.w...@gmail.com
Cc: sta...@vger.kernel.org
---
 fs/nfs/super.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index c25cadf8..2e7e8c8 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1152,7 +1152,7 @@ static int nfs_get_option_str(substring_t args[], char **option)
 {
kfree(*option);
*option = match_strdup(args);
-   return !option;
+   return !*option;
 }
 
 static int nfs_get_option_ul(substring_t args[], unsigned long *option)
-- 
1.7.10.4



Re: mvsas regression since 3.5

2012-12-22 Thread Xi Wang
On 12/22/12 10:33 AM, Drunkard Zhang wrote:
> I'm using an Asus PIKE 6480 SAS card, whose chipset is "RAID bus
> controller: Marvell Technology Group Ltd. MV64460/64461/64462 System
> Controller, Revision B".  With the latest stable branch 3.7.1 only 1 of
> 8 ports works; to get the others working I have to pull & plug back.
> With 3.5.7 it's all good, so it must be a regression.

Can you try to revert commit beecadea1b8d67f591b13f7099559f32f3fd601d?

Particularly, can you first change

  #define bit(n) ((u64)1 << n)

in drivers/scsi/mvsas/mv_sas.h back to

  #define bit(n) ((u32)1 << n)

and see if it works for you?  Thanks.

- xi


[PATCH v2] [SCSI] mvsas: fix undefined bit shift

2012-11-16 Thread Xi Wang
The macro bit(n) is defined as ((u32)1 << n), and thus it doesn't work
with n >= 32, such as in mvs_94xx_assign_reg_set():

if (i >= 32) {
mvi->sata_reg_set |= bit(i);
...
}

The shift ((u32)1 << n) with n >= 32 also leads to undefined behavior.
The result varies depending on the architecture.

This patch changes bit(n) to do a 64-bit shift.  It also simplifies
mv_ffc64() using __ffs64(), since invoking ffz() with ~0 is undefined.
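
A minimal userspace sketch of the difference (illustrative only, not
part of the patch): shifting a 32-bit value by 32 or more is undefined,
while widening to 64 bits before shifting is well defined.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	unsigned n = 33;
	/* uint32_t bad = (uint32_t)1 << n;  undefined for n >= 32; on x86
	 * the CPU masks the count to n % 32, so this would set bit 1 */
	uint64_t good = (uint64_t)1 << n;   /* well defined: sets bit 33 */
	printf("0x%llx\n", (unsigned long long)good);
	return 0;
}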

Signed-off-by: Xi Wang xi.w...@gmail.com
Cc: Xiangliang Yu yuxia...@marvell.com
Cc: James Bottomley jbottom...@parallels.com
Cc: sta...@vger.kernel.org
---
As suggested by James, v2 is a single patch with a stable tag.
---
 drivers/scsi/mvsas/mv_94xx.h |   14 ++
 drivers/scsi/mvsas/mv_sas.h  |2 +-
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_94xx.h b/drivers/scsi/mvsas/mv_94xx.h
index 8f7eb4f..487aa6f 100644
--- a/drivers/scsi/mvsas/mv_94xx.h
+++ b/drivers/scsi/mvsas/mv_94xx.h
@@ -258,21 +258,11 @@ enum sas_sata_phy_regs {
 #define SPI_ADDR_VLD_94XX  (1U << 1)
 #define SPI_CTRL_SpiStart_94XX (1U << 0)
 
-#define mv_ffc(x)   ffz(x)
-
 static inline int
 mv_ffc64(u64 v)
 {
-   int i;
-   i = mv_ffc((u32)v);
-   if (i >= 0)
-   return i;
-   i = mv_ffc((u32)(v>>32));
-
-   if (i != 0)
-   return 32 + i;
-
-   return -1;
+   u64 x = ~v;
+   return x ? __ffs64(x) : -1;
 }
 
 #define r_reg_set_enable(i) \
diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
index c04a4f5..da24955 100644
--- a/drivers/scsi/mvsas/mv_sas.h
+++ b/drivers/scsi/mvsas/mv_sas.h
@@ -69,7 +69,7 @@ extern struct kmem_cache *mvs_task_list_cache;
 #define DEV_IS_EXPANDER(type)  \
((type == EDGE_DEV) || (type == FANOUT_DEV))
 
-#define bit(n) ((u32)1 << n)
+#define bit(n) ((u64)1 << n)
 
 #define for_each_phy(__lseq_mask, __mc, __lseq)\
for ((__mc) = (__lseq_mask), (__lseq) = 0;  \
-- 
1.7.10.4



Re: [PATCH v2] mm: fix null dev in dma_pool_create()

2012-11-13 Thread Xi Wang
On 11/13/12 7:58 PM, Andrew Morton wrote:
> I'm not sure that I really suggested doing this :(

You suggested WARN_ON_ONCE(!dev); I changed it to WARN_ON(!dev) and
kept the NULL check..
 
> We know there are a few scruffy drivers which are passing in dev==0.
> 
> Those drivers don't oops because nobody is testing them on NUMA
> systems.
> 
> With this patch, the kernel will now cause runtime warnings to be
> emitted from those drivers.  Even on non-NUMA systems.
> 
> This is a problem!  What will happen is that this code will get
> released by Linus and will propagate to users mainly via distros and
> eventually end-user bug reports will trickle back saying "hey, I got
> this warning".  Slowly people will fix the scruffy drivers and those
> fixes will propagate out from Linus's tree into -stable and then into
> distros and then into the end-users hands.
> 
> This is *terribly* inefficient!  It's a lot of work for a lot of people
> and it involves long delays.
> 
> So let's not do any of that!  Let us try to get those scruffy drivers
> fixed up *before* we add this warning.
> 
> As a nice side-effect of that work, we can then clean up the dmapool
> code so it doesn't need to worry about handling the dev==0 special
> case.
> 
> So.  To start this off, can you please generate a list of the offending
> drivers?  Then we can hunt down the maintainers and we'll see what can be
> done.

I like this plan.

Here's the list of drivers that invoke dma_pool_create() with NULL dev,
as well as possible fixes, from previous emails (a sketch of one such
fix follows the list).

* arch/arm/mach-s3c64xx/dma.c

Use dmac->dev for dma_pool_create() in s3c64xx_dma_init1()?  Probably
need to add ->dma_pool to struct s3c64xx_dmac.

* drivers/usb/gadget/amd5536udc.c (2)

Use dev->gadget.dev or dev->pdev->dev for dma_pool_create()?  Also move
the init_dma_pools() call after the assignments in udc_pci_probe().

* drivers/net/wan/ixp4xx_hss.c
* drivers/net/ethernet/xscale/ixp4xx_eth.c

Use port->netdev->dev for dma_pool_create()?
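
For illustration, a minimal sketch of what the last suggestion could
look like (the port->pool and port->netdev fields and POOL_ALLOC_SIZE
are assumptions taken from the suggestion above, not verified against
the tree):

	/* hypothetical fix for ixp4xx_eth.c: pass the net_device's embedded
	 * struct device instead of NULL, so dev_to_node() sees a real device */
	port->pool = dma_pool_create("ixp4xx_eth", &port->netdev->dev,
				     POOL_ALLOC_SIZE, 32, 0);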


[PATCH v2] mm: fix null dev in dma_pool_create()

2012-11-13 Thread Xi Wang
A few drivers invoke dma_pool_create() with a null dev.  Note that dev
is dereferenced in dev_to_node(dev), causing a null pointer dereference.

A long term solution is to disallow null dev.  Once the drivers are
fixed, we can simplify the core code here.  For now we add WARN_ON(!dev)
to notify the driver maintainers and avoid the null pointer dereference.

Suggested-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Xi Wang xi.w...@gmail.com
---
 mm/dmapool.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index c5ab33b..bf7f8f0 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -135,6 +135,7 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 {
struct dma_pool *retval;
size_t allocation;
+   int node;
 
if (align == 0) {
align = 1;
@@ -159,7 +160,9 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
return NULL;
}
 
-   retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
+   node = WARN_ON(!dev) ? -1 : dev_to_node(dev);
+
+   retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, node);
if (!retval)
return retval;
 
-- 
1.7.10.4



Re: [PATCH 2/3] [SCSI] mvsas: fix shift in mvs_94xx_free_reg_set()

2012-11-09 Thread Xi Wang

On 11/9/12 2:30 AM, Xiangliang Yu wrote:
> Agree with James, and just need to do NOT operation one time

Thanks for reviewing the patches.

Okay I'll remove patch 2 in v2 then.

> About patch 3, I check the ffz code and found it will check ~0 conditions.

Can you point me to the ~0 check in ffz code?  I also feel like using
__ffs64 makes the code simpler.

Does patch 1 look good to you?  Thanks.

- xi


Re: [PATCH 2/3] [SCSI] mvsas: fix shift in mvs_94xx_free_reg_set()

2012-11-06 Thread Xi Wang

On 11/6/12 7:06 AM, James Bottomley wrote:
> Why is this necessary?  As I read the reg set assignment code, it finds
> a free bit in the 64 bit register and uses that ... which can never be
> greater than 64 so there's no need for the check.

This patch just tries to be more defensive for bit(reg_set) with a
broken reg_set value.  I agree with you that it's not that necessary.

> The other two look OK (probably redone as a single patch with a stable
> tag), but I'd like the input of the mvs people since it seems with the
> current code, we only use 32 bit regsets and probably hang if we go over
> that.  The bug fix is either to enable the full 64 if it works, or
> possibly cap at 32 ... what works with all released devices?

Thanks for reviewing.  Yeah, we'd better wait for input from the mvs
people.

- xi


Re: [PATCH] mm: fix NULL checking in dma_pool_create()

2012-11-06 Thread Xi Wang
On 11/5/12 4:26 PM, Andrew Morton wrote:
> 
> OK, so it seems that those drivers have never been tested on a
> CONFIG_NUMA kernel.  whee.
> 
> So we have a large amount of code here which ostensibly supports
> dev==NULL but which has not been well tested.  Take a look at
> dma_alloc_coherent(), dma_free_coherent() - are they safe?  Unobvious.

It's probably ok to call dma_alloc_coherent()/dma_free_coherent() with a
NULL dev.  Quite a few drivers do that.

> dmam_pool_destroy() will clearly cause an oops:
> 
> devres_destroy()
> ->devres_remove()
> ->spin_lock_irqsave(&dev->devres_lock, flags);

Not sure if I missed anything, but I haven't found any use of
dmam_pool_destroy() in the tree..

> I'm thinking we should disallow dev==NULL.  We have a lot of code in
> mm/dmapool.c which _attempts_ to support this case, but is largely
> untested and obviously isn't working.  I don't think it's a good idea
> to try to fix up and then support this case on behalf of a handful of
> scruffy drivers.  It would be better to fix the drivers, then simplify
> the core code.  drivers/usb/gadget/amd5536udc.c can probably use
> dev->gadget.dev and drivers/net/wan/ixp4xx_hss.c can probably use
> port->netdev->dev, etc.

After more search I've still only found 4 files that invoke
dma_pool_create() with a NULL dev.

arch/arm/mach-s3c64xx/dma.c
drivers/usb/gadget/amd5536udc.c
drivers/net/wan/ixp4xx_hss.c
drivers/net/ethernet/xscale/ixp4xx_eth.c

So, yeah, we could fix those drivers instead, such as adding
"WARN_ON_ONCE(dev == NULL)" as you suggested.

- xi

