Re: [PATCH] um: ubd: Fix crash from option parsing

2021-01-21 Thread Hajime Tazaki


Hello,

On Wed, 20 Jan 2021 03:19:45 +0900,
Paul Lawrence wrote:
> 
> Below patch will cause NULL ptr dereferences if the optional filenames
> are not present.
> 
> Fixes: ef3ba87cb7c9 (um: ubd: Set device serial attribute from cmdline)
> Signed-off-by: Paul Lawrence 

This was addressed/fixed by the patch below, though that one doesn't
include the first "file" variable check.

http://lists.infradead.org/pipermail/linux-um/2020-December/000983.html

There was another attempt to fix it (with the same diff), btw.

http://lists.infradead.org/pipermail/linux-um/2021-January/000998.html

It seems that the patch is already queued but not upstreamed yet.

-- Hajime
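
The class of crash discussed in this thread (dereferencing a pointer for an optional field that was never given) can be illustrated with a small user-space sketch. It is not taken from the ubd driver: `struct ubd_files`, `next_field()`, `parse_ubd_files()` and `serial_len()` are hypothetical names, and the comma-separated "file[,backing_file[,serial]]" layout is only an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parse result; any member may be NULL when not given. */
struct ubd_files {
	char *file;
	char *backing_file;
	char *serial;
};

/*
 * Return the next comma-separated field and advance *cursor.
 * Returns NULL when the input is exhausted or the field is empty,
 * so callers see one uniform "not present" value.
 */
static char *next_field(char **cursor)
{
	char *start = *cursor;
	char *comma;

	if (!start)
		return NULL;

	comma = strchr(start, ',');
	if (comma) {
		*comma = '\0';
		*cursor = comma + 1;
	} else {
		*cursor = NULL;
	}
	return *start ? start : NULL;
}

/* Split "file[,backing_file[,serial]]" in place; str may be NULL. */
static void parse_ubd_files(char *str, struct ubd_files *out)
{
	assert(out != NULL);
	out->file = next_field(&str);
	out->backing_file = next_field(&str);
	out->serial = next_field(&str);
}

/* Every use of an optional field is guarded by a NULL check. */
static size_t serial_len(const struct ubd_files *f)
{
	return f->serial ? strlen(f->serial) : 0;
}
```

Calling parse_ubd_files() on a writable copy of "root.img" leaves backing_file and serial as NULL, and serial_len() then returns 0 instead of crashing.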


Re: [RFC PATCH 00/28] Linux Kernel Library

2015-11-08 Thread Hajime Tazaki

Hello Octavian,

At Tue,  3 Nov 2015 22:20:31 +0200,
Octavian Purdila wrote:
> 
> 
> Q: How is LKL different from LibOS?
> A: LibOS re-implements high-level kernel APIs for timers, softirqs,
> scheduling, sysctl, SLAB/SLUB, etc. LKL behaves like any arch port,
> implementing the arch level operations requested by the Linux kernel. LKL
> also offers a host interface so that support for multiple hosts can be
> easily implemented.

I reviewed most of the code with the help of the documentation and the paper (2010).

I think LKL and LibOS are essentially the same thing.

Below I describe the current differences between the two. I
believe none of them is fundamental (i.e., both can
improve with some effort).

- LKL
 (beautiful) arch implementation (I like it)
 fully kbuild compliant
 rich fs support
 host support: POSIX, win, haiku, etc

- LibOS
 existing application integration
 (semi-automated) system call table generation
 multiple process support (via system call proxy)
 various network backends (raw socket, DPDK, netmap, tap)
 symbol namespace separation
 host support: == rump hypercall (POSIX, xen,
   qemu/kvm/baremetal(under development)), ns-3 simulator

# I can't find network support in the current patch set, but
  I assume there is (or will be) some code that lets LKL work
  with the networking subsystem.

Existing application integration is really important when
you want to configure a network stack: configuring a file
system is just a mount(), but configuring a network stack
needs many userspace applications like iproute2
(ip, ss, tc), which are not trivial to re-implement.


-- Hajime
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 04/28] lkl: host interface

2015-11-03 Thread Hajime Tazaki

At Tue,  3 Nov 2015 22:20:35 +0200,
Octavian Purdila wrote:
> 
> This patch introduces the host operations that define the interface
> between the LKL and the host. These operations must be provided either
> by a host library or by the application itself.
(snip)
> +struct lkl_host_operations {
> + const char *virtio_devices;
> +
> + void (*print)(const char *str, int len);
> + void (*panic)(void);
> +
> + void* (*sem_alloc)(int count);
> + void (*sem_free)(void *sem);
> + void (*sem_up)(void *sem);
> + void (*sem_down)(void *sem);
> +
> + int (*thread_create)(void (*f)(void *), void *arg);
> + void (*thread_exit)(void);
> +
> + void* (*mem_alloc)(unsigned long);
> + void (*mem_free)(void *);
> +
> + unsigned long long (*time)(void);
> +
> + void* (*timer_alloc)(void (*fn)(void *), void *arg);
> + int (*timer_set_oneshot)(void *timer, unsigned long delta);
> + void (*timer_free)(void *timer);
> +
> + void* (*ioremap)(long addr, int size);
> + int (*iomem_access)(const volatile void *addr, void *val, int size,
> + int write);
> +
> +};

This is related to what I'm improving in libos right now.
My current conclusion is to use the rump hypercall interfaces,
which I'm currently working on.

We (libos and lkl) could end up with a more mature interface and
reduce/share the effort of supporting more underlying (host)
calls.

-- Hajime
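
To make the idea of a host-provided operations table concrete, here is a minimal sketch of how a POSIX-like host might fill in a few of the members quoted above. It is not the LKL or rump hypercall implementation: `struct host_ops_subset` and the `host_*` function names are hypothetical, only a subset of the quoted members is covered, and the nanosecond unit for time() is an assumption (the quoted patch does not state the unit).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Subset of the members quoted above; the struct name is hypothetical. */
struct host_ops_subset {
	void (*print)(const char *str, int len);
	void (*panic)(void);
	void *(*mem_alloc)(unsigned long size);
	void (*mem_free)(void *ptr);
	unsigned long long (*time)(void);
};

/* Write exactly len bytes; the string need not be NUL-terminated. */
static void host_print(const char *str, int len)
{
	assert(str != NULL);
	if (len > 0)
		fwrite(str, 1, (size_t)len, stdout);
}

static void host_panic(void)
{
	abort();
}

static void *host_mem_alloc(unsigned long size)
{
	return malloc(size);
}

static void host_mem_free(void *ptr)
{
	free(ptr);
}

/* Assumed unit: nanoseconds since the epoch; returns 0 on failure. */
static unsigned long long host_time(void)
{
	struct timespec ts;

	if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
		return 0;
	return (unsigned long long)ts.tv_sec * 1000000000ULL +
	       (unsigned long long)ts.tv_nsec;
}

/* The library side would only ever call through this table. */
static const struct host_ops_subset posix_host_ops = {
	.print = host_print,
	.panic = host_panic,
	.mem_alloc = host_mem_alloc,
	.mem_free = host_mem_free,
	.time = host_time,
};
```

Because the library only sees function pointers, a different host (or a hypercall layer shared between projects) can supply another table without touching the library code.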



Re: [RFC PATCH 00/28] Linux Kernel Library

2015-11-03 Thread Hajime Tazaki

At Tue, 3 Nov 2015 22:45:45 +,
Richard W.M. Jones wrote:

> > > * cptofs/cpfromfs - a tool that copies files to/from a filesystem image
> > 
> > Seeing forward to have a libguestfs port. :-)
> 
> Thanks - I was keeping an eye on libos (and on the NetBSD rump kernel
> stuff before), ready to integrate them into libguestfs as soon as they
> offered filesystem access.

I've been working on fs support in libos recently, as part of
the integration with the rump kernel _hypercall_, though it's
still in progress (but open(2) works under specific conditions,
at least).

https://github.com/libos-nuse/net-next-nuse/tree/rump-hypcall

I expect to have a more concrete patchset in the near future.

-- Hajime


[PATCH v6 04/10] lib: time handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki

Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. These
eventually call the functions registered through the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/hrtimer.c | 117 ++
 arch/lib/tasklet-hrtimer.c |  57 +
 arch/lib/time.c| 116 ++
 arch/lib/timer.c   | 299 +
 4 files changed, 589 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index ..6a99bad6c5b7
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,117 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index ..fef4902d4938
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim.h"
+#include "sim-assert.h"
+
+static enum hrtimer_restart __hrtimer_tas
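
The restart logic of trampoline() in arch/lib/hrtimer.c can be sketched in user space as follows. This is a minimal sketch under stated assumptions, not the libos implementation: `struct sim_timer`, `sim_schedule_ns()`, `sim_run()` and the one-slot event queue are hypothetical stand-ins for struct hrtimer, lib_event_schedule_ns() and the simulator's scheduler, and virtual time simply jumps to the next pending event.

```c
#include <assert.h>
#include <stddef.h>

enum restart { NORESTART, RESTART };

struct sim_timer {
	enum restart (*function)(struct sim_timer *t);
	unsigned long long period_ns; /* stands in for _softexpires */
	void *event;                  /* non-NULL while an event is pending */
	int fired;
};

/* One-slot "event queue": enough to drive a single timer. */
struct sim_event {
	unsigned long long when_ns;
	void (*fn)(void *);
	void *ctx;
	int pending;
};

static struct sim_event slot;
static unsigned long long now_ns;

/* Schedule fn(ctx) to run delay_ns after the current virtual time. */
static void *sim_schedule_ns(unsigned long long delay_ns,
			     void (*fn)(void *), void *ctx)
{
	assert(!slot.pending);
	slot.when_ns = now_ns + delay_ns;
	slot.fn = fn;
	slot.ctx = ctx;
	slot.pending = 1;
	return &slot;
}

/* Run the user callback, then re-arm or mark the timer as completed. */
static void trampoline(void *context)
{
	struct sim_timer *t = context;

	if (t->function(t) == RESTART)
		t->event = sim_schedule_ns(t->period_ns, trampoline, t);
	else
		t->event = NULL;
}

static void sim_timer_start(struct sim_timer *t)
{
	t->event = sim_schedule_ns(t->period_ns, trampoline, t);
}

/* Dispatch pending events until virtual time reaches until_ns. */
static void sim_run(unsigned long long until_ns)
{
	while (slot.pending && slot.when_ns <= until_ns) {
		now_ns = slot.when_ns;
		slot.pending = 0; /* clear first so the callback may re-arm */
		slot.fn(slot.ctx);
	}
	now_ns = until_ns;
}

/* Example callback: fire three times, then stop. */
static enum restart count_to_three(struct sim_timer *t)
{
	return ++t->fired < 3 ? RESTART : NORESTART;
}
```

Starting a timer with count_to_three and a 10 ns period, then calling sim_run(100), fires the callback at virtual times 10, 20 and 30 and leaves the timer with no pending event.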

[PATCH v6 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-09-03 Thread Hajime Tazaki

These files work as stubs so that other parts of the kernel
(e.g., net/) run transparently in the libos environment.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/include/asm/Kbuild   | 57 
 arch/lib/include/asm/atomic.h | 62 +++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 +
 arch/lib/include/asm/current.h|  7 
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 
 arch/lib/include/asm/pgtable.h| 30 +
 arch/lib/include/asm/processor.h  | 19 +++
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 
 arch/lib/include/asm/thread_info.h| 36 
 arch/lib/include/asm/uaccess.h| 14 
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 334 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index ..c647b1ca8cca
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index ..f72c3a8ca48c
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,62 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)(*(volatile long *)&(v)->counter)
+static inline void atomic64_add(long i, atomic64_t *v)
+{
+   v->counter += i;
+}
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)atomic64_su

[PATCH v6 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-09-03 Thread Hajime Tazaki

Kernel context primitives such as soft interrupts, scheduling, and
tasklets are implemented for libos. These functions also eventually
call the functions registered through the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 238 ++
 4 files changed, 828 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index ..98a568a16903
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))

[PATCH v6 09/10] lib: libos build scripts and documentation

2015-09-03 Thread Hajime Tazaki

Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 235 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 655 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1265 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index ..fbf7946f42ef
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+New hardware independent architecture 'arch/lib', configured by
+CONFIG_LIB gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE will give you a personalized network stack for each application
+  without replacing host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE will give us a network simulation environment with Linux network stack
+  to investigate the detailed behavior of protocol implementations with a
+  flexible network configuration. This is also useful as a testing environment.
+
+(- more abstracted implementation of underlying platform will be a future
+   direction (e.g., rump hypercall))
+
+In both features, Linux kernel network stack is running on top of
+userspace application with a linked or dynamically loaded library.
+
+They have their own, isolated network stack from host operating system
+so they are configured different IP addresses as other virtualization
+methods do.
+
+
+* How different with others ?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces bring process-level isolation
+to host multiple network entities but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it ?
+
+configuration of arch/lib follows a standard configuration of kernel.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+you may first need to configure a configuration file, named
+'nuse.conf' so that the library version of network stack can know what
+kind of IP configuration should be used. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is the determinism of any test executions. A
+test script always gives same results in every execution if there is
+no modification on test target code.
+
+For the first step, you need to obtain network simulator
+environment. 'make testbin' does all the stuff for the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+an application can use its own network stack, distinct from host network stack
+in order to personalize any network feature to the application specific one.
+The 'nuse' wrapper script, based on LD_PRELOAD technique, carefully replaces
+socket API and redirects system calls to the network stack library, provided by
+this framework.
+
+the network stack can be used with any kind of raw-socket like
+technologies such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of
+applications (NUSE/DCE). Pure

[PATCH v6 07/10] lib: other kernel glue layer code

2015-09-03 Thread Hajime Tazaki

These files provide the same function calls so that the rest of the
network stack code stays untouched.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Christoph Paasch 
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 +
 arch/lib/glue.c   | 284 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 ++
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  36 +++
 arch/lib/random.c |  54 ++
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 731 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index ..3a1f30129fb7
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index ..ce424ffae8c2
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ *     Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index ..33efe5f1da32
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is single-threaded, so we do not need a lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 65535;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index ..bdbed913ee9e
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,284 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include /* loff_t */
+#include /* ESPIPE */
+#include

[PATCH v6 10/10] lib: tools used for test scripts

2015-09-03 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. A simple test can be run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki 
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index ..57a74a05482c
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index ..a27eb84e7712
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build $(BAKEBUILD_PARAMS)
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index ..51ac5a52336e
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index ..9377ac3214c1
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index ..e81e2d84c156
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index ..198e7e4c66ac
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.con

[PATCH v6 06/10] lib: sysctl handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki
This interacts with fs/proc/proc_sysctl.c to provide the sysctl-like
interface registered via the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index ..5f08f9f97103
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int nr_pdflush_threads = 0;
+unsigned long scan_unevictable_pages = 0;
+int suid_dumpable = 0;
+int pa

[PATCH v6 00/10] an introduction of Linux library operating system (LibOS)

2015-09-03 Thread Hajime Tazaki
stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
-
[v5] : https://lkml.org/lkml/2015/5/13/25
[v4] : https://lkml.org/lkml/2015/4/26/279
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction of Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personality and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design only focuses on the network stack. You can benefit from
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the present
moment), and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes a single entry point as its API, lib_init(), in
order to connect userspace applications to the (userspace-version)
kernel network stack. The clock source, virtual struct net_device, and
scheduler are provided by the caller, while kernel resources such as
system calls are provided by the callee.
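
The division of labour between caller and callee can be pictured as an
exchange of two function-pointer tables at initialization time. The
following is only a minimal sketch under stated assumptions: the names
demo_imported, demo_exported, demo_init, demo_now and demo_fake_clock
are hypothetical and are not the real lib_init() signature or the real
SimImported/SimExported layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: services the caller (host side) hands to the library. */
struct demo_imported {
	unsigned long (*current_ns)(void);	/* clock source */
};

/* Hypothetical: services the library hands back to the caller. */
struct demo_exported {
	int (*sock_socket)(int domain, int type, int protocol);
};

static const struct demo_imported *g_host;	/* remembered by the library */
static int g_next_handle = 3;

/* Library-side "system call": the host network stack is not involved. */
static int demo_sock_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return g_host ? g_next_handle++ : -1;
}

/* Library code asks the caller for the time instead of reading a clock. */
unsigned long demo_now(void)
{
	return g_host ? g_host->current_ns() : 0;
}

/* Entry point: store the caller's table, fill in the library's table. */
void demo_init(struct demo_exported *exported,
	       const struct demo_imported *imported)
{
	g_host = imported;
	exported->sock_socket = demo_sock_socket;
}

/* Host-side clock for this sketch: a fixed, simulated time. */
unsigned long demo_fake_clock(void)
{
	return 1000UL;
}
```

A caller would fill a demo_imported with its own clock, call
demo_init(), and from then on create sockets only through the returned
demo_exported table. Because the clock comes from the caller, a
simulator can hand in virtual time without the library noticing.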

Once LibOS is initialized via the API, userspace applications that use
POSIX sockets can call the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific
ones. The application then benefits from the network stack of LibOS
without involving the host network stack.

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is maintained in an external repository(*1).


== How to use it ? ==

First, configure the kernel:
% make {defconfig,menuconfig} ARCH=lib

Then, build the library:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (User-mode Linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, for easy debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v6


For further information, here is the slide set presented at the
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming this feature.

*1 https://github.com/libos-nuse/linux-libos-tools


Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxiliary files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 235 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/capability.c |  25 ++
 arch/lib/defconfig| 655 ++
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/generate-linker-script.py|  50 +++
 arch/lib/glue.c   | 284 +++
 arch/lib/hrtimer.c| 117 ++
 arch/lib/include/asm/Kbuild   |  57 +++
 arch/lib/include/asm/atomic.h |  62 
 arch/lib/include/asm/barrier.h|   8 +
 arch/lib/include/asm/bitsperlong.h|  16 +
 arch/lib/include/asm/current.h|   7 +
 arch/lib/include/asm/elf.h|  10 +
 arch/lib/include/asm/hardirq.h|   8 +
 arch/lib/include/asm/page.h   |  14 +
 arch/lib/include/asm/pgtable.h|  30 ++
 arch/lib/include/asm/processor.h  |  19 +
 arch/lib/include/asm/ptrace.h |   4 +
 arch/lib/include/asm/segment.h|   6 +
 arch/lib/include/asm/sembuf.h |   4 +
 arch/lib/include/asm/shmbuf.h |   4 +
 arch/lib/include/asm/shmparam.h   |   4 +
 arch/lib/include/asm/sigcontext.h |   6 +
 arch/lib/include/asm/stat.h   |   4 +
 arch/lib/include/asm/statfs.h |   4 +
 arch/lib/include/asm/swab.h   |   7 +
 arch/lib/include/asm/thread_info.h|  36 ++
 arch/lib/include/asm/uaccess.h|  

[PATCH v6 03/10] lib: public headers and API implementations for userspace programs

2015-09-03 Thread Hajime Tazaki
Userspace programs that use libos access it via a public API, lib_init(),
passing struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 +++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +
 arch/lib/lib-socket.c | 370 ++
 arch/lib/lib.c| 296 +
 arch/lib/lib.h|  21 +++
 9 files changed, 1148 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index ..974122c3a0f1
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {                                        \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index ..e871a594b82c
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struc

[PATCH v6 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-09-03 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and co.
This lets libos use the allocator supplied by its user, e.g. malloc(3).
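
The wrapping used by mm/slib.c in this patch stores the requested size
in front of the payload, so that ksize() and kfree() can recover it
from the pointer alone. Below is a minimal userspace sketch of that
idea, assuming malloc(3) as a stand-in for lib_malloc(). The demo_*
names are hypothetical, and two details are choices made for this
sketch rather than taken from the patch: the header is padded for
alignment, and the realloc variant copies the old payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Header stored in front of every block. The extra union members only
 * pad it so the pointer handed to the caller stays suitably aligned.
 */
union demo_hdr {
	size_t size;
	long double pad_ld;
	long long pad_ll;
	void *pad_ptr;
};

/* Allocate size bytes and remember the size just before the payload. */
void *demo_kmalloc(size_t size)
{
	union demo_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->size = size;
	return h + 1;
}

/* Recover the requested size by stepping back to the header. */
size_t demo_ksize(const void *p)
{
	return ((const union demo_hdr *)p - 1)->size;
}

/* Free the whole block, header included. NULL is accepted. */
void demo_kfree(void *p)
{
	if (p)
		free((union demo_hdr *)p - 1);
}

/* Grow or shrink a block, copying the old payload across (assumption). */
void *demo_krealloc(void *p, size_t new_size)
{
	void *n = demo_kmalloc(new_size);

	if (n && p) {
		size_t old = demo_ksize(p);

		memcpy(n, p, old < new_size ? old : new_size);
		demo_kfree(p);
	}
	return n;
}
```

The price of this scheme is one header per allocation. In exchange the
allocator needs no size classes or caches of its own and can sit on top
of whatever malloc-like function the host provides.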

Signed-off-by: Hajime Tazaki 
---
 include/linux/slab.h |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile  |   1 +
 mm/slab.h|   4 +
 mm/slib.c| 209 +++
 5 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a99f0e5243e1..104c1aeec560 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index ..d9fe7d59bd4e
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..7d8314f95ce3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 8da63e4e470f..2cf4f0f67a19 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include 
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#endif
+
 #include 
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index ..974c8aed0275
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,209 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+#include 
+#include 
+#include 
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+   return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
+   kfree(cache);
+}
+int kmem_cache_shrink(struct kmem_cache *cache)
+{
+   retu

[PATCH v6 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-09-03 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through a userspace
function call that enumerates the sysctl tree starting from
sysctl_table_root. This requires that symbol and its related functions
to be publicly accessible.
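
The enumeration mentioned above amounts to a depth-first walk from a
root table. The following is a deliberately simplified sketch: struct
demo_ctl, demo_walk and the sample tables are hypothetical and do not
reflect the kernel's real ctl_table, ctl_dir or rb-tree layout. They
only illustrate why the walker needs access to the root symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a sysctl table entry. */
struct demo_ctl {
	const char *procname;		/* NULL terminates a table */
	const struct demo_ctl *child;	/* non-NULL for a directory */
	int value;
};

/* Walk the tree depth-first, calling cb with the dotted path of each leaf. */
void demo_walk(const struct demo_ctl *table, const char *prefix,
	       void (*cb)(const char *path, int value, void *arg), void *arg)
{
	char path[128];

	for (; table->procname; table++) {
		snprintf(path, sizeof(path), "%s%s%s", prefix,
			 *prefix ? "." : "", table->procname);
		if (table->child)
			demo_walk(table->child, path, cb, arg);
		else
			cb(path, table->value, arg);
	}
}

/* Example callback state: remember the value of one named entry. */
struct demo_lookup {
	const char *want;
	int found;
	int value;
};

void demo_match(const char *path, int value, void *arg)
{
	struct demo_lookup *l = arg;

	if (strcmp(path, l->want) == 0) {
		l->found = 1;
		l->value = value;
	}
}

/* Sample tree: net.ipv4.ip_forward = 1 */
static const struct demo_ctl demo_ipv4[] = {
	{ "ip_forward", NULL, 1 },
	{ NULL, NULL, 0 },
};
static const struct demo_ctl demo_net[] = {
	{ "ipv4", demo_ipv4, 0 },
	{ NULL, NULL, 0 },
};
const struct demo_ctl demo_root[] = {
	{ "net", demo_net, 0 },
	{ NULL, NULL, 0 },
};
```

Calling demo_walk(demo_root, "", demo_match, &lookup) with lookup.want
set to "net.ipv4.ip_forward" visits every leaf once and fills in the
matching value. The walker cannot start without the root table, which
is the reason the patch exports the root and its iterator functions.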

Signed-off-by: Hajime Tazaki 
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fdda62e6115e..e1003cf51d22 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -57,7 +57,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -99,8 +99,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -335,7 +336,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -356,7 +357,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -374,7 +375,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -707,7 +708,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -865,7 +867,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -961,13 +963,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -988,13 +990,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1106,7 +1108,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;
if (S_ISDIR(link->mode

[PATCH v6 09/10] lib: libos build scripts and documentation

2015-09-03 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
Signed-off-by: Ryo Nakamura <u...@haeena.net>
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 235 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 655 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1265 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index ..fbf7946f42ef
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', configured by
+CONFIG_LIB, gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE will give you a personalized network stack for each application
+  without replacing host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE gives us a network simulation environment with the Linux network stack
+  to investigate the detailed behavior of protocol implementations under a
+  flexible network configuration. It is also useful as a testing environment.
+
+(- more abstracted implementation of underlying platform will be a future
+   direction (e.g., rump hypercall))
+
+In both features, the Linux kernel network stack runs inside a
+userspace application as a linked or dynamically loaded library.
+
+Each has its own network stack, isolated from the host operating system,
+so each is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How is it different from others?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies with namespaces bring process-level isolation
+to host multiple network entities but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it ?
+
+configuration of arch/lib follows a standard configuration of kernel.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to prepare a configuration file, named
+'nuse.conf', so that the library version of the network stack knows what
+kind of IP configuration should be used. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is the determinism of any test executions. A
+test script always gives same results in every execution if there is
+no modification on test target code.
+
+For the first step, you need to obtain network simulator
+environment. 'make testbin' does all the stuff for the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any kind of raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a

[PATCH v6 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-09-03 Thread Hajime Tazaki
These files work as stubs so that other kernel parts (e.g., net/) run
transparently in the libos environment.
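
As a rough illustration of what such a stub can look like, the sketch below implements 64-bit counter helpers with plain arithmetic. It assumes a single-threaded execution model, which is what makes non-atomic updates acceptable; the `demo_` names and the simplified type are hypothetical and are not the definitions in arch/lib/include/asm/atomic.h.

```c
#include <assert.h>

/* Hypothetical, simplified counter type (not the kernel's atomic64_t). */
typedef struct {
	long long counter;
} demo_atomic64_t;

static inline long long demo_atomic64_read(const demo_atomic64_t *v)
{
	return v->counter;
}

/* Plain arithmetic: only safe when nothing runs concurrently. */
static inline void demo_atomic64_add(long long i, demo_atomic64_t *v)
{
	v->counter += i;
}

static inline void demo_atomic64_sub(long long i, demo_atomic64_t *v)
{
	v->counter -= i;
}

/* Subtract and report whether the counter reached zero. */
static inline int demo_atomic64_sub_and_test(long long i, demo_atomic64_t *v)
{
	v->counter -= i;
	return v->counter == 0;
}

/* Reference-count style usage: returns 1 when the last reference drops. */
int demo_refcount_example(void)
{
	demo_atomic64_t refs = { 0 };

	demo_atomic64_add(2, &refs);
	assert(demo_atomic64_read(&refs) == 2);
	demo_atomic64_sub(1, &refs);
	return demo_atomic64_sub_and_test(1, &refs);
}
```

If the library were ever driven from several host threads at once, stubs of this shape would need real atomic operations instead.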

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 arch/lib/include/asm/Kbuild   | 57 
 arch/lib/include/asm/atomic.h | 62 +++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 +
 arch/lib/include/asm/current.h|  7 
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 
 arch/lib/include/asm/pgtable.h| 30 +
 arch/lib/include/asm/processor.h  | 19 +++
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 
 arch/lib/include/asm/thread_info.h| 36 
 arch/lib/include/asm/uaccess.h| 14 
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 334 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index ..c647b1ca8cca
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index ..f72c3a8ca48c
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,62 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)(*(volatile long *)&(v)->counter)
+static inline void atomic64_add(long i, atomic64_t *v)
+{
+   v->counter += i;
+}
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)  

[PATCH v6 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-09-03 Thread Hajime Tazaki
Kernel context primitives such as soft interrupts, scheduling, and
tasklets are implemented for libos. These functions eventually call the
functions registered through the lib_init() API.
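
The idea of forwarding a kernel primitive to a host-registered function can be sketched as follows. This is only an illustration under stated assumptions: `demo_host_ops`, `event_schedule_ns`, `demo_tasklet` and the toy single-slot event queue are hypothetical names invented for this example, not the interfaces of arch/lib/tasklet.c or of the real lib_init() tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host callback: run fn(ctx) after delay_ns nanoseconds. */
typedef void (*demo_event_fn)(void *ctx);

struct demo_host_ops {
	void (*event_schedule_ns)(unsigned long delay_ns,
				  demo_event_fn fn, void *ctx);
};

static struct demo_host_ops g_host;

/* Stand-in for the registration step performed at library init time. */
void demo_register_host(const struct demo_host_ops *ops)
{
	assert(ops && ops->event_schedule_ns);
	g_host = *ops;
}

/* Simplified tasklet: a function, its argument, and a pending flag. */
struct demo_tasklet {
	void (*func)(unsigned long data);
	unsigned long data;
	int scheduled;
};

static void demo_tasklet_run(void *ctx)
{
	struct demo_tasklet *t = ctx;

	t->scheduled = 0;
	t->func(t->data);
}

/* The kernel-side primitive defers the work to the host scheduler. */
void demo_tasklet_schedule(struct demo_tasklet *t)
{
	if (t->scheduled)
		return;		/* already pending, nothing to do */
	t->scheduled = 1;
	g_host.event_schedule_ns(0, demo_tasklet_run, t);
}

/* Toy host: a single-slot event queue standing in for a simulator. */
static demo_event_fn pending_fn;
static void *pending_ctx;

static void toy_schedule(unsigned long delay_ns, demo_event_fn fn, void *ctx)
{
	(void)delay_ns;
	pending_fn = fn;
	pending_ctx = ctx;
}

static void toy_run_pending(void)
{
	demo_event_fn fn = pending_fn;

	pending_fn = NULL;
	if (fn)
		fn(pending_ctx);
}

static unsigned long hits;

static void count_hit(unsigned long data)
{
	hits += data;
}

/* Schedules the same tasklet twice; it must run exactly once. */
unsigned long demo_tasklet_example(void)
{
	struct demo_host_ops ops = { .event_schedule_ns = toy_schedule };
	struct demo_tasklet t = { .func = count_hit, .data = 5 };

	hits = 0;
	demo_register_host(&ops);
	demo_tasklet_schedule(&t);
	demo_tasklet_schedule(&t);	/* ignored: still pending */
	toy_run_pending();
	return hits;
}
```

Keeping the scheduler on the host side is what would let a simulator or a userspace event loop decide when deferred kernel work actually runs.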

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 238 ++
 4 files changed, 828 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index ..98a568a16903
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ *     Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)

[PATCH v6 10/10] lib: tools used for test scripts

2015-09-03 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. A simple test can be run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index ..57a74a05482c
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index ..a27eb84e7712
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build $(BAKEBUILD_PARAMS)
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index ..51ac5a52336e
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index ..9377ac3214c1
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index ..e81e2d84c156
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index ..198e7e4c66ac
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+N

[PATCH v6 07/10] lib: other kernel glue layer code

2015-09-03 Thread Hajime Tazaki
These files provide the function calls that the rest of the network
stack expects, so that the network stack code itself stays untouched.

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paa...@gmail.com>
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 +
 arch/lib/glue.c   | 284 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 ++
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  36 +++
 arch/lib/random.c |  54 ++
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 731 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index ..3a1f30129fb7
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index ..ce424ffae8c2
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index ..33efe5f1da32
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is single-threaded, so no lock is needed here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 65535;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index ..bdbed913ee9e
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,284 @@
+/*
+ * glue code for l

[PATCH v6 06/10] lib: sysctl handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide the sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index ..5f08f9f97103
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ *     Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int 

[PATCH v6 03/10] lib: public headers and API implementations for userspace programs

2015-09-03 Thread Hajime Tazaki
Userspace programs that use libos access it through a public API,
lib_init(), which takes struct SimImported and struct SimExported as
arguments.
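
The two-table handshake can be pictured with the minimal sketch below. All names here (`demo_imported`, `demo_exported`, `demo_lib_init`) are hypothetical; the real struct SimImported and struct SimExported carry many more callbacks, and the actual lib_init() signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: services the host program hands to the library. */
struct demo_imported {
	void *(*alloc)(size_t size);
	void (*release)(void *ptr);
};

/* Hypothetical: entry points the library hands back to the host. */
struct demo_exported {
	int (*sock_socket)(int domain, int type, int protocol);
};

static struct demo_imported g_imported;

/* Library-side entry point that relies on a host-provided service. */
static int demo_sock_socket(int domain, int type, int protocol)
{
	int *slot = g_imported.alloc(sizeof(*slot));
	int ok;

	if (!slot)
		return -1;
	*slot = domain + type + protocol;
	ok = (*slot >= 0);
	g_imported.release(slot);
	return ok ? 0 : -1;
}

/* Stand-in for lib_init(): remember the imports, fill in the exports. */
void demo_lib_init(struct demo_exported *exported,
		   const struct demo_imported *imported)
{
	assert(exported && imported);
	g_imported = *imported;
	exported->sock_socket = demo_sock_socket;
}

/* Host side: wire in libc's allocator, then call into the library. */
int demo_host_open_socket(void)
{
	struct demo_imported imported = { .alloc = malloc, .release = free };
	struct demo_exported exported = { NULL };

	demo_lib_init(&exported, &imported);
	/* The argument values are arbitrary placeholders. */
	return exported.sock_socket(2, 1, 0);
}
```

Passing both tables through a single init call keeps the library free of direct references to host symbols, so the same binary could in principle be driven by different hosts.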

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
Signed-off-by: Ryo Nakamura <u...@haeena.net>
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 +++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +
 arch/lib/lib-socket.c | 370 ++
 arch/lib/lib.c| 296 +
 arch/lib/lib.h|  21 +++
 9 files changed, 1148 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index ..974122c3a0f1
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) { \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index ..e871a594b82c
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+ 

[PATCH v6 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-09-03 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through function calls
from userspace by enumerating the sysctl tree from sysctl_table_root. This
requires that symbol and its related functions to be publicly accessible.
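
Enumerating a sysctl-like tree through plain function calls can be sketched roughly as below. The `demo_ctl` structure is a hypothetical, much simplified stand-in for struct ctl_table; the real walk goes through ctl_table_first_entry() and ctl_table_next_entry() and the header/directory structures, which this example does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct demo_ctl {
	const char *procname;		/* NULL terminates a table */
	const struct demo_ctl *child;	/* non-NULL for a directory */
	int *data;			/* value for a leaf entry */
};

typedef void (*demo_visit_fn)(const char *path, int value, void *ctx);

/* Depth-first walk that reports every leaf as "dir.subdir.name". */
void demo_ctl_walk(const struct demo_ctl *table, const char *prefix,
		   demo_visit_fn visit, void *ctx)
{
	const struct demo_ctl *e;
	char path[128];

	assert(table && prefix && visit);
	for (e = table; e->procname; e++) {
		if (prefix[0])
			snprintf(path, sizeof(path), "%s.%s",
				 prefix, e->procname);
		else
			snprintf(path, sizeof(path), "%s", e->procname);
		if (e->child)
			demo_ctl_walk(e->child, path, visit, ctx);
		else if (e->data)
			visit(path, *e->data, ctx);
	}
}

static void demo_count(const char *path, int value, void *ctx)
{
	(void)path;
	(void)value;
	(*(int *)ctx)++;
}

/* Builds a tiny net.ipv4 tree and counts its leaf entries. */
int demo_ctl_count_leaves(void)
{
	static int ip_forward = 1, tcp_sack = 1;
	static const struct demo_ctl ipv4[] = {
		{ "ip_forward", NULL, &ip_forward },
		{ "tcp_sack", NULL, &tcp_sack },
		{ NULL, NULL, NULL },
	};
	static const struct demo_ctl net[] = {
		{ "ipv4", ipv4, NULL },
		{ NULL, NULL, NULL },
	};
	static const struct demo_ctl root[] = {
		{ "net", net, NULL },
		{ NULL, NULL, NULL },
	};
	int count = 0;

	demo_ctl_walk(root, "", demo_count, &count);
	return count;
}
```

A visitor callback of this kind is one way a host could list or look up entries without going through a mounted /proc.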

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fdda62e6115e..e1003cf51d22 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -57,7 +57,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -99,8 +99,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -335,7 +336,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -356,7 +357,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -374,7 +375,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -707,7 +708,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+    ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -865,7 +867,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -961,13 +963,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -988,13 +990,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1106,7 +1108,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;

[PATCH v6 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-09-03 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and
friends. This lets libos use the user's own allocator, e.g. malloc(3).
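
One way to wrap a host allocator behind kmalloc-style calls is sketched below. It is an illustration only: the `demo_` functions are hypothetical, gfp flags are left out, and the size header in front of each object is an assumption made so that a ksize()-like query can work; it is not presented as the layout used by mm/slib.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-object header; the union keeps the payload aligned. */
union demo_hdr {
	size_t size;
	max_align_t align;
};

/* kmalloc-like wrapper: allocate header plus payload from the host. */
void *demo_kmalloc(size_t size)
{
	union demo_hdr *hdr;

	if (size > SIZE_MAX - sizeof(*hdr))
		return NULL;	/* request would overflow */
	hdr = malloc(sizeof(*hdr) + size);
	if (!hdr)
		return NULL;
	hdr->size = size;
	return hdr + 1;
}

/* kzalloc-like wrapper: same as above, but zero-filled. */
void *demo_kzalloc(size_t size)
{
	void *p = demo_kmalloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* ksize-like query: read the size back from the header. */
size_t demo_ksize(const void *p)
{
	if (!p)
		return 0;
	return ((const union demo_hdr *)p - 1)->size;
}

/* kfree-like wrapper: step back to the header and release it. */
void demo_kfree(const void *p)
{
	if (!p)
		return;
	free((union demo_hdr *)p - 1);
}

/* Allocates, queries and frees one object; returns the recorded size. */
size_t demo_slib_example(void)
{
	char *buf = demo_kzalloc(24);
	size_t n;

	assert(buf != NULL);
	n = demo_ksize(buf);
	demo_kfree(buf);
	return n;
}
```

Swapping `malloc` and `free` for host-supplied function pointers would let a simulator account for every allocation the kernel code makes.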

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 include/linux/slab.h |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile  |   1 +
 mm/slab.h|   4 +
 mm/slib.c| 209 +++
 5 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a99f0e5243e1..104c1aeec560 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index ..d9fe7d59bd4e
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..7d8314f95ce3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 8da63e4e470f..2cf4f0f67a19 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include 
 #endif
 
+#ifdef CONFIG_SLIB
+#include 
+#endif
+
 #include 
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index ..974c8aed0275
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,209 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ *     Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+#include 
+#include 
+#include 
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+   return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
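
For readers following the SLIB patch above, here is a minimal userspace sketch of the size-header technique that kfree(), ksize() and __kmalloc() rely on: the requested size is stored just before the payload so it can be recovered later. This is only an illustration under assumptions, not the patch's code: malloc(3)/free(3) stand in for lib_malloc()/lib_free(), and all hdr_* names are hypothetical. Unlike the krealloc() shown in the patch, this sketch copies the old contents when resizing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; malloc(3)/free(3) stand in for the host allocator. */

/* Allocate `size` usable bytes and record `size` in a hidden header. */
void *hdr_alloc(size_t size, int zero)
{
	size_t *base;

	if (size > SIZE_MAX - sizeof(size_t))
		return NULL;		/* header + payload would overflow */
	base = malloc(sizeof(size_t) + size);
	if (!base)
		return NULL;
	if (zero)
		memset(base, 0, sizeof(size_t) + size);
	*base = size;			/* header: the requested size */
	return base + 1;		/* payload starts right after the header */
}

/* Recover the recorded size from a payload pointer (0 for NULL). */
size_t hdr_size(const void *p)
{
	return p ? ((const size_t *)p)[-1] : 0;
}

/* Step back over the header and release the whole block. */
void hdr_free(void *p)
{
	if (p)
		free((size_t *)p - 1);
}

/* Resize by allocate-copy-free, preserving min(old, new) bytes. */
void *hdr_realloc(void *p, size_t new_size)
{
	size_t old_size = hdr_size(p);
	void *q;

	if (new_size == 0) {
		hdr_free(p);
		return NULL;
	}
	q = hdr_alloc(new_size, 0);
	if (!q)
		return NULL;		/* the old block stays valid */
	assert(hdr_size(q) == new_size);
	if (p) {
		memcpy(q, p, old_size < new_size ? old_size : new_size);
		hdr_free(p);
	}
	return q;
}
```

Note that the payload is offset by sizeof(size_t) from whatever alignment the host allocator guarantees, which may matter for callers that expect stricter alignment.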

[PATCH v6 04/10] lib: time handling (kernel glue code)

2015-09-03 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. They
eventually call the functions registered through the lib_init()
API.

Signed-off-by: Hajime Tazaki <thehaj...@gmail.com>
---
 arch/lib/hrtimer.c | 117 ++
 arch/lib/tasklet-hrtimer.c |  57 +
 arch/lib/time.c| 116 ++
 arch/lib/timer.c   | 299 +
 4 files changed, 589 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index ..6a99bad6c5b7
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,117 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ *     Hajime Tazaki <taz...@sfc.wide.ad.jp>
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so we know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index ..fef4902d4938
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage <mathieu.lac...@gmail.com>
+ * Hajime Tazaki <taz...@sfc.wide.ad.jp>
+
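
As an aside on the hrtimer glue above, the following is a minimal standalone sketch of the pattern it uses: the timer keeps an opaque handle to a pending host event, cancel reports whether the timer was active (0 when inactive, 1 when active, as in the kernel-doc comments), and a trampoline re-arms the timer when the callback asks for a restart. Everything here is an assumption for illustration: the sk_* names are hypothetical, and a single-slot "scheduler" stands in for the host's lib_event_schedule_ns()/lib_event_cancel().

```c
#include <assert.h>
#include <stddef.h>

enum sk_restart { SK_NORESTART, SK_RESTART };

struct sk_timer {
	enum sk_restart (*function)(struct sk_timer *);
	void *event;	/* non-NULL while an event is pending */
	int fired;	/* how many times the callback has run */
};

/* Hypothetical host scheduler with room for exactly one pending event. */
static struct sk_timer *pending;

static void *sk_schedule(struct sk_timer *t)
{
	assert(pending == NULL || pending == t);	/* the slot must be free or ours */
	pending = t;
	return t;
}

static void sk_unschedule(void *event)
{
	if (pending == event)
		pending = NULL;
}

/* Returns 0 when the timer was not active, 1 when it was. */
int sk_timer_cancel(struct sk_timer *t)
{
	if (!t->event)
		return 0;
	sk_unschedule(t->event);
	t->event = NULL;
	return 1;
}

/* (Re)start the timer; returns 1 if it was already active. */
int sk_timer_start(struct sk_timer *t)
{
	int was_active = sk_timer_cancel(t);

	t->event = sk_schedule(t);
	return was_active;
}

/* Trampoline: deliver the pending event, re-arming on SK_RESTART. */
int sk_run_one(void)
{
	struct sk_timer *t = pending;

	if (!t)
		return 0;
	pending = NULL;
	t->event = NULL;
	t->fired++;
	if (t->function(t) == SK_RESTART)
		t->event = sk_schedule(t);
	return 1;
}

/* Example callback: ask for a restart until it has fired three times. */
enum sk_restart sk_three_shots(struct sk_timer *t)
{
	return t->fired < 3 ? SK_RESTART : SK_NORESTART;
}
```

Because the environment is assumed to be uniprocessor and non-preemptive, cancel never has to wait for a running handler, which is why no "-1, retry" case appears in this sketch.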

[PATCH v6 00/10] an introduction of Linux library operating system (LibOS)

2015-09-03 Thread Hajime Tazaki
stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
-
[v5] : https://lkml.org/lkml/2015/5/13/25
[v4] : https://lkml.org/lkml/2015/4/26/279
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction of Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personality and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design only focuses on the network stack. You can benefit from
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the present
moment), and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes an entry point as an API, which is lib_init(), in
order to connect userspace applications to the (userspace-version)
kernel network stack. The clock source, virtual struct net_device, and
scheduler are provided by the caller, while kernel resources such as
system calls are provided by the callee.

Once LibOS is initialized via the API, userspace applications that use
POSIX sockets can use the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific
ones. The application can then benefit from the network stack of LibOS without
involving the host network stack.
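
To make the caller/callee split concrete, here is a minimal sketch of an init call that exchanges two tables of function pointers: the caller passes in what it provides (only a clock in this sketch), and the library fills in what it provides (socket-like calls). This is not the real lib_init() interface; every name below (host_ops, stack_ops, sketch_init, and so on) is hypothetical, and the "sockets" are just counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Services the caller (host application or simulator) supplies. */
struct host_ops {
	uint64_t (*current_ns)(void);	/* clock source */
};

/* Services the library hands back to the caller. */
struct stack_ops {
	int (*sock_socket)(int domain, int type, int protocol);
	int (*sock_close)(int fd);
};

static const struct host_ops *host;
static int next_fd = 3;

/* Placeholder "socket" call: hands out increasing descriptor numbers. */
static int sk_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return next_fd++;
}

/* Placeholder "close" call: accepts only descriptors handed out above. */
static int sk_close(int fd)
{
	return (fd >= 3 && fd < next_fd) ? 0 : -1;
}

/* Hypothetical entry point: remember the host's ops, export ours. */
void sketch_init(struct stack_ops *exported, const struct host_ops *imported)
{
	assert(exported && imported && imported->current_ns);
	host = imported;
	exported->sock_socket = sk_socket;
	exported->sock_close = sk_close;
}

/* Inside the library, time always comes from the host-provided clock. */
uint64_t sketch_now_ns(void)
{
	return host->current_ns();
}

/* Example host clock: a simulated clock advancing 1000 ns per read. */
uint64_t fake_clock_ns(void)
{
	static uint64_t now;

	now += 1000;
	return now;
}
```

A scheduler hook or a virtual net_device would be passed in the same way, as additional members of the caller-provided table. An application would then call through the exported table instead of the host's socket symbols.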

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). This code is maintained in an external repository (*1).


== How to use it ? ==

First, configure the kernel:
% make {defconfig,menuconfig} ARCH=lib

Then build the library:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (user-mode linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, which also eases debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v6


For further information, here is a slideset presented at the last
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming this feature.

*1 https://github.com/libos-nuse/linux-libos-tools


Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxiliary files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 235 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/capability.c |  25 ++
 arch/lib/defconfig| 655 ++
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/generate-linker-script.py|  50 +++
 arch/lib/glue.c   | 284 +++
 arch/lib/hrtimer.c| 117 ++
 arch/lib/include/asm/Kbuild   |  57 +++
 arch/lib/include/asm/atomic.h |  62 
 arch/lib/include/asm/barrier.h|   8 +
 arch/lib/include/asm/bitsperlong.h|  16 +
 arch/lib/include/asm/current.h|   7 +
 arch/lib/include/asm/elf.h|  10 +
 arch/lib/include/asm/hardirq.h|   8 +
 arch/lib/include/asm/page.h   |  14 +
 arch/lib/include/asm/pgtable.h|  30 ++
 arch/lib/include/asm/processor.h  |  19 +
 arch/lib/include/asm/ptrace.h |   4 +
 arch/lib/include/asm/segment.h|   6 +
 arch/lib/include/asm/sembuf.h |   4 +
 arch/lib/include/asm/shmbuf.h |   4 +
 arch/lib/include/asm/shmparam.h   |   4 +
 arch/lib/include/asm/sigcontext.h |   6 +
 arch/lib/include/asm/stat.h   |   4 +
 arch/lib/include/asm/statfs.h |   4 +
 arch/lib/include/asm/swab.h   |   7 +
 arch/lib/include/asm/thread_info.h|  36 ++
 arch/lib/include/asm/uaccess.h|  

[PATCH v5 06/10] lib: sysctl handling (kernel glue code)

2015-05-12 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide the sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int nr_pdflush_threads = 0;
+unsigned long scan_unevictable_pages = 0;
+int suid_dumpable = 0;
+int page_cluster 

[PATCH v5 07/10] lib: other kernel glue layer code

2015-05-12 Thread Hajime Tazaki
These files provide the same function calls so that the rest of the
network stack code stays untouched.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Christoph Paasch 
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/glue.c   | 289 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 +
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  34 ++
 arch/lib/random.c |  53 +
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 733 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index 000..3a1f301
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index 000..ce424ff
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ *     Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index 000..33efe5f
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is single-threaded, so we do not need a lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 65535;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index 000..93f72d1
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,289 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include /* loff_t */
+#include /* ESPIPE */
+#include   /* PAGE_CACHE_SIZE

[PATCH v5 10/10] lib: tools used for test scripts

2015-05-12 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. A simple test is run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki 
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${
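
A side note on the IPADDR line in the script above (marked XXX there): it derives a test address by adding 10 to the last octet of the default gateway. The same arithmetic can be sketched in C as below. The function name and the range check are additions of this sketch, not something the script does; the check rejects results that would leave the usable host range of the last octet.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "a.b.c.(d + offset)" into out.
 * Returns 0 on success, -1 on malformed input or out-of-range result.
 */
int sk_derive_addr(const char *gw, unsigned int offset,
		   char *out, size_t outlen)
{
	unsigned int a, b, c, d;
	char extra;
	int n;

	assert(gw && out);
	/* %c catches trailing junk: a clean dotted quad converts exactly 4 items. */
	if (sscanf(gw, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4)
		return -1;
	if (a > 255 || b > 255 || c > 255 || d > 255)
		return -1;
	/* Keep the derived last octet at or below 254. */
	if (offset > 254 || d + offset > 254)
		return -1;
	n = snprintf(out, outlen, "%u.%u.%u.%u", a, b, c, d + offset);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a gateway of 192.168.1.1 and an offset of 10 this yields 192.168.1.11, while a gateway ending in .250 is rejected instead of producing an invalid octet.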

[PATCH v5 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-05-12 Thread Hajime Tazaki
These files work as stubs so that the other kernel parts (e.g., net/)
run transparently in the libos environment.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/include/asm/Kbuild   | 57 +
 arch/lib/include/asm/atomic.h | 59 +++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 ++
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 +++
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 +
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 331 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..444a953
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,59 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)(*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)atomic64_sub(1LL, (v))
+int atomic64_dec_and_test(atomic64_t *v);
+int atomic64
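
For context on the atomic64 helpers above: in an environment that is assumed to be uniprocessor and non-preemptive, the "atomic" operations can be plain reads and writes. The following is a minimal standalone sketch under that assumption. The sk_* names are hypothetical, and the code is not safe to use from multiple threads.

```c
#include <assert.h>
#include <stdint.h>

/* Plain counter; correct only when no other thread can interleave. */
typedef struct {
	int64_t counter;
} sk_atomic64_t;

static inline int64_t sk_atomic64_read(const sk_atomic64_t *v)
{
	return v->counter;
}

static inline void sk_atomic64_add(int64_t i, sk_atomic64_t *v)
{
	v->counter += i;
}

static inline void sk_atomic64_sub(int64_t i, sk_atomic64_t *v)
{
	v->counter -= i;
}

/* Subtract and report whether the result reached zero. */
static inline int sk_atomic64_sub_and_test(int64_t i, sk_atomic64_t *v)
{
	v->counter -= i;
	return v->counter == 0;
}

static inline int sk_atomic64_dec_and_test(sk_atomic64_t *v)
{
	return sk_atomic64_sub_and_test(1, v);
}

/* Typical use: drop a reference; returns 1 when the last one is gone. */
static inline int sk_ref_put(sk_atomic64_t *refs)
{
	assert(sk_atomic64_read(refs) > 0);
	return sk_atomic64_dec_and_test(refs);
}
```

The "_and_test" variants return non-zero exactly when the counter reaches zero, which is what reference-counting callers depend on.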

[PATCH v5 09/10] lib: libos build scripts and documentation

2015-05-12 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1252 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=
+
+* Overview
+
+New hardware independent architecture 'arch/lib', configured by
+CONFIG_LIB gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE will give you a personalized network stack for each application
+  without replacing host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE gives us a network simulation environment with the Linux network stack
+  to investigate the detailed behavior of protocol implementations with a
+  flexible network configuration. It is also useful as a testing environment.
+
+(- more abstracted implementation of underlying platform will be a future
+   direction (e.g., rump hypercall))
+
+In both features, the Linux kernel network stack runs inside a
+userspace application as a linked or dynamically loaded library.
+
+Each has its own network stack, isolated from the host operating system,
+so it is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How is it different from others ?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces bring process-level isolation
+to host multiple network entities, but all processes share the kernel,
+which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it ?
+
+The configuration of arch/lib follows the standard kernel configuration.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to prepare a configuration file named
+'nuse.conf' so that the library version of the network stack knows which
+IP configuration to use. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is that test executions are deterministic. A
+test script always gives the same results in every execution as long as
+the code under test is not modified.
+
+As a first step, you need to obtain the network simulator
+environment. 'make testbin' does all of the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any kind of raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of
+applications (NUSE/DCE). Pure userspace part

[PATCH v5 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-05-12 Thread Hajime Tazaki
Context primitives of the kernel such as soft interrupts, scheduling, and
tasklets are implemented for libos. These functions eventually call the
functions registered by the lib_init() API as well.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_

[PATCH v5 04/10] lib: time handling (kernel glue code)

2015-05-12 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented
for libos. These eventually call the functions registered by the lib_init()
API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/hrtimer.c | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c| 144 +++
 arch/lib/timer.c   | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajim

[PATCH v5 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-05-12 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and co.
This lets libos use the user's own allocator, e.g. malloc(3).
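
The wrapping idea can be sketched roughly as follows. This is only an illustration, not the code in mm/slib.c: the demo_* names are hypothetical, and malloc(3) stands in for the host-provided lib_malloc(). Each allocation carries a hidden size_t header in front of the pointer handed to the caller, which is how a ksize()-like query can work without a slab cache.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kmalloc-like wrapper: allocate the payload plus a hidden
 * size_t header and return a pointer just past the header. */
static void *demo_kmalloc(size_t size, int zero)
{
	unsigned char *base;

	if (size > SIZE_MAX - sizeof(size_t))
		return NULL;			/* header would overflow */
	base = malloc(sizeof(size_t) + size);
	if (!base)
		return NULL;
	if (zero)
		memset(base, 0, sizeof(size_t) + size);
	memcpy(base, &size, sizeof(size_t));	/* record the requested size */
	return base + sizeof(size_t);
}

/* Hypothetical ksize-like query: read the header stored before p. */
static size_t demo_ksize(const void *p)
{
	size_t size;

	memcpy(&size, (const unsigned char *)p - sizeof(size_t),
	       sizeof(size_t));
	return size;
}

/* Hypothetical kfree-like release: step back to the real allocation. */
static void demo_kfree(void *p)
{
	if (p)
		free((unsigned char *)p - sizeof(size_t));
}
```

One consequence of this layout is that the returned pointer is offset by sizeof(size_t) from whatever alignment the underlying allocator guarantees, which callers with stricter alignment needs would have to account for.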

Signed-off-by: Hajime Tazaki 
---
 include/linux/slab.h |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile  |   1 +
 mm/slab.h|   4 +
 mm/slib.c| 205 +++
 5 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ffd24c8..0288cf8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index 000..d9fe7d5
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eae..7d8314f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 4c3ac12..2ea37c9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include 
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#endif
+
 #include 
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index 000..37596862
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,205 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+#include 
+#include 
+#include 
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+   return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
+   kfree(cache);
+}
+int kmem_cache_shrink(struct kmem_cache *cache)
+{
+   return 1;
+}
+const char *kmem_cache_name(struct kme

[PATCH v5 03/10] lib: public headers and API implementations for userspace programs

2015-05-12 Thread Hajime Tazaki
Userspace programs that use libos access it via a public API, lib_init(),
passing the arguments struct SimImported and struct SimExported.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +++
 arch/lib/lib-socket.c | 410 ++
 arch/lib/lib.c| 294 ++
 arch/lib/lib.h|  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) { \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct SimDevicePacket (*dev_create_packet)(struct SimDevice *dev,
+  

[PATCH v5 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-05-12 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through userspace function
calls by enumerating the sysctl tree from sysctl_table_root. This requires
the symbol and its related functions to be publicly accessible.

Signed-off-by: Hajime Tazaki 
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fea2561..7c5924c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;
if (S_ISDIR(link->mode) &am

[PATCH v5 00/10] an introduction of Linux library operating system (LibOS)

2015-05-12 Thread Hajime Tazaki
This is the 5th version of Linux LibOS patchset which reflects a
couple of comments received from people.

changes from v4:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) lib: fix dependency detection of kernel/time/timeconst.h
   (commented by Richard Weinberger)
- Overall
2) rebased to Linux 4.1-rc3 (4cfceaf0c087f47033f5e61a801f4136d6fb68c6)

changes from v3:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) Remove RFC (now it's a proposal)
2) build environment cleanup (commented by Paul Bolle)
- Overall
3) change based tree from arnd/asm-generic to torvalds/linux.git
   (commented by Richard Weinberger)
4) rebased to Linux 4.1-rc1 (b787f68c36d49bb1d9236f403813641efa74a031)
5) change the title of cover letter a bit

changes from v2:
- Patch 02/11 ("slab: add private memory allocator header for arch/lib")
1) add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
- Overall
2) rewrite commit log messages

changes from v1:
- Patch 01/11 ("sysctl: make some functions unstatic to access by arch/lib"):
1) add prefix ctl_table_ to newly public functions (commented by Joe Perches)
- Patch 08/11 ("lib: other kernel glue layer code"):
2) significantly reduce glue codes (stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
----------------
[v4] : https://lkml.org/lkml/2015/4/26/279
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction of Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personalization and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design only focuses on the network stack. You can benefit from
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the present
moment), and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes a single API entry point, lib_init(), in
order to connect userspace applications to the (userspace-version)
kernel network stack. The clock source, virtual struct net_device, and
scheduler are provided by the caller, while kernel resources such as system
calls are provided by the callee.
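
The caller/callee split can be sketched roughly as follows. This is a simplified illustration, not the real interface: struct demo_imported, struct demo_exported and demo_lib_init() are hypothetical stand-ins for struct SimImported, struct SimExported and lib_init(), reduced to one clock callback and one exported service.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what the host application (the caller) provides. */
struct demo_imported {
	uint64_t (*current_ns)(void);	/* clock source */
};

/* Hypothetical: what the library (the callee) hands back. */
struct demo_exported {
	uint64_t (*uptime_ns)(void);	/* a service built on the host clock */
};

static const struct demo_imported *demo_host;
static uint64_t demo_boot_ns;

static uint64_t demo_uptime_ns(void)
{
	return demo_host->current_ns() - demo_boot_ns;
}

/* Hypothetical analogue of lib_init(): remember the host callbacks and
 * fill in the table of functions the application may call. */
static int demo_lib_init(struct demo_exported *exported,
			 const struct demo_imported *imported)
{
	if (!exported || !imported || !imported->current_ns)
		return -1;
	demo_host = imported;
	demo_boot_ns = imported->current_ns();
	exported->uptime_ns = demo_uptime_ns;
	return 0;
}

/* A fake host clock so the sketch can be exercised deterministically. */
static uint64_t demo_fake_now;

static uint64_t demo_fake_clock_ns(void)
{
	return demo_fake_now;
}
```

Because every notion of time comes from the caller's callback, a simulator can drive the library with virtual time, which fits the deterministic test runs described for DCE.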

Once LibOS is initialized via the API, userspace applications that use
POSIX sockets can use the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific
ones. The application can then benefit from the network stack of LibOS without
involving the host network stack.
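
The redirection can be pictured with a minimal sketch like the one below. It is an assumption-laden stand-in: the howto describes the nuse wrapper as LD_PRELOAD-based, whereas this sketch swaps a function pointer inside one program to show the same effect on callers. All demo_* names and the returned values are hypothetical.

```c
#include <assert.h>

/* Hypothetical signature shared by both socket implementations. */
typedef int (*demo_socket_fn)(int domain, int type, int protocol);

/* Stand-in for the host kernel's socket(): pretend it returns fd 3. */
static int demo_host_socket(int domain, int type, int protocol)
{
	(void)domain; (void)type; (void)protocol;
	return 3;
}

/* Stand-in for the library stack's socket entry: pretend handle 1000. */
static int demo_libos_socket(int domain, int type, int protocol)
{
	(void)domain; (void)type; (void)protocol;
	return 1000;
}

/* Calls go to the host until the library stack is selected. */
static demo_socket_fn demo_active_socket = demo_host_socket;

static void demo_use_libos(void)
{
	demo_active_socket = demo_libos_socket;
}

/* The application keeps calling one name; only the target changes. */
static int demo_socket(int domain, int type, int protocol)
{
	return demo_active_socket(domain, type, protocol);
}
```

The application code that calls demo_socket() does not change when the target is switched, which is the property the symbol replacement relies on.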

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is maintained in an external repository (*1).


== How to use it ? ==

To build the library, first configure it:
% make {defconfig,menuconfig} ARCH=lib

Then build it:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (user-mode linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, for easy debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v5


For further information, here is a slideset presented at the last
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming
this feature.

*1 https://github.com/libos-nuse/linux-libos-tools


Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxiliary files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/capability.c |  25 ++
 arch/lib/defconfig

[PATCH v5 06/10] lib: sysctl handling (kernel glue code)

2015-05-12 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide the sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/mman.h>
+#include <linux/ratelimit.h>
+#include <linux/proc_fs.h>
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int

[PATCH v5 07/10] lib: other kernel glue layer code

2015-05-12 Thread Hajime Tazaki
These files provide the same function calls so that the rest of the
network stack code remains untouched.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Christoph Paasch christoph.paa...@gmail.com
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/glue.c   | 289 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 +
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  34 ++
 arch/lib/random.c |  53 +
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 733 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index 000..3a1f301
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/capability.h>
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index 000..ce424ff
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include <linux/fs.h>
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index 000..33efe5f
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ * Frederic Urbani
+ */
+
+#include <fs/mount.h>
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is monothreaded , so we do not care of lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 65535;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index 000..93f72d1
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,289 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz

[PATCH v5 10/10] lib: tools used for test scripts

2015-05-12 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. A simple test is run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE=-v
+VALGRIND=
+FAULT_INJECTION=
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND=-g
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION=-f
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${IPADDR}
+   netmask 255.255.255.0
+   macaddr 00:01:01:01:01:02
+   viftype RAW
+
+route
+   network 0.0.0.0
+   netmask 0.0.0.0
+   gateway ${GW}
+
+ENDCONF
+
+cd ${LIBOS_TOOLS}
+sudo NUSECONF

[PATCH v5 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-05-12 Thread Hajime Tazaki
These files work as stubs so that other parts of the kernel
(e.g., net/) run transparently in the libos environment.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/include/asm/Kbuild   | 57 +
 arch/lib/include/asm/atomic.h | 59 +++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 ++
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 +++
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 +
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 331 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..444a953
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,59 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include <linux/types.h>
+#include <asm-generic/cmpxchg.h>
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)   (*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)   atomic64_sub(1LL, (v))
+int

[PATCH v5 09/10] lib: libos build scripts and documentation

2015-05-12 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1252 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', enabled by
+CONFIG_LIB, provides two features.
+
+- network stack in userspace (NUSE)
+  NUSE gives each application a personalized network stack without
+  replacing the host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE provides a network simulation environment with the Linux network stack
+  to investigate the detailed behavior of protocol implementations under a
+  flexible network configuration. It is also useful as a testing environment.
+
+(- a more abstract implementation of the underlying platform is a future
+   direction (e.g., rump hypercall))
+
+With both features, the Linux kernel network stack runs in userspace, as a
+library linked into or dynamically loaded by the application.
+
+Each application has its own network stack, isolated from the host operating
+system, so it is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How does it differ from other approaches?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces bring process-level isolation
+to host multiple network entities, but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it?
+
+Configuration of arch/lib follows the standard kernel configuration.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+Then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to write a configuration file, named
+'nuse.conf', so that the library version of the network stack knows what
+kind of IP configuration to use. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is that test executions are deterministic. A
+test script always gives the same results in every execution if the
+code under test has not been modified.
+
+For the first step, you need to obtain network simulator
+environment. 'make testbin' does all the stuff for the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to the application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any kind of raw-socket-like
+technology, such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of
+applications

[PATCH v5 00/10] an introduction of Linux library operating system (LibOS)

2015-05-12 Thread Hajime Tazaki
This is the 5th version of the Linux LibOS patchset, which reflects a
couple of review comments received so far.

changes from v4:
- Patch 09/10 (lib: libos build scripts and documentation)
1) lib: fix dependency detection of kernel/time/timeconst.h
   (commented by Richard Weinberger)
- Overall
2) rebased to Linux 4.1-rc3 (4cfceaf0c087f47033f5e61a801f4136d6fb68c6)

changes from v3:
- Patch 09/10 (lib: libos build scripts and documentation)
1) Remove RFC (now it's a proposal)
2) build environment cleanup (commented by Paul Bolle)
- Overall
3) change based tree from arnd/asm-generic to torvalds/linux.git
   (commented by Richard Weinberger)
4) rebased to Linux 4.1-rc1 (b787f68c36d49bb1d9236f403813641efa74a031)
5) change the title of cover letter a bit

changes from v2:
- Patch 02/11 (slab: add private memory allocator header for arch/lib)
1) add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
- Overall
2) rewrite commit log messages

changes from v1:
- Patch 01/11 (sysctl: make some functions unstatic to access by arch/lib):
1) add prefix ctl_table_ to newly public functions (commented by Joe Perches)
- Patch 08/11 (lib: other kernel glue layer code):
2) significantly reduce glue code (stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
----------------
[v4] : https://lkml.org/lkml/2015/4/26/279
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction of Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personalization and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design focuses only on the network stack. You can benefit from
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the present
moment), and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes an entry point as its API, lib_init(), in
order to connect userspace applications to the (userspace version of the)
kernel network stack. The clock source, virtual struct net_device, and
scheduler are provided by the caller, while kernel resources such as system
calls are provided by the callee.
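
The caller/callee split described above can be sketched roughly as
follows. This is only an illustration, not the actual lib_init()
interface: the demo_* names, the two structs, and the argument order are
all hypothetical stand-ins for struct SimImported and struct SimExported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct SimImported: services the caller provides. */
struct demo_imported {
	uint64_t (*current_ns)(void);	/* clock source owned by the caller */
};

/* Hypothetical stand-in for struct SimExported: services the library provides. */
struct demo_exported {
	int (*sock_socket)(int domain, int type, int protocol);
};

static const struct demo_imported *g_imported;
static int g_next_handle = 1;

/* Library-side implementation, reachable only through the exported table. */
static int demo_sock_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	/* The library reads time only through the caller-provided clock. */
	(void)g_imported->current_ns();
	return g_next_handle++;
}

/* Hypothetical entry point playing the role of lib_init(). */
void demo_lib_init(struct demo_exported *exported,
		   const struct demo_imported *imported)
{
	assert(exported != NULL && imported != NULL);
	g_imported = imported;			/* keep the caller's services */
	exported->sock_socket = demo_sock_socket;	/* hand back ours */
}

/* Caller-side clock: a fixed value here, a simulator clock in practice. */
uint64_t demo_fixed_clock(void)
{
	return 1000;
}
```

A caller would fill a demo_imported table, pass both tables to
demo_lib_init(), and afterwards reach the library only through the
function pointers stored in the demo_exported table.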

Once LibOS is initialized via the API, userspace applications that use
POSIX sockets can use the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific
ones. The application then benefits from the network stack of LibOS without
involving the host network stack.
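
As a rough sketch of the symbol replacement idea, the wrappers below
forward socket calls to a table installed by the library instead of
entering the host kernel. All demo_* names are hypothetical, and the
"stack" is a toy that only hands out handles; the real redirection (for
instance in NUSE) replaces the actual socket-related symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of socket calls provided by the library. */
struct demo_sock_ops {
	int (*socket)(int domain, int type, int protocol);
	int (*close)(int fd);
};

static const struct demo_sock_ops *demo_ops;	/* NULL until installed */

/* Called once the library has been initialized. */
void demo_install_ops(const struct demo_sock_ops *ops)
{
	assert(ops != NULL);
	demo_ops = ops;
}

/* Wrapper that the application's socket() would resolve to. */
int demo_socket(int domain, int type, int protocol)
{
	if (demo_ops == NULL || demo_ops->socket == NULL) {
		errno = ENOSYS;		/* library not initialized yet */
		return -1;
	}
	return demo_ops->socket(domain, type, protocol);
}

/* Wrapper that the application's close() would resolve to. */
int demo_close(int fd)
{
	if (demo_ops == NULL || demo_ops->close == NULL) {
		errno = ENOSYS;
		return -1;
	}
	return demo_ops->close(fd);
}

/* Toy library-side implementation: handles start at 100. */
static int demo_open_count;

int demo_stack_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return 100 + demo_open_count++;
}

int demo_stack_close(int fd)
{
	if (fd < 100) {
		errno = EBADF;	/* not a handle issued by this stack */
		return -1;
	}
	return 0;
}
```

Keeping the dispatch in one table means the application code stays
unchanged; only the symbols it resolves against differ.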

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is managed in an external repository (*1).


== How to use it ? ==

To configure the library:
% make {defconfig,menuconfig} ARCH=lib

Then build it:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (user-mode linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, for easy debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v5


For further information, here is a slide set presented at the last
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming
this feature.

*1 https://github.com/libos-nuse/linux-libos-tools


Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxiliary files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/capability.c |  25 ++
 arch/lib/defconfig| 653 ++
 arch/lib/filemap.c

[PATCH v5 03/10] lib: public headers and API implementations for userspace programs

2015-05-12 Thread Hajime Tazaki
Userspace programs that use libos access it via a public API, lib_init(),
which takes struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +++
 arch/lib/lib-socket.c | 410 ++
 arch/lib/lib.c| 294 ++
 arch/lib/lib.h|  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {\
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include <linux/socket.h>
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct

[PATCH v5 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-05-12 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through a function call
from userspace that enumerates the sysctl tree starting at sysctl_table_root.
This requires the symbol and related functions to be publicly accessible.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fea2561..7c5924c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node 
*node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table 
**pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table 
**pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct 
dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir 
*dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir 
*dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header 
**phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false

[PATCH v5 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-05-12 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and co.
This lets libos use the user's own allocator, e.g. malloc(3).

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 include/linux/slab.h |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile  |   1 +
 mm/slab.h|   4 +
 mm/slib.c| 205 +++
 5 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ffd24c8..0288cf8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int 
order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t 
flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index 000..d9fe7d5
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eae..7d8314f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 4c3ac12..2ea37c9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include <linux/slub_def.h>
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#endif
+
 #include <linux/memcontrol.h>
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index 000..37596862
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,205 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include <linux/page-flags.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/slib_def.h>
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+   return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
+   kfree

[PATCH v5 04/10] lib: time handling (kernel glue code)

2015-05-12 Thread Hajime Tazaki
timer related (internal) functions such as add_timer(),
do_gettimeofday() of kernel are trivially reimplemented
for libos. these eventually call the functions registered by lib_init()
API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/hrtimer.c | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c| 144 +++
 arch/lib/timer.c   | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/hrtimer.h>
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   slack range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 1;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 0;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage

[PATCH v5 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-05-12 Thread Hajime Tazaki
context primitives of kernel such as soft interrupts, scheduling,
tasklet are implemented for libos. these functions eventually call the
functions registered by lib_init() API as well.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+#include <linux/hash.h>
+#include <net/net_namespace.h>
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred);
+   lib_free((void *)task->kernel_task.cred->user);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_tail(&wait

Re: [PATCH v4 00/10] an introduction of Linux library operating system (LibOS)

2015-04-29 Thread Hajime Tazaki

At Mon, 27 Apr 2015 09:39:20 +0200,
Richard Weinberger wrote:

> > Hmm, it still does not build. This time I got:
> > 
> >   CC  kernel/time/time.o
> > In file included from kernel/time/time.c:44:0:
> > kernel/time/timeconst.h:11:2: error: #error "kernel/timeconst.h has the 
> > wrong HZ value!"
> >  #error "kernel/timeconst.h has the wrong HZ value!"
> >   ^
> > arch/lib/Makefile:187: recipe for target 'kernel/time/time.o' failed
> > make: *** [kernel/time/time.o] Error 1
> 
> A make mrproper made the issue go away.
> Please use kbuild. :)

thanks for the report.

it's been fixed and I added a test (locally) to avoid
further regressions.

-- Hajime
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 03/10] lib: public headers and API implementations for userspace programs

2015-04-26 Thread Hajime Tazaki
Userspace programs that use libos access it through a public API, lib_init(),
passing struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +++
 arch/lib/lib-socket.c | 410 ++
 arch/lib/lib.c| 294 ++
 arch/lib/lib.h|  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {                                 \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct SimDevicePacket (*dev_create_packet)(struct SimDevice *dev,
+  

[PATCH v4 10/10] lib: tools used for test scripts

2015-04-26 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging of net/ code
with libos. a simple test is implemented with make test ARCH=lib.

Signed-off-by: Hajime Tazaki 
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${

[PATCH v4 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-04-26 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through a userspace
function call that enumerates the sysctl tree from sysctl_table_root. This
requires that symbol and its related functions to be publicly accessible.

Signed-off-by: Hajime Tazaki 
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fea2561..7c5924c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char 
*name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct 
ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node 
*node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table 
**pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table 
**pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct 
dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir 
*dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir 
*dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header 
**phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;
if (S_ISDIR(link->mode) &am

[PATCH v4 07/10] lib: other kernel glue layer code

2015-04-26 Thread Hajime Tazaki
These files provide the same function calls so that the rest of the
network stack code stays untouched.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Christoph Paasch 
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/glue.c   | 289 ++
 arch/lib/modules.c|  36 +++
 arch/lib/pid.c|  29 +
 arch/lib/print.c  |  56 ++
 arch/lib/proc.c   |  34 ++
 arch/lib/random.c |  53 +
 arch/lib/sysfs.c  |  83 +++
 arch/lib/vmscan.c |  26 +
 11 files changed, 733 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index 000..3a1f301
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index 000..ce424ff
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ *     Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index 000..324e10b
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is single-threaded, so no locking is needed here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 1024;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index 000..93f72d1
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,289 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include /* loff_t */
+#include /* ESPIPE */
+#include   /* PAGE_CACHE_SIZE

[PATCH v4 06/10] lib: sysctl handling (kernel glue code)

2015-04-26 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide a sysctl-like interface registered
via the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int nr_pdflush_threads = 0;
+unsigned long scan_unevictable_pages = 0;
+int suid_dumpable = 0;
+int page_cluster 

[PATCH v4 04/10] lib: time handling (kernel glue code)

2015-04-26 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. They eventually
call the functions registered through the lib_init() API.
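
To make the flow above concrete, here is a minimal sketch of a timer that
delegates to host-provided callbacks. It is not the code from this patch:
every demo_* name is hypothetical, the host event queue is a one-slot fake,
and the real lib_event_schedule_ns()/lib_current_ns() are only mimicked. The
sketch shows two things: an absolute expiry is turned into a relative delay
before it is handed to the host, and starting a timer reports whether an
earlier instance was still pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*event_fn)(void *ctx);

/* Hypothetical one-slot host event queue standing in for the simulator. */
struct demo_event {
	int64_t delay_ns;
	event_fn fn;
	void *ctx;
	int pending;
};

static struct demo_event demo_slot;
static int64_t demo_now_ns = 1000;	/* fake host clock */

/* Host-side hooks (assumed to be registered at init time). */
int64_t demo_current_ns(void)
{
	return demo_now_ns;
}

void *demo_event_schedule_ns(int64_t delay_ns, event_fn fn, void *ctx)
{
	demo_slot = (struct demo_event){ delay_ns, fn, ctx, 1 };
	return &demo_slot;
}

void demo_event_cancel(void *event)
{
	((struct demo_event *)event)->pending = 0;
}

enum demo_mode { DEMO_MODE_ABS, DEMO_MODE_REL };

struct demo_timer {
	void *event;				/* NULL when inactive */
	void (*function)(struct demo_timer *);	/* optional user callback */
	int fired;
};

/* Runs in host context when the event expires. */
static void demo_trampoline(void *ctx)
{
	struct demo_timer *t = ctx;

	t->event = NULL;	/* mark as completed */
	t->fired++;
	if (t->function)
		t->function(t);
}

/* Returns 1 if an active timer was cancelled, 0 if it was not active. */
int demo_timer_cancel(struct demo_timer *t)
{
	if (!t->event)
		return 0;
	demo_event_cancel(t->event);
	t->event = NULL;
	return 1;
}

/* (Re)starts the timer; returns 1 if it was already active. */
int demo_timer_start(struct demo_timer *t, int64_t expiry_ns,
		     enum demo_mode mode)
{
	int was_active = demo_timer_cancel(t);

	if (mode == DEMO_MODE_ABS)
		expiry_ns -= demo_current_ns();	/* absolute -> relative */
	t->event = demo_event_schedule_ns(expiry_ns, demo_trampoline, t);
	return was_active;
}

/* Delivers the pending event, as the host scheduler would. */
void demo_host_run(void)
{
	if (demo_slot.pending) {
		demo_slot.pending = 0;
		demo_now_ns += demo_slot.delay_ns;
		demo_slot.fn(demo_slot.ctx);
	}
}
```

A caller would start a timer with demo_timer_start(), let the host advance
with demo_host_run(), and then see the fired counter incremented. Keeping the
timer state down to one opaque event handle is a simplification made for this
sketch.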

Signed-off-by: Hajime Tazaki 
---
 arch/lib/hrtimer.c | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c| 144 +++
 arch/lib/timer.c   | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active: nothing to cancel */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajim

[PATCH v4 09/10] lib: libos build scripts and documentation

2015-04-26 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1252 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', enabled by
+CONFIG_LIB, provides two features.
+
+- network stack in userspace (NUSE)
+  NUSE gives each application a personalized network stack
+  without replacing the host operating system.
+
+- network simulator integration, called Direct Code Execution (DCE)
+  DCE provides a network simulation environment that runs the Linux network
+  stack, to investigate the detailed behavior of protocol implementations
+  under a flexible network configuration. It is also useful for testing.
+
+(- a more abstract implementation of the underlying platform is a possible
+   future direction (e.g., rump hypercall))
+
+In both cases, the Linux kernel network stack runs on top of a
+userspace application as a linked or dynamically loaded library.
+
+Each application has its own network stack, isolated from the host operating
+system, so it is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How does it differ from other approaches?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces provide process-level isolation
+to host multiple network entities, but the kernel is shared among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it?
+
+Configuring arch/lib follows the standard kernel configuration procedure.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to create a configuration file named
+'nuse.conf' so that the library version of the network stack knows which
+IP configuration to use. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purposes.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is that test executions are deterministic. A
+test script always gives the same results in every execution as long as
+the code under test is not modified.
+
+As a first step, you need to obtain the network simulator
+environment. 'make testbin' takes care of all the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that specific application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of
+applications (NUSE/DCE). Pure userspace part

[PATCH v4 08/10] lib: auxially files for auto-generated asm-generic files of libos

2015-04-26 Thread Hajime Tazaki
These files work as stubs so that the other parts of the kernel (e.g.,
net/) run transparently in the libos environment.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/include/asm/Kbuild   | 57 +++
 arch/lib/include/asm/atomic.h | 50 ++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 ++
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 ++
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 322 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..41a49285
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,50 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)(*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)atomic64_sub(1LL, (v))
+int atomic64_dec_and_test(atomic64_t *v);
+int atomic64_inc_and_te

[PATCH v4 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-04-26 Thread Hajime Tazaki
Kernel context primitives such as soft interrupts, scheduling, and
tasklets are implemented for libos. These functions also eventually call
the functions registered through the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_

[PATCH v4 00/10] an introduction of Linux library operating system (LibOS)

2015-04-26 Thread Hajime Tazaki
This is the fourth version of the Linux LibOS patchset, which addresses
several review comments.

changes from v3:
- Patch 09/10 ("lib: libos build scripts and documentation")
1) Remove RFC (now it's a proposal)
2) build environment cleanup (commented by Paul Bolle)
- Overall
3) change based tree from arnd/asm-generic to torvalds/linux.git
   (commented by Richard Weinberger)
4) rebased to Linux 4.1-rc1 (b787f68c36d49bb1d9236f403813641efa74a031)
5) change the title of cover letter a bit

changes from v2:
- Patch 02/11 ("slab: add private memory allocator header for arch/lib")
1) add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
- Overall
2) rewrite commit log messages

changes from v1:
- Patch 01/11 ("sysctl: make some functions unstatic to access by arch/lib"):
1) add prefix ctl_table_ to newly public functions (commented by Joe Perches)
- Patch 08/11 ("lib: other kernel glue layer code"):
2) significantly reduce glue codes (stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
----------------
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction to the Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personalization and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design focuses only on the network stack. You can use
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, currently out-of-tree),
and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes a single API entry point, lib_init(), which
connects userspace applications to the (userspace version of the)
kernel network stack. The clock source, virtual struct net_device, and
scheduler are provided by the caller, while kernel resources such as
system calls are provided by the callee.

Once LibOS is initialized via this API, userspace applications that use
POSIX sockets can use the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific
ones. The application can then use the LibOS network stack without
involving the host network stack.
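
As an illustration of the division of labour described above, here is a
minimal sketch in C. It is not the actual lib_init() interface: struct
host_ops, struct lib_ops, demo_lib_init() and the fake_* helpers are
hypothetical names invented for this example. The caller passes in the
services it provides (here only a clock and a transmit hook), and the library
fills in a table of the services it exports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Services the caller (host side) provides. Hypothetical layout. */
struct host_ops {
	uint64_t (*current_ns)(void);			/* clock source */
	void (*dev_xmit)(const void *buf, size_t len);	/* virtual device TX */
};

/* Services the library (callee) exports back. Hypothetical layout. */
struct lib_ops {
	int (*sock_socket)(int domain, int type, int protocol);
};

static const struct host_ops *g_host;	/* remembered at init time */
static int next_fd = 3;

/* Stand-in for a socket-creating system call inside the library. */
static int demo_sock_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return g_host ? next_fd++ : -1;
}

/*
 * Single entry point: store the imported host services and fill in the
 * exported table. Returns 0 on success, -1 on invalid arguments.
 */
int demo_lib_init(struct lib_ops *exported, const struct host_ops *imported)
{
	if (!exported || !imported || !imported->current_ns)
		return -1;
	g_host = imported;
	exported->sock_socket = demo_sock_socket;
	return 0;
}

/* Trivial host-side implementations used to exercise the sketch. */
uint64_t fake_clock_ns(void)
{
	return 42;
}

void fake_xmit(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
}
```

A host program would fill a struct host_ops with its own clock and device
hooks, call demo_lib_init() once, and afterwards issue socket calls only
through the returned table. Exchanging two function tables keeps the library
free of direct references to host symbols, which is the property the
description above relies on.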

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is maintained in an external repository (*1).


== How to use it ? ==

To configure the library:
% make {defconfig,menuconfig} ARCH=lib

Then build it:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (User-mode Linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, which makes debugging easier.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v4

(based on the commit b787f68c36d49bb1d9236f403813641efa74a031 of 
torvalds/linux.git)


For further information, see the slides presented at the
netdev 0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback regarding upstreaming
this feature.

*1 https://github.com/libos-nuse/linux-libos-tools

Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxially files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/capability.c |  25 ++
 arch/lib/defconfig| 653 ++
 arch/lib/filemap.c|  32 ++
 arch/lib/fs.c |  70 
 arch/lib/generate-linker-script.py|  50 +++
 arch/lib/glue.c   |

[PATCH v4 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-04-26 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and friends.
This lets libos use the user's own allocator, e.g. malloc(3).
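
For readers unfamiliar with the idea, the following is a minimal sketch of
wrapping a kmalloc-style interface around malloc(3). It is not taken from
mm/slib.c: demo_kmalloc(), demo_ksize(), demo_kfree(), DEMO_GFP_ZERO and the
size header are assumptions made for this example. The header in front of
each object is one simple way to answer ksize()-like queries when the
underlying allocator does not report object sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_GFP_ZERO 0x1u	/* hypothetical stand-in for __GFP_ZERO */

/* Per-object header; the union keeps the payload suitably aligned. */
union demo_hdr {
	size_t size;
	long double align_ld;
	long long align_ll;
	void *align_ptr;
};

/* Allocate size bytes from the host allocator and record the size. */
void *demo_kmalloc(size_t size, unsigned int flags)
{
	union demo_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->size = size;
	if (flags & DEMO_GFP_ZERO)
		memset(h + 1, 0, size);
	return h + 1;	/* payload starts right after the header */
}

/* Report the requested size of an object returned by demo_kmalloc(). */
size_t demo_ksize(const void *p)
{
	return p ? ((const union demo_hdr *)p - 1)->size : 0;
}

/* Release an object; NULL is accepted, as with kfree(). */
void demo_kfree(void *p)
{
	if (p)
		free((union demo_hdr *)p - 1);
}
```

Usage mirrors the kernel calls: allocate with demo_kmalloc(len, flags), query
with demo_ksize(), release with demo_kfree(). The extra header costs a few
bytes per object; a real implementation may choose a different trade-off.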

Signed-off-by: Hajime Tazaki 
---
 include/linux/slab.h     |   6 +-
 include/linux/slib_def.h |  21 +
 mm/Makefile              |   1 +
 mm/slab.h                |   4 +
 mm/slib.c                | 205 +++
 5 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ffd24c8..0288cf8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -191,7 +191,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -356,6 +356,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
    unsigned int order = get_order(size);
@@ -434,6 +437,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
    }
    return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index 000..d9fe7d5
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eae..7d8314f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 4c3ac12..2ea37c9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include <linux/slub_def.h>
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#endif
+
 #include <linux/memcontrol.h>
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index 000..37596862
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,205 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include <linux/page-flags.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/slib_def.h>
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+       return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+       return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+       lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+       kfree(p);
+       return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+       kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+         unsigned long flags, void (*ctor)(void *))
+{
+   struct kmem_cache *cache = kmalloc(sizeof(struct kmem_cache), flags);
+
+   if (!cache)
+       return NULL;
+   cache->name = name;
+   cache->size = size;
+   cache->align = align;
+   cache->flags = flags;
+   cache->ctor = ctor;
+   return cache;
+}
+void kmem_cache_destroy(struct kmem_cache *cache)
+{
+   kfree(cache);
+}
+int kmem_cache_shrink(struct kmem_cache *cache)
+{
+   return 1;
+}
+const char *kmem_cache_name(struct kme
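
The allocator in the patch above keeps each object's size in a hidden size_t header placed directly in front of the pointer it returns: kfree() steps back by sizeof(size_t) before freeing, and ksize() reads the header. The following is a minimal userspace sketch of that layout, assuming plain malloc(3) as the backing allocator. All demo_* names and the DEMO_ZERO flag are hypothetical stand-ins, not part of the patch, and demo_realloc() copies the old contents into the new block, which is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag standing in for __GFP_ZERO. */
#define DEMO_ZERO 0x1u

/* Allocate size bytes plus a hidden size_t header that records size. */
void *demo_alloc(size_t size, unsigned int flags)
{
	unsigned char *raw = malloc(size + sizeof(size_t));

	if (!raw)
		return NULL;
	if (flags & DEMO_ZERO)
		memset(raw, 0, size + sizeof(size_t));
	memcpy(raw, &size, sizeof(size_t));	/* write the header */
	return raw + sizeof(size_t);		/* caller sees the payload only */
}

/* Read the recorded size back from the header in front of the payload. */
size_t demo_size(const void *p)
{
	size_t size;

	memcpy(&size, (const unsigned char *)p - sizeof(size_t), sizeof(size_t));
	return size;
}

/* Free the whole block, header included. NULL is ignored. */
void demo_free(void *p)
{
	if (p)
		free((unsigned char *)p - sizeof(size_t));
}

/* Resize by allocating anew, copying the overlapping bytes, freeing the old block. */
void *demo_realloc(void *p, size_t new_size, unsigned int flags)
{
	void *ret = demo_alloc(new_size, flags);

	if (!ret)
		return NULL;
	if (p) {
		size_t old = demo_size(p);

		memcpy(ret, p, old < new_size ? old : new_size);
		demo_free(p);
	}
	return ret;
}
```

The cost of this scheme is one extra size_t per object. In exchange, the size query needs no lookup structure, which suits a thin wrapper over a host allocator.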

[PATCH v4 00/10] an introduction of Linux library operating system (LibOS)

2015-04-26 Thread Hajime Tazaki
This is the 4th version of the Linux LibOS patchset, which reflects
a couple of comments received from reviewers.

changes from v3:
- Patch 09/10 (lib: libos build scripts and documentation)
1) Remove RFC (now it's a proposal)
2) build environment cleanup (commented by Paul Bolle)
- Overall
3) change based tree from arnd/asm-generic to torvalds/linux.git
   (commented by Richard Weinberger)
4) rebased to Linux 4.1-rc1 (b787f68c36d49bb1d9236f403813641efa74a031)
5) change the title of cover letter a bit

changes from v2:
- Patch 02/11 (slab: add private memory allocator header for arch/lib)
1) add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
- Overall
2) rewrite commit log messages

changes from v1:
- Patch 01/11 (sysctl: make some functions unstatic to access by arch/lib):
1) add prefix ctl_table_ to newly exposed functions (commented by Joe Perches)
- Patch 08/11 (lib: other kernel glue layer code):
2) significantly reduce glue codes (stubs) (commented by Richard Weinberger)
- Others
3) adapt to linux-4.0.0
4) detect make dependency by Kbuild .cmd files

patchset history
----------------
[v3] : https://lkml.org/lkml/2015/4/19/63
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction of Linux library operating system (LibOS).

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personalization and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design only focuses on the network stack. You can use
network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and IPv6),
Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the present
moment), and netlink with various userspace applications (quagga,
iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes a single entry point, lib_init(), as its API in
order to connect userspace applications to the (userspace version of
the) kernel network stack. The clock source, virtual struct net_device,
and scheduler are provided by the caller, while kernel resources such
as system calls are provided by the callee.
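
The division of labour described above (the caller supplies clock and device, the callee hands back its services) can be sketched as an exchange of two function-pointer tables at initialization time. This is only an illustration of the pattern: every name below (demo_init, struct demo_imported, struct demo_exported, the demo_host_* helpers) is hypothetical, and the real lib_init() signature is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Services the caller (host side) provides. Hypothetical names. */
struct demo_imported {
	uint64_t (*current_ns)(void);			/* clock source */
	void (*dev_xmit)(const void *buf, size_t len);	/* virtual device TX */
};

/* Services the library (callee) hands back. Hypothetical names. */
struct demo_exported {
	int (*sock_send)(const void *buf, size_t len);
	uint64_t (*uptime_ns)(void);
};

static const struct demo_imported *host;	/* set once by demo_init() */
static uint64_t boot_ns;

static int lib_sock_send(const void *buf, size_t len)
{
	if (!host || !buf)
		return -1;
	host->dev_xmit(buf, len);	/* the "stack" pushes data to the host device */
	return (int)len;
}

static uint64_t lib_uptime_ns(void)
{
	return host->current_ns() - boot_ns;
}

/* Entry point: remember the host callbacks, fill in the exported table. */
int demo_init(struct demo_exported *exported,
	      const struct demo_imported *imported)
{
	if (!exported || !imported ||
	    !imported->current_ns || !imported->dev_xmit)
		return -1;
	host = imported;
	boot_ns = imported->current_ns();
	exported->sock_send = lib_sock_send;
	exported->uptime_ns = lib_uptime_ns;
	return 0;
}

/* A toy host side: a settable clock and a byte-counting device. */
uint64_t demo_host_now_ns = 1000;
size_t demo_host_tx_bytes;

uint64_t demo_host_clock(void)
{
	return demo_host_now_ns;
}

void demo_host_xmit(const void *buf, size_t len)
{
	(void)buf;
	demo_host_tx_bytes += len;
}
```

Because the library only ever reaches the outside world through the imported table, the same library code can sit on a real host or inside a simulator with a virtual clock.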

Once LibOS is initialized via this API, userspace applications that
use POSIX sockets can call the system calls defined in LibOS by
replacing the original socket-related symbols with the LibOS-specific
ones. The application then uses the network stack of LibOS without
involving the host network stack.

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is maintained in an external repository (*1).


== How to use it ? ==

to build the library,
% make {defconfig,menuconfig} ARCH=lib

then, build it.
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (user-mode linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process with dlmopen(3)-like linking for easy debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-linus-upstream-libos-v4

(based on commit b787f68c36d49bb1d9236f403813641efa74a031 of torvalds/linux.git)


For further information, here is a slideset presented at the last
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming this feature.

*1 https://github.com/libos-nuse/linux-libos-tools

Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for  arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxially files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS                           |   9 +
 arch/lib/.gitignore                   |   3 +
 arch/lib/Kconfig                      | 124 +++
 arch/lib/Makefile                     | 224 
 arch/lib/Makefile.print               |  45 +++
 arch/lib/capability.c                 |  25 ++
 arch/lib/defconfig                    | 653 ++
 arch/lib/filemap.c                    |  32 ++
 arch/lib/fs.c                         |  70 
 arch/lib/generate-linker-script.py    |  50 +++
 arch/lib/glue.c                       | 289 +++
 arch/lib/hrtimer.c

[PATCH v4 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-04-26 Thread Hajime Tazaki
Context primitives of the kernel, such as soft interrupts, scheduling,
and tasklets, are implemented for libos. These functions eventually call
the functions registered through the lib_init() API as well.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sched.c     | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+#include <linux/hash.h>
+#include <net/net_namespace.h>
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+       return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+       return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+      for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+       return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+       return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+       return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+       return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred);
+   lib_free((void *)task->kernel_task.cred->user);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+       list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_tail(wait
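
The wait-queue helpers above insert non-exclusive waiters at the head of the list and exclusive waiters at the tail. A small userspace sketch shows why that ordering matters, assuming a wake-up pass that walks the list from the head and stops after the first exclusive waiter. That wake-up policy is an assumption of this sketch and is not part of the code shown in this message. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EXCLUSIVE 0x01u	/* stands in for WQ_FLAG_EXCLUSIVE */

struct demo_node {
	struct demo_node *prev, *next;
};

struct demo_waiter {
	unsigned int flags;
	int id;
	struct demo_node link;
};

struct demo_head {
	struct demo_node list;	/* circular list; empty when it points to itself */
};

void demo_head_init(struct demo_head *q)
{
	q->list.prev = q->list.next = &q->list;
}

static void demo_insert(struct demo_node *n, struct demo_node *prev,
			struct demo_node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Non-exclusive waiters go to the head of the queue. */
void demo_add_wait(struct demo_head *q, struct demo_waiter *w)
{
	w->flags &= ~DEMO_EXCLUSIVE;
	demo_insert(&w->link, &q->list, q->list.next);
}

/* Exclusive waiters go to the tail of the queue. */
void demo_add_wait_exclusive(struct demo_head *q, struct demo_waiter *w)
{
	w->flags |= DEMO_EXCLUSIVE;
	demo_insert(&w->link, q->list.prev, &q->list);
}

/*
 * Walk from the head, recording the id of each woken waiter, and stop
 * after the first exclusive one. Returns the number of waiters woken.
 */
int demo_wake(struct demo_head *q, int *woken_ids, int max)
{
	struct demo_node *pos;
	int n = 0;

	for (pos = q->list.next; pos != &q->list && n < max; pos = pos->next) {
		struct demo_waiter *w = (struct demo_waiter *)
			((char *)pos - offsetof(struct demo_waiter, link));

		woken_ids[n++] = w->id;
		if (w->flags & DEMO_EXCLUSIVE)
			break;
	}
	return n;
}
```

With head insertion for ordinary waiters and tail insertion for exclusive ones, a single pass wakes every ordinary waiter and exactly one exclusive waiter, regardless of the order in which they were queued.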

[PATCH v4 07/10] lib: other kernel glue layer code

2015-04-26 Thread Hajime Tazaki
These files provide the same function calls so that the rest of the
network stack code stays untouched.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Christoph Paasch christoph.paa...@gmail.com
---
 arch/lib/capability.c |  25 +
 arch/lib/filemap.c    |  32 ++
 arch/lib/fs.c         |  70 
 arch/lib/glue.c       | 289 ++
 arch/lib/modules.c    |  36 +++
 arch/lib/pid.c        |  29 +
 arch/lib/print.c      |  56 ++
 arch/lib/proc.c       |  34 ++
 arch/lib/random.c     |  53 +
 arch/lib/sysfs.c      |  83 +++
 arch/lib/vmscan.c     |  26 +
 11 files changed, 733 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index 000..3a1f301
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,25 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/capability.h>
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index 000..ce424ff
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include <linux/fs.h>
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index 000..324e10b
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ * Frederic Urbani
+ */
+
+#include <fs/mount.h>
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+       return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+       mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+       return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is monothreaded , so we do not care of lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 1024;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new file mode 100644
index 000..93f72d1
--- /dev/null
+++ b/arch/lib/glue.c
@@ -0,0 +1,289 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz

[PATCH v4 06/10] lib: sysctl handling (kernel glue code)

2015-04-26 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide the sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/mman.h>
+#include <linux/ratelimit.h>
+#include <linux/proc_fs.h>
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int

[PATCH v4 04/10] lib: time handling (kernel glue code)

2015-04-26 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. They
eventually call the functions registered through the lib_init() API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/hrtimer.c         | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c            | 144 +++
 arch/lib/timer.c           | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/hrtimer.h>
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+       void *event =
+           lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+                         trampoline, timer);
+       timer->base = event;
+   } else {
+       /* mark as completed. */
+       timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   slack range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+                unsigned long delta_ns,
+                const enum hrtimer_mode mode,
+                int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+       ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+      see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+       /* timer was not active yet */
+       return 1;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 0;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage
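
The hrtimer glue above follows a simple pattern: an absolute expiry is converted into a delay relative to the current time, an event is scheduled for that delay, and a trampoline runs the callback and re-arms the timer when the callback asks for a restart. Here is a minimal sketch of that pattern on a simulated clock. It assumes a one-slot event queue that serves a single timer, and all demo_* names are hypothetical rather than taken from the patch. Unlike the glue code, demo_timer_cancel() returns 1 when the timer was pending and 0 otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_restart { DEMO_NORESTART, DEMO_RESTART };

struct demo_timer {
	enum demo_restart (*function)(struct demo_timer *);
	int64_t period_ns;	/* relative expiry, reused when re-arming */
	int armed;		/* non-zero while an event is pending */
	int64_t fire_at_ns;	/* absolute time of the pending event */
	int fired;		/* number of times the callback ran */
};

int64_t demo_now_ns;		/* simulated clock */

/* One-slot "event queue": record when the timer should fire next. */
static void demo_schedule(struct demo_timer *t, int64_t delay_ns)
{
	t->fire_at_ns = demo_now_ns + delay_ns;
	t->armed = 1;
}

/* Returns 1 if the timer was pending, 0 otherwise. */
int demo_timer_cancel(struct demo_timer *t)
{
	int was_active = t->armed;

	t->armed = 0;
	return was_active;
}

/* (Re)start: absolute expiries become a delay relative to now. */
int demo_timer_start(struct demo_timer *t, int64_t expiry_ns, int absolute)
{
	int was_active = demo_timer_cancel(t);
	int64_t delay = absolute ? expiry_ns - demo_now_ns : expiry_ns;

	t->period_ns = delay;
	demo_schedule(t, delay);
	return was_active;
}

/* Trampoline: run the callback, re-arm if it asks to be restarted. */
static void demo_trampoline(struct demo_timer *t)
{
	t->armed = 0;
	if (t->function(t) == DEMO_RESTART)
		demo_schedule(t, t->period_ns);
}

/* Advance simulated time to end_ns, firing the timer whenever it is due. */
void demo_run_until(struct demo_timer *t, int64_t end_ns)
{
	while (t->armed && t->fire_at_ns <= end_ns) {
		demo_now_ns = t->fire_at_ns;
		demo_trampoline(t);
	}
	demo_now_ns = end_ns;
}

/* Example callback: fire three times, then stop. */
enum demo_restart demo_three_shots(struct demo_timer *t)
{
	return ++t->fired < 3 ? DEMO_RESTART : DEMO_NORESTART;
}
```

Since time only moves when the event loop advances it, a callback can never race with a cancel in this model, which is the same uniprocessor, non-interruptible assumption the comments in the patch rely on.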

[PATCH v4 09/10] lib: libos build scripts and documentation

2015-04-26 Thread Hajime Tazaki
document and build scripts for libos architecture.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   3 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 224 
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 8 files changed, 1252 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', enabled by
+CONFIG_LIB, gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE gives each application a personalized network stack
+  without replacing the host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE provides a network simulation environment with the Linux network stack
+  to investigate the detailed behavior of a protocol implementation under a
+  flexible network configuration. It is also useful as a testing environment.
+
+(- more abstracted implementation of underlying platform will be a future
+   direction (e.g., rump hypercall))
+
+In both features, the Linux kernel network stack runs inside a
+userspace application as a linked or dynamically loaded library.
+
+Each has its own network stack, isolated from the host operating system,
+so it is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How is it different from others?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces bring process-level isolation
+to host multiple network entities, but the kernel is shared among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it?
+
+Configuration of arch/lib follows the standard kernel configuration procedure.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to prepare a configuration file named
+'nuse.conf' so that the library version of the network stack knows what
+IP configuration to use. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is that test executions are deterministic. A
+test script always gives the same results in every execution as long as
+the code under test is not modified.
+
+As a first step, you need to obtain the network simulator
+environment. 'make testbin' does all the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e., arch/lib) only contains a shared part of
+applications

[PATCH v4 03/10] lib: public headers and API implementations for userspace programs

2015-04-26 Thread Hajime Tazaki
Userspace programs that use libos access it via a public API, lib_init(),
which takes struct SimImported and struct SimExported as arguments.
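
The exchange described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the structures are heavily reduced stand-ins, and the names demo_imported, demo_exported, demo_lib_init and demo_run are hypothetical and do not appear in the patch. The host hands the library a table of services it provides, and receives back a table of entry points the library provides.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct SimImported:
 * services the host application offers to the library. */
struct demo_imported {
	void *(*mem_alloc)(size_t size);
	void (*mem_free)(void *ptr);
};

/* Hypothetical, reduced stand-in for struct SimExported:
 * entry points the library hands back to the host. */
struct demo_exported {
	int (*sock_socket)(int domain, int type, int protocol);
	int (*sock_close)(int handle);
};

/* Library-side copy of the host callbacks and a toy socket counter. */
static struct demo_imported g_imported;
static int g_open_sockets;

static int demo_sock_socket(int domain, int type, int protocol)
{
	void *scratch;

	(void)domain; (void)type; (void)protocol;
	/* The library only allocates through what the host imported. */
	scratch = g_imported.mem_alloc(16);
	if (!scratch)
		return -1;
	g_imported.mem_free(scratch);
	return ++g_open_sockets;	/* pretend handle */
}

static int demo_sock_close(int handle)
{
	if (handle <= 0 || g_open_sockets == 0)
		return -1;
	g_open_sockets--;
	return 0;
}

/* Analogous in spirit to lib_init(): remember what the host provides
 * and fill in what the library provides. */
void demo_lib_init(struct demo_exported *exported,
		   const struct demo_imported *imported)
{
	g_imported = *imported;
	exported->sock_socket = demo_sock_socket;
	exported->sock_close = demo_sock_close;
}

/* Host-side usage: returns 0 when open and close both succeed. */
int demo_run(void)
{
	struct demo_imported in = { .mem_alloc = malloc, .mem_free = free };
	struct demo_exported out;
	int handle;

	demo_lib_init(&out, &in);
	handle = out.sock_socket(2, 1, 0);
	return out.sock_close(handle);
}
```

Keeping both directions in plain function-pointer tables means the host never needs to link against library internals, only against the single init entry point.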

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +++
 arch/lib/lib-socket.c | 410 ++
 arch/lib/lib.c| 294 ++
 arch/lib/lib.h|  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {                                         \
+   while (!(v)) {                                          \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",          \
+   __FILE__, __LINE__);                                    \
+   char *p = 0;                                            \
+   *p = 1;                                                 \
+   }                                                       \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include <linux/socket.h>
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct

[PATCH v4 10/10] lib: tools used for test scripts

2015-04-26 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging of net/ code
with libos. A simple test can be run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE=-v
+VALGRIND=
+FAULT_INJECTION=
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND=-g
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION=-f
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${IPADDR}
+   netmask 255.255.255.0
+   macaddr 00:01:01:01:01:02
+   viftype RAW
+
+route
+   network 0.0.0.0
+   netmask 0.0.0.0
+   gateway ${GW}
+
+ENDCONF
+
+cd ${LIBOS_TOOLS}
+sudo NUSECONF

[PATCH v4 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-04-26 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through function calls from
userspace by enumerating the sysctl tree starting at sysctl_table_root. This
requires that symbol and its related functions to be publicly accessible.
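
As a rough illustration of the enumeration pattern, the sketch below walks a mock directory with the same for (first; cursor; next) loop shape that the patch uses for ctl_table_first_entry() and ctl_table_next_entry(). Everything here is an assumption made for illustration: the demo_* types and functions are hypothetical userspace stand-ins, not the kernel's ctl_table structures, and no locking or reference counting is modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a sysctl entry and its directory. */
struct demo_entry {
	const char *procname;
	int value;
};

struct demo_dir {
	const struct demo_entry *entries;
	size_t count;
};

/* Position the cursor on the first entry, or NULL if the dir is empty. */
static void demo_first_entry(const struct demo_dir *dir, size_t *pos,
			     const struct demo_entry **pentry)
{
	*pos = 0;
	*pentry = dir->count ? &dir->entries[0] : NULL;
}

/* Advance the cursor; it becomes NULL past the last entry. */
static void demo_next_entry(const struct demo_dir *dir, size_t *pos,
			    const struct demo_entry **pentry)
{
	(*pos)++;
	*pentry = (*pos < dir->count) ? &dir->entries[*pos] : NULL;
}

/* Enumerate the directory and return the entry with a matching name. */
const struct demo_entry *demo_lookup(const struct demo_dir *dir,
				     const char *name)
{
	const struct demo_entry *entry;
	size_t pos;

	for (demo_first_entry(dir, &pos, &entry); entry;
	     demo_next_entry(dir, &pos, &entry)) {
		if (strcmp(entry->procname, name) == 0)
			return entry;
	}
	return NULL;
}
```

A caller outside the defining file can only write such a loop if the first/next helpers are not static, which is the point of the change.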

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fea2561..7c5924c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false

[PATCH v4 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-04-26 Thread Hajime Tazaki
These files work as stubs in order to transparently run the other
kernel part (e.g., net/) on libos environment.
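
To give a feel for what such a stub can look like, here is a small sketch in the spirit of the atomic.h added by this patch. It assumes the uniprocessor, non-interruptible execution model that the hrtimer glue code in this series states, so plain loads and stores are enough. The demo_* names are hypothetical and the code mirrors the shape of the stubs rather than reproducing them.

```c
/* Hypothetical 64-bit counter with no real atomicity guarantees. */
typedef struct {
	volatile long long counter;
} demo_atomic64_t;

#define DEMO_ATOMIC64_INIT(i) { (i) }

static inline long long demo_atomic64_read(const demo_atomic64_t *v)
{
	return v->counter;
}

static inline void demo_atomic64_add(long long i, demo_atomic64_t *v)
{
	/* Safe only because nothing can run concurrently in this model. */
	v->counter += i;
}

static inline void demo_atomic64_sub(long long i, demo_atomic64_t *v)
{
	v->counter -= i;
}

/* Decrement and report whether the counter reached zero. */
static inline int demo_atomic64_dec_and_test(demo_atomic64_t *v)
{
	v->counter--;
	return v->counter == 0;
}
```

On a real multiprocessor target these bodies would have to be replaced by genuine atomic operations; the stub form only holds under the stated execution model.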

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/include/asm/Kbuild   | 57 +++
 arch/lib/include/asm/atomic.h | 50 ++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 16 ++
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 ++
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 322 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..41a49285
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,50 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include <linux/types.h>
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)        (*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)         atomic64_sub(1LL, (v))
+int atomic64_dec_and_test(atomic64_t *v);
+int

Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Hajime Tazaki

At Fri, 24 Apr 2015 10:59:21 +0200,
Richard Weinberger wrote:
> Am 24.04.2015 um 10:22 schrieb Hajime Tazaki:
> >> You *really* need to shape up wrt the build process.
> > 
> > at the moment, the implementation of libos can't automate to
> > follow such changes in the build process. but good news is
> > it's a trivial task to follow up the latest function.
> > 
> > my observation on this manual follow up since around 3.7
> > kernel (2.5 yrs ago) is that these changes mostly happened
> > during merge-window of each new version, and the fix only
> > takes a couple of hours at maximum.
> > 
> > I think I can survive with these changes but I'd like to ask
> > broader opinions.
> > 
> > 
> > one more question:
> > 
> > I'd really like to have a suggestion on which tree I should
> > base for libos tree.
> > 
> > I'm proposing a patchset to arnd/asm-generic tree (which I
> > believe the base tree for new arch/), while the patchset is
> > tested with davem/net-next tree because right now libos is
> > only for net/.
> > 
> > shall I propose a patchset based on Linus' tree instead ?
> 
> I'd suggest the following:
> Maintain LibOS in your git tree and follow Linus' tree.

I see. will do it from next patch version.

> Make sure that all kernel releases build and work.
> 
> This way you can experiment with automation and other
> stuff. If it works well you can ask for mainline inclusion
> after a few kernel releases.
> 
> Your git history will show how much maintenance burden
> LibOS has and how much with every merge window breaks and
> needs manual fixup.

I believe this experiment is what we have been doing for the
past couple of years (it's been tested with net-next tree,
not Linus tree though).

and the experiment is working well (as I stated in the
previous email) and that's why I'm here to propose this
patchset.

you can also see the git history: it also includes libos
specific commits of course.

https://github.com/libos-nuse/net-next-nuse/commits/nuse?author=thehajime



-- Hajime
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Hajime Tazaki

Hi Richard,

At Fri, 24 Apr 2015 09:40:32 +0200,
Richard Weinberger wrote:
> 
> Hi!
> 
> Am 19.04.2015 um 15:28 schrieb Hajime Tazaki:
> > changes from v2:
> > - Patch 02/11 ("slab: add private memory allocator header for arch/lib")
> > * add new allocator named SLIB (Library Allocator): Patch 04/11 is 
> > integrated
> >   to 02 (commented by Christoph Lameter)
> > - Overall
> > * rewrite commit log messages
> > 
> > changes from v1:
> > - Patch 01/11 ("sysctl: make some functions unstatic to access by 
> > arch/lib"):
> > * add prefix ctl_table_ to newly publiced functions (commented by Joe 
> > Perches)
> > - Patch 08/11 ("lib: other kernel glue layer code"):
> > * significantly reduce glue codes (stubs) (commented by Richard Weinberger)
> > - Others
> > * adapt to linux-4.0.0
> > * detect make dependency by Kbuild .cmd files
> 
> I still fail to build it. :-(
> 
> for-asm-upstream-v3 on top of Linus' tree gives:
(snip)
> arch/lib/Makefile:210: recipe for target 'arch/lib/capability.o' failed
> make: *** [arch/lib/capability.o] Error 1

I'm also aware of and already fixed this issue for pre-v4
patch of libos.

> And on top of v4.0 it fails too:
(snip)
> In file included from arch/lib/lib-socket.c:12:0:
> ./include/linux/net.h:216:5: note: declared here
>  int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
>  ^
> arch/lib/Makefile:210: recipe for target 'arch/lib/lib-socket.o' failed
> make: *** [arch/lib/lib-socket.o] Error 1

between tag v4.0 and the base of the libos v3 patch, sock_sendmsg()
was updated, and the v3 patch already follows that change. that's
why the patch can't build on top of v4.0.
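
a minimal sketch of how glue code could absorb such a signature change in one place. this is not code from the patchset: the mock_* types, the MOCK_SENDMSG_TAKES_LEN switch and glue_sendmsg() are all hypothetical, and the assumption that the newer form carries the length inside the message rather than as a third argument is mine, made for illustration.

```c
#include <stddef.h>

/* Mock types for illustration only; not the kernel definitions. */
struct mock_msghdr {
	size_t data_left;	/* bytes still to be sent */
};

struct mock_socket {
	size_t bytes_sent;
};

/* 1 models the three-argument form quoted in the build error above,
 * 0 models an assumed newer form without the length argument. */
#ifndef MOCK_SENDMSG_TAKES_LEN
#define MOCK_SENDMSG_TAKES_LEN 0
#endif

#if MOCK_SENDMSG_TAKES_LEN
static int mock_sock_sendmsg(struct mock_socket *sock,
			     struct mock_msghdr *msg, size_t len)
{
	sock->bytes_sent += len;
	msg->data_left -= len;
	return (int)len;
}
#else
static int mock_sock_sendmsg(struct mock_socket *sock,
			     struct mock_msghdr *msg)
{
	size_t len = msg->data_left;

	sock->bytes_sent += len;
	msg->data_left = 0;
	return (int)len;
}
#endif

/* Callers use this wrapper, so a signature change touches one function. */
int glue_sendmsg(struct mock_socket *sock, struct mock_msghdr *msg)
{
#if MOCK_SENDMSG_TAKES_LEN
	return mock_sock_sendmsg(sock, msg, msg->data_left);
#else
	return mock_sock_sendmsg(sock, msg);
#endif
}
```

with a wrapper like this, a tree built against either form differs only in the switch value, which is one way the manual follow-up could be kept small.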

> You *really* need to shape up wrt the build process.

at the moment, libos can't automatically follow such changes
in the build process. but the good news is that keeping up with
the latest function changes by hand is a trivial task.

my observation on this manual follow up since around 3.7
kernel (2.5 yrs ago) is that these changes mostly happened
during merge-window of each new version, and the fix only
takes a couple of hours at maximum.

I think I can survive with these changes but I'd like to ask
broader opinions.


one more question:

I'd really like to have a suggestion on which tree I should
base for libos tree.

I'm proposing a patchset to arnd/asm-generic tree (which I
believe the base tree for new arch/), while the patchset is
tested with davem/net-next tree because right now libos is
only for net/.

shall I propose a patchset based on Linus' tree instead ?

thank you for your feedback.

-- Hajime


Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Hajime Tazaki

At Fri, 24 Apr 2015 10:59:21 +0200,
Richard Weinberger wrote:
 Am 24.04.2015 um 10:22 schrieb Hajime Tazaki:
  You *really* need to shape up wrt the build process.
  
  at the moment, the implementation of libos can't automate to
  follow such changes in the build process. but good news is
  it's a trivial task to follow up the latest function.
  
  my observation on this manual follow up since around 3.7
  kernel (2.5 yrs ago) is that these changes mostly happened
  during merge-window of each new version, and the fix only
  takes a couple of hours at maximum.
  
  I think I can survive with these changes but I'd like to ask
  broader opinions.
  
  
  one more question:
  
  I'd really like to have a suggestion on which tree I should
  base for libos tree.
  
  I'm proposing a patchset to arnd/asm-generic tree (which I
  believe the base tree for new arch/), while the patchset is
  tested with davem/net-next tree because right now libos is
  only for net/.
  
  shall I propose a patchset based on Linus' tree instead ?
 
 I'd suggest the following:
 Maintain LibOS in your git tree and follow Linus' tree.

I see. will do it from next patch version.

 Make sure that all kernel releases build and work.
 
 This way you can experiment with automation and other
 stuff. If it works well you can ask for mainline inclusion
 after a few kernel releases.
 
 Your git history will show how much maintenance burden
 LibOS has and how much with every merge window breaks and
 needs manual fixup.

I believe this experiment is what we have been doing in the
past a couple of years (it's been tested with net-next tree,
not Linus tree though).

and the experiment is working well (as I stated in the
previous email) and that's why I'm here to propose this
patchset.

you can also see the git history: it also includes libos
specific commits of course.

https://github.com/libos-nuse/net-next-nuse/commits/nuse?author=thehajime



-- Hajime
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Hajime Tazaki

Hi Richard,

At Fri, 24 Apr 2015 09:40:32 +0200,
Richard Weinberger wrote:
 
 Hi!
 
 Am 19.04.2015 um 15:28 schrieb Hajime Tazaki:
  changes from v2:
  - Patch 02/11 (slab: add private memory allocator header for arch/lib)
  * add new allocator named SLIB (Library Allocator): Patch 04/11 is 
  integrated
to 02 (commented by Christoph Lameter)
  - Overall
  * rewrite commit log messages
  
  changes from v1:
  - Patch 01/11 (sysctl: make some functions unstatic to access by 
  arch/lib):
  * add prefix ctl_table_ to newly publiced functions (commented by Joe 
  Perches)
  - Patch 08/11 (lib: other kernel glue layer code):
  * significantly reduce glue codes (stubs) (commented by Richard Weinberger)
  - Others
  * adapt to linux-4.0.0
  * detect make dependency by Kbuild .cmd files
 
 I still fail to build it. :-(
 
 for-asm-upstream-v3 on top of Linus' tree gives:
(snip)
 arch/lib/Makefile:210: recipe for target 'arch/lib/capability.o' failed
 make: *** [arch/lib/capability.o] Error 1

I'm also aware of and already fixed this issue for pre-v4
patch of libos.

 And on top of v4.0 it fails too:
(snip)
 In file included from arch/lib/lib-socket.c:12:0:
 ./include/linux/net.h:216:5: note: declared here
  int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
  ^
 arch/lib/Makefile:210: recipe for target 'arch/lib/lib-socket.o' failed
 make: *** [arch/lib/lib-socket.o] Error 1

between tag v4.0 and the tree the libos v3 patch is based on,
sock_sendmsg() was updated, and the v3 patch already follows
that change. that's why the patch can't be built on top of v4.0.
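
(a minimal sketch of how glue code could absorb this kind of
signature change in one place. it assumes the update was the removal
of the trailing length argument; the struct definitions, the mock
sock_sendmsg(), the LIBOS_SENDMSG_HAS_LEN switch and
compat_sock_sendmsg() are hypothetical stand-ins for illustration,
not part of the patchset.)

```c
#include <assert.h>
#include <stddef.h>

/* stand-ins for the kernel types (the real ones come from kernel headers) */
struct socket { int unused; };
struct msghdr { size_t total_len; };

/* hypothetical switch: set to 1 when building against the older API */
#ifndef LIBOS_SENDMSG_HAS_LEN
#define LIBOS_SENDMSG_HAS_LEN 0
#endif

#if LIBOS_SENDMSG_HAS_LEN
/* mock of the three-argument form quoted in the build log above */
int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
	(void)sock;
	(void)msg;
	return (int)len;
}
#else
/* mock of the assumed newer form: the length travels inside msg */
int sock_sendmsg(struct socket *sock, struct msghdr *msg)
{
	(void)sock;
	return (int)msg->total_len;
}
#endif

/* single call site for the glue code; hides which form is available */
int compat_sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
	assert(sock != NULL && msg != NULL);
#if LIBOS_SENDMSG_HAS_LEN
	return sock_sendmsg(sock, msg, len);
#else
	msg->total_len = len;
	return sock_sendmsg(sock, msg);
#endif
}
```

with a wrapper like this, only one function needs touching when the
signature moves again, at the cost of one more layer in the glue code.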

> You *really* need to shape up wrt the build process.

at the moment, libos has no automated way to follow such
changes at build time. the good news is that following up
the latest function signature by hand is a trivial task.

in my experience of doing this manual follow-up since around
the 3.7 kernel (2.5 yrs ago), these changes mostly happen
during the merge window of each new version, and a fix
takes a couple of hours at most.

I think I can survive with these changes but I'd like to ask
for broader opinions.


one more question:

I'd really like to have a suggestion on which tree I should
base the libos tree on.

I'm proposing the patchset against the arnd/asm-generic tree
(which I believe is the base tree for a new arch/), while the
patchset is tested with the davem/net-next tree because right
now libos only covers net/.

shall I propose a patchset based on Linus' tree instead?

thank you for your feedback.

-- Hajime


Re: [RFC PATCH v3 09/10] lib: libos build scripts and documentation

2015-04-21 Thread Hajime Tazaki

Hi Paul,

many thanks for your review. 

all the fixes will be in the next patchset.
my comments are below.

At Mon, 20 Apr 2015 22:43:07 +0200,
Paul Bolle wrote:
> 
> Some random observations while I'm still trying to wrap my head around
> all this (which might take quite some time).
> 
> On Sun, 2015-04-19 at 22:28 +0900, Hajime Tazaki wrote:
> > --- /dev/null
> > +++ b/arch/lib/Kconfig
> > @@ -0,0 +1,124 @@
> > +menuconfig LIB
> > +   bool "LibOS-specific options"
> > +   def_bool n
> 
> This is the start of the Kconfig parse for lib. (That would basically
> still be true even if you didn't set KBUILD_KCONFIG, see below.) So why
> not do something like all arches do:
> 
> config LIB
>   def_bool y
>   select [...]
> 
> Ie, why would someone want to build for ARCH=lib and still not set LIB?

agreed. fixed.

> > +config EXPERIMENTAL
> > +   def_bool y
> 
> Unneeded: removed treewide in, I think, 2014.

thanks. fixed.

> > +config MMU
> > +def_bool n
> 
> Add empty line.
> 
> > +config FPU
> > +def_bool n
> 
> Ditto.

both are fixed.

> > +config KTIME_SCALAR
> > +   def_bool y
> 
> This one is unused.

deleted.

> > +config GENERIC_BUG
> > +   def_bool y
> > +   depends on BUG
> 
> Add empty line here.

fixed.

> > +config GENERIC_FIND_NEXT_BIT
> > +   def_bool y
> 
> This one is unused too.

deleted.

> > +config SLIB
> > +   def_bool y
> 
> You've also added SLIB to init/Kconfig in 02/10. But "make ARCH=lib
> *config" will never visit init/Kconfig, will it? And, apparently, none
> of SL[AOU]B are wanted for lib. So I think the entry for config SLIB in
> that file can be dropped (as other arches will never see it because it
> depends on LIB).
> 
> (Note that I haven't actually looked into all the Kconfig entries added
> above. Perhaps I might do that. But I'm pretty sure most of the time all
> I can say is: "I have no idea why this entry defaults to $VALUE".)

I intended SLIB to be a generic one, not only for
arch/lib, as we discussed during the v2 patch.

but you're right: for the moment no one else uses SLIB and
we don't visit init/Kconfig, so I dropped the config SLIB
entry from init/Kconfig.

> > +source "net/Kconfig"
> > +
> > +source "drivers/base/Kconfig"
> > +
> > +source "crypto/Kconfig"
> > +
> > +source "lib/Kconfig"
> > +
> > +
> 
> Trailing empty lines.

deleted. thanks.

> > diff --git a/arch/lib/Makefile b/arch/lib/Makefile
> > new file mode 100644
> > index 000..d8a0bf9
> > --- /dev/null
> > +++ b/arch/lib/Makefile
> > @@ -0,0 +1,251 @@
> > +ARCH_DIR := arch/lib
> > +SRCDIR=$(dir $(firstword $(MAKEFILE_LIST)))
> 
> Do you use SRCDIR?

no. deleted the line.

> > +DCE_TESTDIR=$(srctree)/tools/testing/libos/
> > +KBUILD_KCONFIG := arch/$(ARCH)/Kconfig
> 
> I think you copied this from arch/um/Makefile. But arch/um/ is, well,
> special. Why should lib not start the kconfig parse in the file named
> Kconfig? And if you want to start in arch/lib/Kconfig, it would be nice
> to add a mainmenu (just like arch/x86/um/Kconfig does).

right now, 'lib' only wants to read arch/lib/Kconfig, so that
it builds and links a fixed set of files rather than a
configurable one.

so I believe arch/lib is also special, as arch/um is.

I added a mainmenu btw. thanks.

> (I don't read Makefilese well enough to understand the rest of this
> file. I think it's scary.)

indeed. thank you again for reviewing the cryptic files.

> When I did
> make ARCH=lib menuconfig
> 
> I saw (among other things):
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `lzo/' given more than once in the same rule.
(snip)
> arch/lib/Makefile.print:41: target `ppp/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `slip/' given more than once in the same rule.
> 
> I have no idea why. Unclean tree?

this was due to inappropriate handling of the internal
directory listing procedure. fixed.

> > +.PHONY : core
> > +.NOTPARALLEL : print $(subdirs) $(final-obj-m)
> 
> > --- /dev/null
> > +++ b/arch/lib/processor.mk
> > @@ -0,0 +1,7 @@
> > +PROCESSOR=$(shell uname -m)
> > +PROCESSOR_x86_64=64
> > +PROCESSOR_i686=32
> > +PROCESSOR_i586=32
> > +PROCESSOR_i386=32
> > +PROCESSOR_i486=32
> > +PROCESSOR_SIZE=$(PROCESSOR_$(PROCESSOR))
> 
> The rest of the tree appears to use BITS instead of PROCESSOR_SIZE. And
> I do hope there's a cleaner way for lib to set PROCESSOR_SIZE than this.

the variable PROCESSOR_SIZE is only used by
arch/lib/Makefile, in the following lines.

> +ifeq ($(PROCESSOR_SIZE),64)
> +CFLAGS+= -DCONFIG_64BIT
> +endif

Thus it eventually sets CONFIG_64BIT.

I think a cleaner way is to follow what arch/um does, like
below: I deleted processor.mk and the PROCESSOR_SIZE variable.

ifeq ($(SUBARCH),x86)
  ifeq ($(shell uname -m),x86_64)
CFLAGS+= -DCONFIG_64BIT
  endif
endif

though it's not able to cross-compile yet

[RFC PATCH v3 04/10] lib: time handling (kernel glue code)

2015-04-19 Thread Hajime Tazaki
timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. these
eventually call the functions registered through the lib_init()
API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/hrtimer.c         | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c            | 144 +++
 arch/lib/timer.c           | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ &trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   "slack" range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, &trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajim
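
(note on the hrtimer glue above: the pattern is that the host provides
an event scheduler, and a trampoline re-arms the timer when the
callback asks for a restart. below is a minimal, self-contained sketch
of that pattern. every name in it (host_event_*, mini_timer_*,
fire_twice) is a hypothetical stand-in, and the one-slot event queue
is a simplification for illustration; it is not the lib_init()
interface of the patch.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*event_fn)(void *context);

/* one-slot "host" event queue, enough for a single timer */
struct host_event { uint64_t when_ns; event_fn fn; void *ctx; int pending; };
static struct host_event slot;
static uint64_t now_ns;

void *host_event_schedule_ns(uint64_t delay_ns, event_fn fn, void *ctx)
{
	slot.when_ns = now_ns + delay_ns;
	slot.fn = fn;
	slot.ctx = ctx;
	slot.pending = 1;
	return &slot;
}

void host_event_cancel(void *event)
{
	assert(event != NULL);
	((struct host_event *)event)->pending = 0;
}

/* advance virtual time to the pending event and run it; 1 if one ran */
int host_run_next(void)
{
	if (!slot.pending)
		return 0;
	slot.pending = 0;
	now_ns = slot.when_ns;
	slot.fn(slot.ctx);
	return 1;
}

enum restart { NORESTART, RESTART };

struct mini_timer {
	enum restart (*function)(struct mini_timer *);
	uint64_t interval_ns;
	void *event;	/* NULL while the timer is inactive */
	int fired;
};

/* called by the host; re-arms the timer if the callback asks for it */
void trampoline(void *context)
{
	struct mini_timer *t = context;

	if (t->function(t) == RESTART)
		t->event = host_event_schedule_ns(t->interval_ns, trampoline, t);
	else
		t->event = NULL;	/* mark as completed */
}

/* returns 1 when the timer was active, 0 otherwise */
int mini_timer_cancel(struct mini_timer *t)
{
	if (t->event == NULL)
		return 0;
	host_event_cancel(t->event);
	t->event = NULL;
	return 1;
}

/* (re)start the timer; returns 1 when it was already active */
int mini_timer_start(struct mini_timer *t, uint64_t delay_ns)
{
	int was_active = mini_timer_cancel(t);

	t->interval_ns = delay_ns;
	t->event = host_event_schedule_ns(delay_ns, trampoline, t);
	return was_active;
}

/* example callback: asks for one restart, then stops */
enum restart fire_twice(struct mini_timer *t)
{
	return ++t->fired < 2 ? RESTART : NORESTART;
}
```

because the host drives time through host_run_next(), the same glue
can sit on a wall clock or on a simulator's virtual clock without
changes on the timer side.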

[RFC PATCH v3 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-04-19 Thread Hajime Tazaki
these files work as stubs in order to transparently run other
parts of the kernel (e.g., net/) in the libos environment.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/include/asm/Kbuild   | 57 +++
 arch/lib/include/asm/atomic.h | 50 ++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 12 
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 ++
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 318 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..41a49285
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,50 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include 
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)(*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)atomic64_sub(1LL, (v))
+int atomic64_dec_and_test(atomic64_t *v);
+int atomic64_inc_and_test(

[RFC PATCH v3 06/10] lib: sysctl handling (kernel glue code)

2015-04-19 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide a sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int nr_pdflush_threads = 0;
+unsigned long scan_unevictable_pages = 0;
+int suid_dumpable = 0;
+int page_cluster 

[RFC PATCH v3 10/10] lib: tools used for test scripts

2015-04-19 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. a simple test can be run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki 
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE="-v"
+VALGRIND=""
+FAULT_INJECTION=""
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND="-g"
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION="-f"
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${

[RFC PATCH v3 09/10] lib: libos build scripts and documentation

2015-04-19 Thread Hajime Tazaki
document and build scripts for libos architecture.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   8 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 251 +
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 arch/lib/processor.mk |   7 +
 9 files changed, 1291 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py
 create mode 100644 arch/lib/processor.mk

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', enabled by
+CONFIG_LIB, gives you two features.
+
+- network stack in userspace (NUSE)
+  NUSE gives each application a personalized network stack
+  without replacing the host operating system.
+
+- network simulator integration, which is called Direct Code Execution (DCE)
+  DCE gives us a network simulation environment with the Linux network stack
+  to investigate the detailed behavior of protocol implementations with a
+  flexible network configuration. This is also useful as a testing environment.
+
+(- a more abstracted implementation of the underlying platform will be a
+   future direction (e.g., rump hypercall))
+
+In both features, the Linux kernel network stack runs inside a
+userspace application as a linked or dynamically loaded library.
+
+Each has its own network stack, isolated from the host operating system,
+so each is configured with a different IP address, as with other
+virtualization methods.
+
+
+* How is it different from others ?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from the host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies with namespaces bring process-level isolation
+to host multiple network entities, but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it ?
+
+configuration of arch/lib follows a standard configuration of kernel.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+you may first need to prepare a configuration file named
+'nuse.conf' so that the library version of the network stack knows what
+kind of IP configuration should be used. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is that test executions are deterministic: a
+test script gives the same results in every execution as long as the
+code under test is not modified.
+
+For the first step, you need to obtain network simulator
+environment. 'make testbin' does all the stuff for the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that specific application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided
+by this framework.
+
+The network stack can be used with any kind of raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External Repository
+
+The kernel source tree (i.e
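
The LD_PRELOAD redirection that the howto text above attributes to the
'nuse' wrapper can be sketched roughly as follows. This is only an
illustration under assumptions, not the actual NUSE code: the names
demo_stack_socket() and demo_redirect_enabled are hypothetical, and the
"library stack" is a stub that hands out fake descriptors. The idea is that
a definition of socket() with the libc signature takes precedence over
libc's own when it is found earlier in the symbol search order (for
example in a preloaded shared object).

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for the socket entry point of the library
 * network stack; it only hands out fake descriptors from 1000 up. */
static int demo_next_fd = 1000;

static int demo_stack_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return demo_next_fd++;
}

/* Hypothetical switch: non-zero redirects to the library stack. */
int demo_redirect_enabled = 1;

/* Same name and signature as libc's socket(), so callers that resolve
 * "socket" to this definition never reach the libc version. */
int socket(int domain, int type, int protocol)
{
	if (demo_redirect_enabled)
		return demo_stack_socket(domain, type, protocol);

	/* Fall back to the host kernel without going through libc. */
	return (int)syscall(SYS_socket, domain, type, protocol);
}
```

Built as a shared object and listed in LD_PRELOAD, such a file would sit in
front of libc for an unmodified application; a real wrapper would have to
cover the remaining socket calls (bind, connect, sendmsg, ...) in the same
way.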

[RFC PATCH v3 07/10] lib: other kernel glue layer code

2015-04-19 Thread Hajime Tazaki
These files provide the same function calls as the original kernel code
so that the rest of the network stack code stays untouched.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Christoph Paasch 
---
 arch/lib/capability.c |  47 +
 arch/lib/filemap.c    |  32 ++
 arch/lib/fs.c         |  70 +
 arch/lib/glue.c       | 283 ++
 arch/lib/modules.c    |  36 +++
 arch/lib/pid.c        |  29 ++
 arch/lib/print.c      |  56 ++
 arch/lib/proc.c       |  34 ++
 arch/lib/random.c     |  53 ++
 arch/lib/sysfs.c      |  83 +++
 arch/lib/vmscan.c     |  26 +
 11 files changed, 749 insertions(+)
 create mode 100644 arch/lib/capability.c
 create mode 100644 arch/lib/filemap.c
 create mode 100644 arch/lib/fs.c
 create mode 100644 arch/lib/glue.c
 create mode 100644 arch/lib/modules.c
 create mode 100644 arch/lib/pid.c
 create mode 100644 arch/lib/print.c
 create mode 100644 arch/lib/proc.c
 create mode 100644 arch/lib/random.c
 create mode 100644 arch/lib/sysfs.c
 create mode 100644 arch/lib/vmscan.c

diff --git a/arch/lib/capability.c b/arch/lib/capability.c
new file mode 100644
index 000..7054fea
--- /dev/null
+++ b/arch/lib/capability.c
@@ -0,0 +1,47 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include "linux/capability.h"
+
+struct sock;
+struct sk_buff;
+
+int file_caps_enabled = 0;
+
+bool capable(int cap)
+{
+   switch (cap) {
+   case CAP_NET_RAW:
+   case CAP_NET_BIND_SERVICE:
+   case CAP_NET_ADMIN:
+   return 1;
+   default:
+   break;
+   }
+
+   return 0;
+}
+
+int cap_netlink_recv(struct sk_buff *skb, int cap)
+{
+   return 0;
+}
+
+int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
+{
+   return 0;
+}
+bool ns_capable(struct user_namespace *ns, int cap)
+{
+   return true;
+}
+bool file_ns_capable(const struct file *file, struct user_namespace *ns,
+int cap)
+{
+   return true;
+}
diff --git a/arch/lib/filemap.c b/arch/lib/filemap.c
new file mode 100644
index 000..ce424ff
--- /dev/null
+++ b/arch/lib/filemap.c
@@ -0,0 +1,32 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ *     Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+
+
+ssize_t generic_file_aio_read(struct kiocb *a, const struct iovec *b,
+ unsigned long c, loff_t d)
+{
+   lib_assert(false);
+
+   return 0;
+}
+
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -ENOSYS;
+}
+
+ssize_t
+generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   return 0;
+}
diff --git a/arch/lib/fs.c b/arch/lib/fs.c
new file mode 100644
index 000..324e10b
--- /dev/null
+++ b/arch/lib/fs.c
@@ -0,0 +1,70 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ * Frederic Urbani
+ */
+
+#include 
+
+#include "sim-assert.h"
+
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
+unsigned int dirtytime_expire_interval;
+
+void __init mnt_init(void)
+{
+}
+
+/* Implementation taken from vfs_kern_mount from linux/namespace.c */
+struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+{
+   static struct mount local_mnt;
+   static int count = 0;
+   struct mount *mnt = &local_mnt;
+   struct dentry *root = 0;
+
+   /* XXX */
+   if (count != 0) return &local_mnt.mnt;
+   count++;
+
+   memset(mnt, 0, sizeof(struct mount));
+   if (!type)
+   return ERR_PTR(-ENODEV);
+   int flags = MS_KERNMOUNT;
+   char *name = (char *)type->name;
+
+   if (flags & MS_KERNMOUNT)
+   mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+   root = type->mount(type, flags, name, data);
+   if (IS_ERR(root))
+   return ERR_CAST(root);
+
+   mnt->mnt.mnt_root = root;
+   mnt->mnt.mnt_sb = root->d_sb;
+   mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+   mnt->mnt_parent = mnt;
+   /* DCE is monothreaded , so we do not care of lock here */
+   list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
+
+   return &mnt->mnt;
+}
+void inode_wait_for_writeback(struct inode *inode)
+{
+}
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+}
+int dirtytime_interval_handler(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return -ENOSYS;
+}
+
+unsigned int nr_free_buffer_pages(void)
+{
+   return 1024;
+}
diff --git a/arch/lib/glue.c b/arch/lib/glue.c
new 

[RFC PATCH v3 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-04-19 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through a userspace
function call that enumerates the sysctl tree starting from
sysctl_table_root. This requires the symbol and its related functions to
be publicly accessible.

Signed-off-by: Hajime Tazaki 
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f92d5dd..56feec7 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
struct ctl_table *entry;
 
spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (entry && use_table(head))
*phead = head;
else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
if (!scan(h, entry, &pos, file, ctx)) {
sysctl_head_finish(h);
break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
if (!entry)
return ERR_PTR(-ENOENT);
if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
struct ctl_dir *parent;
const char *procname;
if (!dir->header.parent)
return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
if (IS_ERR(parent))
return parent;
procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
spin_lock(&sysctl_lock);
root = (*pentry)->data;
set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
if (IS_ERR(dir))
ret = PTR_ERR(dir);
else {
const char *procname = (*pentry)->procname;
head = NULL;
-   entry = find_entry(&head, dir, procname, strlen(procname));
+   entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
ret = -ENOENT;
if (entry && use_table(head)) {
unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
/* Are there links available for every entry in table? */
for (entry = table; entry->procname; entry++) {
const char *procname = entry->procname;
-   link = find_entry(&head, dir, procname, strlen(procname));
+   link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
if (!link)
return false;
if (S_ISDIR(link->mode) &am

[RFC PATCH v3 02/10] slab: add SLIB (Library memory allocator) for arch/lib

2015-04-19 Thread Hajime Tazaki
Add the SLIB allocator for arch/lib (CONFIG_LIB) to wrap kmalloc and co.
This lets libos users plug in their own allocator: malloc(3) etc.

Signed-off-by: Hajime Tazaki 
---
 include/linux/slab.h     |   6 +-
 include/linux/slib_def.h |  21 +
 init/Kconfig             |   8 ++
 mm/Makefile              |   1 +
 mm/slab.h                |   4 +
 mm/slib.c                | 205 +++
 6 files changed, 244 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/slib_def.h
 create mode 100644 mm/slib.c

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9a139b6..b167daa 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -192,7 +192,7 @@ size_t ksize(const void *);
 #endif
 #endif
 
-#ifdef CONFIG_SLOB
+#if defined(CONFIG_SLOB) || defined(CONFIG_SLIB)
 /*
  * SLOB passes all requests larger than one page to the page allocator.
  * No kmalloc array is necessary since objects of different sizes can
@@ -350,6 +350,9 @@ kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 }
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#else
 static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 {
unsigned int order = get_order(size);
@@ -428,6 +431,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
}
return __kmalloc(size, flags);
 }
+#endif /* CONFIG_SLIB */
 
 /*
  * Determine size used for the nth kmalloc cache.
diff --git a/include/linux/slib_def.h b/include/linux/slib_def.h
new file mode 100644
index 000..d9fe7d5
--- /dev/null
+++ b/include/linux/slib_def.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_SLLB_DEF_H
+#define _LINUX_SLLB_DEF_H
+
+
+struct kmem_cache {
+   unsigned int object_size;
+   const char *name;
+   size_t size;
+   size_t align;
+   unsigned long flags;
+   void (*ctor)(void *);
+};
+
+void *__kmalloc(size_t size, gfp_t flags);
+void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+   return __kmalloc(size, flags);
+}
+
+#endif /* _LINUX_SLLB_DEF_H */
diff --git a/init/Kconfig b/init/Kconfig
index 9afb971..8bdee98 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1695,6 +1695,14 @@ config SLOB
   allocator. SLOB is generally more space efficient but
   does not perform as well on large systems.
 
+config SLIB
+   bool "SLIB (Library Allocator)"
+   depends on LIB
+   help
+  SLIB replaces the slab allocator with an allocator function
+  registered via lib_init(), as used by libos (CONFIG_LIB). This
+  is usually malloc(3), but any allocator can be used.
+
 endchoice
 
 config SLUB_CPU_PARTIAL
diff --git a/mm/Makefile b/mm/Makefile
index 4bf586e..8c66951 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_NUMA)+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_SLIB) += slib.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/slab.h b/mm/slab.h
index 1cf40054..89c2319 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -37,6 +37,10 @@ struct kmem_cache {
 #include 
 #endif
 
+#ifdef CONFIG_SLIB
+#include <linux/slib_def.h>
+#endif
+
 #include 
 
 /*
diff --git a/mm/slib.c b/mm/slib.c
new file mode 100644
index 000..37596862
--- /dev/null
+++ b/mm/slib.c
@@ -0,0 +1,205 @@
+/*
+ * Library Slab Allocator (SLIB)
+ *
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ *     Hajime Tazaki 
+ */
+
+#include "sim.h"
+#include "sim-assert.h"
+#include 
+#include 
+#include 
+#include 
+
+/* glues */
+struct kmem_cache *files_cachep;
+
+void kfree(const void *p)
+{
+   unsigned long start;
+
+   if (p == 0)
+   return;
+   start = (unsigned long)p;
+   start -= sizeof(size_t);
+   lib_free((void *)start);
+}
+size_t ksize(const void *p)
+{
+   size_t *psize = (size_t *)p;
+
+   psize--;
+   return *psize;
+}
+void *__kmalloc(size_t size, gfp_t flags)
+{
+   void *p = lib_malloc(size + sizeof(size));
+   unsigned long start;
+
+   if (!p)
+   return NULL;
+
+   if (p != 0 && (flags & __GFP_ZERO))
+   lib_memset(p, 0, size + sizeof(size));
+   lib_memcpy(p, &size, sizeof(size));
+   start = (unsigned long)p;
+   return (void *)(start + sizeof(size));
+}
+
+void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+{
+   return kmalloc(size, flags);
+}
+
+void *krealloc(const void *p, size_t new_size, gfp_t flags)
+{
+   void *ret;
+
+   if (!new_size) {
+   kfree(p);
+   return ZERO_SIZE_PTR;
+   }
+
+   ret = __kmalloc(new_size, flags);
+   if (ret && p != ret)
+   kfree(p);
+
+   return ret;
+}
+
+struct kmem_cache 
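
The size-header scheme that __kmalloc(), ksize() and kfree() use in the
mm/slib.c excerpt above can be sketched in plain userspace C roughly as
follows. This is a minimal sketch under assumptions: the demo_* names are
hypothetical, malloc(3)/free(3) stand in for lib_malloc()/lib_free(), and a
simple int replaces the gfp flags.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate 'size' usable bytes, preceded by a size_t header that
 * records the requested size. Returns a pointer just past the header. */
void *demo_alloc(size_t size, int zero)
{
	size_t *hdr;

	if (size > SIZE_MAX - sizeof(size_t))
		return NULL;	/* header + payload would overflow */
	hdr = malloc(sizeof(size_t) + size);
	if (!hdr)
		return NULL;
	if (zero)
		memset(hdr + 1, 0, size);
	*hdr = size;
	return hdr + 1;
}

/* Read the recorded size back from the header in front of 'p'. */
size_t demo_size(const void *p)
{
	assert(p != NULL);
	return ((const size_t *)p)[-1];
}

/* Step back over the header and release the whole block. */
void demo_free(void *p)
{
	if (!p)
		return;
	free((size_t *)p - 1);
}
```

One consequence of this layout is that the returned pointer is only
guaranteed to be aligned to sizeof(size_t), not to whatever stricter
alignment the underlying allocator provides.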

[RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-19 Thread Hajime Tazaki
changes from v2:
- Patch 02/11 ("slab: add private memory allocator header for arch/lib")
* add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
  to 02 (commented by Christoph Lameter)
- Overall
* rewrite commit log messages

changes from v1:
- Patch 01/11 ("sysctl: make some functions unstatic to access by arch/lib"):
* add prefix ctl_table_ to newly exported functions (commented by Joe Perches)
- Patch 08/11 ("lib: other kernel glue layer code"):
* significantly reduce glue codes (stubs) (commented by Richard Weinberger)
- Others
* adapt to linux-4.0.0
* detect make dependency by Kbuild .cmd files

patchset history
-
[v2] : https://lkml.org/lkml/2015/4/17/140
[v1] : https://lkml.org/lkml/2015/3/24/254

This is an introduction to a library operating system (LibOS) for Linux.

Our objective is to build the kernel network stack as a shared library
that can be linked to by userspace programs to provide network stack
personalization and testing facilities, and allow researchers to more
easily simulate complex network topologies of linux routers/hosts.

Although the architecture itself can virtualize various things, the
current design only focuses on the network stack. You can benefit
from network stack features such as TCP, UDP, SCTP, DCCP (IPv4 and
IPv6), Mobile IPv6, Multipath TCP (IPv4/IPv6, out-of-tree at the
present moment), and netlink with various userspace applications
(quagga, iproute2, iperf, wget, and thttpd).

== What is LibOS ? ==

The library exposes a single API entry point, lib_init(), to connect
userspace applications to the (userspace version of the) kernel
network stack. The caller provides the clock source, the virtual
struct net_device, and the scheduler, while the callee provides kernel
resources such as system calls.
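
The two-way exchange described above (the caller hands in host services,
the library hands back kernel services) could look roughly like the
following. This is a sketch under assumptions only: every demo_* name is
hypothetical and the structures are far smaller than the real struct
SimImported / struct SimExported; the "socket" call just hands out
increasing handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Services the host application (caller) provides to the library. */
struct demo_imported {
	uint64_t (*current_ns)(void);	/* clock source */
};

/* Services the library (callee) provides back to the application. */
struct demo_exported {
	int (*sock_socket)(int domain, int type, int protocol);
	uint64_t (*uptime_ns)(void);
};

static const struct demo_imported *demo_host;
static uint64_t demo_boot_ns;
static int demo_next_sock;

static int demo_sock_socket(int domain, int type, int protocol)
{
	(void)domain;
	(void)type;
	(void)protocol;
	return demo_next_sock++;	/* placeholder handle */
}

static uint64_t demo_uptime_ns(void)
{
	/* The library has no clock of its own: it asks the host. */
	return demo_host->current_ns() - demo_boot_ns;
}

/* Hypothetical entry point: store the host table, fill in ours. */
void demo_lib_init(struct demo_exported *exported,
		   const struct demo_imported *imported)
{
	assert(exported != NULL && imported != NULL);
	demo_host = imported;
	demo_boot_ns = imported->current_ns();
	demo_next_sock = 0;
	exported->sock_socket = demo_sock_socket;
	exported->uptime_ns = demo_uptime_ns;
}

/* A fake host clock, so that the sketch can be driven by hand. */
uint64_t demo_fake_now;

uint64_t demo_fake_clock(void)
{
	return demo_fake_now;
}
```

Because the library only sees time through the imported clock, a caller
such as a simulator can feed it virtual time, which is one way the
deterministic test runs mentioned elsewhere in this series become possible.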

Once LibOS is initialized via this API, userspace applications that use
POSIX sockets can call the system calls defined in LibOS by replacing
the original socket-related symbols with the LibOS-specific ones. The
application then benefits from the network stack of LibOS without
involving the host network stack.

Currently, there are two users of LibOS: Network Stack in Userspace
(NUSE) and the ns-3 network simulator with Direct Code Execution
(DCE). Their code is managed in an external repository (*1).


== How to use it ? ==

To build the library, first configure it:
% make {defconfig,menuconfig} ARCH=lib

Then build it:
% make library ARCH=lib

You will see liblinux-$(KERNELVERSION).so in the top directory.

== More information ==

The crucial difference between UML (user-mode linux) and this approach
is that we allow multiple network stack instances to co-exist within a
single process, using dlmopen(3)-like linking, which eases debugging.


These patches are also available on this branch:

git://github.com/libos-nuse/net-next-nuse.git for-asm-upstream-v3


For further information, here is a slide set presented at the last
netdev0.1 conference.

http://www.slideshare.net/hajimetazaki/library-operating-system-for-linux-netdev01

I would appreciate any feedback on upstreaming this feature.

*1 https://github.com/libos-nuse/linux-libos-tools

Hajime Tazaki (10):
  sysctl: make some functions unstatic to access by arch/lib
  slab: add SLIB (Library memory allocator) for arch/lib
  lib: public headers and API implementations for userspace programs
  lib: time handling (kernel glue code)
  lib: context and scheduling functions (kernel glue code) for libos
  lib: sysctl handling (kernel glue code)
  lib: other kernel glue layer code
  lib: auxially files for auto-generated asm-generic files of libos
  lib: libos build scripts and documentation
  lib: tools used for test scripts

 Documentation/virtual/libos-howto.txt | 144
 MAINTAINERS                           |   9 +
 arch/lib/.gitignore                   |   8 +
 arch/lib/Kconfig                      | 124 +++
 arch/lib/Makefile                     | 251 +
 arch/lib/Makefile.print               |  45 +++
 arch/lib/capability.c                 |  47 +++
 arch/lib/defconfig                    | 653 ++
 arch/lib/filemap.c                    |  32 ++
 arch/lib/fs.c                         |  70
 arch/lib/generate-linker-script.py    |  50 +++
 arch/lib/glue.c                       | 283 +++
 arch/lib/hrtimer.c                    | 122 +++
 arch/lib/include/asm/Kbuild           |  57 +++
 arch/lib/include/asm/atomic.h         |  50 +++
 arch/lib/include/asm/barrier.h        |   8 +
 arch/lib/include/asm/bitsperlong.h    |  12 +
 arch/lib/include/asm/current.h        |   7 +
 arch/lib/include/asm/elf.h            |  10 +
 arch/lib/include/asm/hardirq.h        |   8 +
 arch/lib/include/asm/page.h           |  14 +
 arch/lib/include/asm/pgtable.h        |  30 ++
 arch/lib/include/asm/processor.h      |  19 +
 arch/lib/include/asm/ptrace.h         |   4 +
 arch/lib/include/asm/segment.h        |   6 +

[RFC PATCH v3 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-04-19 Thread Hajime Tazaki
Context primitives of the kernel such as soft interrupts, scheduling,
and tasklets are implemented for libos. These functions eventually call
the functions registered by the lib_init() API as well.

Signed-off-by: Hajime Tazaki 
---
 arch/lib/sched.c     | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+   return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+   return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+  for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+   return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+   return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+   return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+   return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+   list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_

[RFC PATCH v3 03/10] lib: public headers and API implementations for userspace programs

2015-04-19 Thread Hajime Tazaki
Userspace programs that use libos access it via a public API, lib_init(),
passing struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki 
Signed-off-by: Ryo Nakamura 
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h        |  51 ++
 arch/lib/lib-device.c         | 187 +++
 arch/lib/lib-socket.c         | 410 ++
 arch/lib/lib.c                | 294 ++
 arch/lib/lib.h                |  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {                                \
+   while (!(v)) {  \
+   lib_printf("Assert failed %s:%u \"" #v "\"\n",  \
+   __FILE__, __LINE__);\
+   char *p = 0;\
+   *p = 1; \
+   }   \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage 
+ * Hajime Tazaki 
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include 
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct SimDevicePacket (*dev_create_packet)(struct SimDevice *dev,
+  

[RFC PATCH v3 04/10] lib: time handling (kernel glue code)

2015-04-19 Thread Hajime Tazaki
Timer-related (internal) kernel functions such as add_timer() and
do_gettimeofday() are trivially reimplemented for libos. These
eventually call the functions registered by the lib_init() API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/hrtimer.c         | 122 +++
 arch/lib/tasklet-hrtimer.c |  57 +++
 arch/lib/time.c            | 144 +++
 arch/lib/timer.c           | 238 +
 4 files changed, 561 insertions(+)
 create mode 100644 arch/lib/hrtimer.c
 create mode 100644 arch/lib/tasklet-hrtimer.c
 create mode 100644 arch/lib/time.c
 create mode 100644 arch/lib/timer.c

diff --git a/arch/lib/hrtimer.c b/arch/lib/hrtimer.c
new file mode 100644
index 000..4565b59
--- /dev/null
+++ b/arch/lib/hrtimer.c
@@ -0,0 +1,122 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/hrtimer.h>
+#include "sim-assert.h"
+#include "sim.h"
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ * @timer:  the timer to be initialized
+ * @clock_id:   the clock to be used
+ * @mode:   timer mode abs/rel
+ */
+void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode)
+{
+   memset(timer, 0, sizeof(*timer));
+}
+static void trampoline(void *context)
+{
+   struct hrtimer *timer = context;
+   enum hrtimer_restart restart = timer->function(timer);
+
+   if (restart == HRTIMER_RESTART) {
+   void *event =
+   lib_event_schedule_ns(ktime_to_ns(timer->_softexpires),
+ trampoline, timer);
+   timer->base = event;
+   } else {
+   /* mark as completed. */
+   timer->base = 0;
+   }
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
+ * @timer:  the timer to be added
+ * @tim:expiry time
+ * @delta_ns:   slack range for the timer
+ * @mode:   expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+unsigned long delta_ns,
+const enum hrtimer_mode mode,
+int wakeup)
+{
+   int ret = hrtimer_cancel(timer);
+   s64 ns = ktime_to_ns(tim);
+   void *event;
+
+   if (mode == HRTIMER_MODE_ABS)
+   ns -= lib_current_ns();
+   timer->_softexpires = ns_to_ktime(ns);
+   event = lib_event_schedule_ns(ns, trampoline, timer);
+   timer->base = event;
+   return ret;
+}
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ * @timer:  hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently excuting the callback function and
+ *cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+   /* Note: we cannot return -1 from this function.
+  see comment in hrtimer_cancel. */
+   if (timer->base == 0)
+   /* timer was not active yet */
+   return 0;
+   lib_event_cancel(timer->base);
+   timer->base = 0;
+   return 1;
+}
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ * @timer:  the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+   /* Note: because we assume a uniprocessor non-interruptible */
+   /* system when running in the kernel, we know that the timer */
+   /* is not running when we execute this code, so, know that */
+   /* try_to_cancel cannot return -1 and we don't need to retry */
+   /* the cancel later to wait for the handler to finish. */
+   int ret = hrtimer_try_to_cancel(timer);
+
+   lib_assert(ret >= 0);
+   return ret;
+}
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
+}
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+  unsigned long delta_ns, const enum hrtimer_mode mode)
+{
+   return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
+}
+
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+   *tp = ns_to_timespec(1);
+   return 0;
+}
diff --git a/arch/lib/tasklet-hrtimer.c b/arch/lib/tasklet-hrtimer.c
new file mode 100644
index 000..fef4902
--- /dev/null
+++ b/arch/lib/tasklet-hrtimer.c
@@ -0,0 +1,57 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage
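
For readers following the hrtimer glue above: __hrtimer_start_range_ns() turns an absolute expiry into a delay relative to the current time before handing it to the event scheduler. A minimal sketch of just that conversion, assuming a caller-supplied clock value in place of lib_current_ns(); the sketch_* names are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for HRTIMER_MODE_ABS / HRTIMER_MODE_REL. */
enum sketch_timer_mode { SKETCH_MODE_ABS, SKETCH_MODE_REL };

/*
 * Return the delay, in nanoseconds from "now", at which the timer
 * should fire. now_ns plays the role of lib_current_ns() in the patch.
 */
static int64_t sketch_relative_delay_ns(int64_t expiry_ns, int64_t now_ns,
					enum sketch_timer_mode mode)
{
	assert(now_ns >= 0);
	/* An absolute expiry is measured on the same clock as now_ns. */
	if (mode == SKETCH_MODE_ABS)
		expiry_ns -= now_ns;
	/* A relative expiry is already a delay and is passed through. */
	return expiry_ns;
}
```

With the clock at 1000 ns, an absolute expiry of 1500 ns yields a delay of 500 ns, while a relative expiry of 1500 ns is scheduled unchanged.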

[RFC PATCH v3 08/10] lib: auxiliary files for auto-generated asm-generic files of libos

2015-04-19 Thread Hajime Tazaki
These files work as stubs so that other parts of the kernel
(e.g., net/) run transparently in the libos environment.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/include/asm/Kbuild   | 57 +++
 arch/lib/include/asm/atomic.h | 50 ++
 arch/lib/include/asm/barrier.h|  8 +
 arch/lib/include/asm/bitsperlong.h| 12 
 arch/lib/include/asm/current.h|  7 +
 arch/lib/include/asm/elf.h| 10 ++
 arch/lib/include/asm/hardirq.h|  8 +
 arch/lib/include/asm/page.h   | 14 +
 arch/lib/include/asm/pgtable.h| 30 ++
 arch/lib/include/asm/processor.h  | 19 
 arch/lib/include/asm/ptrace.h |  4 +++
 arch/lib/include/asm/segment.h|  6 
 arch/lib/include/asm/sembuf.h |  4 +++
 arch/lib/include/asm/shmbuf.h |  4 +++
 arch/lib/include/asm/shmparam.h   |  4 +++
 arch/lib/include/asm/sigcontext.h |  6 
 arch/lib/include/asm/stat.h   |  4 +++
 arch/lib/include/asm/statfs.h |  4 +++
 arch/lib/include/asm/swab.h   |  7 +
 arch/lib/include/asm/thread_info.h| 36 ++
 arch/lib/include/asm/uaccess.h| 14 +
 arch/lib/include/asm/unistd.h |  4 +++
 arch/lib/include/uapi/asm/byteorder.h |  6 
 23 files changed, 318 insertions(+)
 create mode 100644 arch/lib/include/asm/Kbuild
 create mode 100644 arch/lib/include/asm/atomic.h
 create mode 100644 arch/lib/include/asm/barrier.h
 create mode 100644 arch/lib/include/asm/bitsperlong.h
 create mode 100644 arch/lib/include/asm/current.h
 create mode 100644 arch/lib/include/asm/elf.h
 create mode 100644 arch/lib/include/asm/hardirq.h
 create mode 100644 arch/lib/include/asm/page.h
 create mode 100644 arch/lib/include/asm/pgtable.h
 create mode 100644 arch/lib/include/asm/processor.h
 create mode 100644 arch/lib/include/asm/ptrace.h
 create mode 100644 arch/lib/include/asm/segment.h
 create mode 100644 arch/lib/include/asm/sembuf.h
 create mode 100644 arch/lib/include/asm/shmbuf.h
 create mode 100644 arch/lib/include/asm/shmparam.h
 create mode 100644 arch/lib/include/asm/sigcontext.h
 create mode 100644 arch/lib/include/asm/stat.h
 create mode 100644 arch/lib/include/asm/statfs.h
 create mode 100644 arch/lib/include/asm/swab.h
 create mode 100644 arch/lib/include/asm/thread_info.h
 create mode 100644 arch/lib/include/asm/uaccess.h
 create mode 100644 arch/lib/include/asm/unistd.h
 create mode 100644 arch/lib/include/uapi/asm/byteorder.h

diff --git a/arch/lib/include/asm/Kbuild b/arch/lib/include/asm/Kbuild
new file mode 100644
index 000..c647b1c
--- /dev/null
+++ b/arch/lib/include/asm/Kbuild
@@ -0,0 +1,57 @@
+generic-y += auxvec.h
+generic-y += bitops.h
+generic-y += bug.h
+generic-y += cache.h
+generic-y += cacheflush.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += cmpxchg.h
+generic-y += delay.h
+generic-y += device.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += exec.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += fcntl.h
+generic-y += ftrace.h
+generic-y += io.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq.h
+generic-y += irqflags.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += mcs_spinlock.h
+generic-y += mman.h
+generic-y += mmu.h
+generic-y += mmu_context.h
+generic-y += module.h
+generic-y += mutex.h
+generic-y += param.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += preempt.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += setup.h
+generic-y += signal.h
+generic-y += siginfo.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += timex.h
+generic-y += tlbflush.h
+generic-y += types.h
+generic-y += topology.h
+generic-y += trace_clock.h
+generic-y += unaligned.h
diff --git a/arch/lib/include/asm/atomic.h b/arch/lib/include/asm/atomic.h
new file mode 100644
index 000..41a49285
--- /dev/null
+++ b/arch/lib/include/asm/atomic.h
@@ -0,0 +1,50 @@
+#ifndef _ASM_SIM_ATOMIC_H
+#define _ASM_SIM_ATOMIC_H
+
+#include <linux/types.h>
+
+#if !defined(CONFIG_64BIT)
+typedef struct {
+   volatile long long counter;
+} atomic64_t;
+#endif
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v)   (*(volatile long *)&(v)->counter)
+void atomic64_add(long i, atomic64_t *v);
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+   v->counter -= i;
+}
+static inline void atomic64_inc(atomic64_t *v)
+{
+   v->counter++;
+}
+int atomic64_sub_and_test(long i, atomic64_t *v);
+#define atomic64_dec(v)    atomic64_sub(1LL, (v))
+int atomic64_dec_and_test(atomic64_t *v);
+int

[RFC PATCH v3 06/10] lib: sysctl handling (kernel glue code)

2015-04-19 Thread Hajime Tazaki
This interacts with fs/proc_fs.c to provide a sysctl-like interface
registered via the lib_init() API.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sysctl.c | 270 ++
 1 file changed, 270 insertions(+)
 create mode 100644 arch/lib/sysctl.c

diff --git a/arch/lib/sysctl.c b/arch/lib/sysctl.c
new file mode 100644
index 000..5f08f9f
--- /dev/null
+++ b/arch/lib/sysctl.c
@@ -0,0 +1,270 @@
+/*
+ * sysctl wrapper for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/mman.h>
+#include <linux/ratelimit.h>
+#include <linux/proc_fs.h>
+#include "sim-assert.h"
+#include "sim-types.h"
+
+int drop_caches_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *length,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+  void *buffer, size_t *lenp,
+  loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_ratio_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_bytes_handler(struct ctl_table *table, int write,
+   void *buffer, size_t *lenp,
+   loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int scan_unevictable_handler(struct ctl_table *table, int write,
+void __user *buffer,
+size_t *length, loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+int sched_rt_handler(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp,
+loff_t *ppos)
+{
+   lib_assert(false);
+   return 0;
+}
+
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS;
+int sysctl_overcommit_ratio = 50;
+int sysctl_panic_on_oom = 0;
+int sysctl_oom_dump_tasks = 0;
+int sysctl_oom_kill_allocating_task = 0;
+int sysctl_nr_trim_pages = 0;
+int sysctl_drop_caches = 0;
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES - 1] = { 32 };
+unsigned int sysctl_sched_child_runs_first = 0;
+unsigned int sysctl_sched_compat_yield = 0;
+unsigned int sysctl_sched_rt_period = 100;
+int sysctl_sched_rt_runtime = 95;
+
+int vm_highmem_is_dirtyable;
+unsigned long vm_dirty_bytes = 0;
+int vm_dirty_ratio = 20;
+int dirty_background_ratio = 10;
+unsigned int dirty_expire_interval = 30 * 100;
+unsigned int dirty_writeback_interval = 5 * 100;
+unsigned long dirty_background_bytes = 0;
+int percpu_pagelist_fraction = 0;
+int panic_timeout = 0;
+int panic_on_oops = 0;
+int printk_delay_msec = 0;
+int panic_on_warn = 0;
+DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
+
+#define RESERVED_PIDS 300
+int pid_max = PID_MAX_DEFAULT;
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+int min_free_kbytes = 1024;
+int max_threads = 100;
+int laptop_mode = 0;
+
+#define DEFAULT_MESSAGE_LOGLEVEL 4
+#define MINIMUM_CONSOLE_LOGLEVEL 1
+#define DEFAULT_CONSOLE_LOGLEVEL 7
+int console_printk[4] = {
+   DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
+   DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
+   MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
+   DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
+};
+
+int print_fatal_signals = 0;
+unsigned int core_pipe_limit = 0;
+int core_uses_pid = 0;
+int vm_swappiness = 60;
+int

[RFC PATCH v3 10/10] lib: tools used for test scripts

2015-04-19 Thread Hajime Tazaki
These auxiliary files are used for testing and debugging net/ code
with libos. A simple test is run with 'make test ARCH=lib'.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 tools/testing/libos/.gitignore   |  6 +
 tools/testing/libos/Makefile | 38 +++
 tools/testing/libos/README   | 15 +++
 tools/testing/libos/bisect.sh| 10 +++
 tools/testing/libos/dce-test.sh  | 23 
 tools/testing/libos/nuse-test.sh | 57 
 6 files changed, 149 insertions(+)
 create mode 100644 tools/testing/libos/.gitignore
 create mode 100644 tools/testing/libos/Makefile
 create mode 100644 tools/testing/libos/README
 create mode 100755 tools/testing/libos/bisect.sh
 create mode 100755 tools/testing/libos/dce-test.sh
 create mode 100755 tools/testing/libos/nuse-test.sh

diff --git a/tools/testing/libos/.gitignore b/tools/testing/libos/.gitignore
new file mode 100644
index 000..57a74a0
--- /dev/null
+++ b/tools/testing/libos/.gitignore
@@ -0,0 +1,6 @@
+*.pcap
+files-*
+bake
+buildtop
+core
+exitprocs
diff --git a/tools/testing/libos/Makefile b/tools/testing/libos/Makefile
new file mode 100644
index 000..3da25429
--- /dev/null
+++ b/tools/testing/libos/Makefile
@@ -0,0 +1,38 @@
+ADD_PARAM?=
+
+all: test
+
+bake:
+   hg clone http://code.nsnam.org/bake
+
+check_pkgs:
+   @./bake/bake.py check | grep Bazaar | grep OK || (echo "bzr is missing" && ./bake/bake.py check)
+   @./bake/bake.py check | grep autoreconf | grep OK || (echo "autotools is missing" && ./bake/bake.py check && exit 1)
+
+testbin: bake check_pkgs
+   @cp ../../../arch/lib/tools/bakeconf-linux.xml bake/bakeconf.xml
+   @mkdir -p buildtop/build/bin_dce
+   cd buildtop ; \
+   ../bake/bake.py configure -e dce-linux-inkernel $(BAKECONF_PARAMS)
+   cd buildtop ; \
+   ../bake/bake.py show --enabledTree | grep -v  -E "pygoocanvas|graphviz|python-dev" | grep Missing && (echo "required packages are missing") || echo ""
+   cd buildtop ; \
+   ../bake/bake.py download ; \
+   ../bake/bake.py update ; \
+   ../bake/bake.py build
+
+test:
+   @./dce-test.sh ADD_PARAM=$(ADD_PARAM)
+
+test-valgrind:
+   @./dce-test.sh -g ADD_PARAM=$(ADD_PARAM)
+
+test-fault-injection:
+   @./dce-test.sh -f ADD_PARAM=$(ADD_PARAM)
+
+clean:
+#  @rm -rf buildtop
+   @rm -f *.pcap
+   @rm -rf files-*
+   @rm -f exitprocs
+   @rm -f core
diff --git a/tools/testing/libos/README b/tools/testing/libos/README
new file mode 100644
index 000..51ac5a5
--- /dev/null
+++ b/tools/testing/libos/README
@@ -0,0 +1,15 @@
+
+- bisect.sh
+a sample script to bisect an issue of network stack code with the help
+of LibOS (and ns-3 network simulator). This was used to detect the issue
+for the following patch.
+
+http://patchwork.ozlabs.org/patch/436351/
+
+- dce-test.sh
+a test script invoked by 'make test ARCH=lib'. The contents of test
+scenario are implemented as test suites of ns-3 network simulator.
+
+- nuse-test.sh
+a simple test script for Network Stack in Userspace (NUSE).
+
diff --git a/tools/testing/libos/bisect.sh b/tools/testing/libos/bisect.sh
new file mode 100755
index 000..9377ac3
--- /dev/null
+++ b/tools/testing/libos/bisect.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+git merge origin/nuse --no-commit
+make clean ARCH=lib
+make library ARCH=lib OPT=no
+make test ARCH=lib ADD_PARAM=" -s dce-umip"
+RET=$?
+git reset --hard
+
+exit $RET
diff --git a/tools/testing/libos/dce-test.sh b/tools/testing/libos/dce-test.sh
new file mode 100755
index 000..e81e2d8
--- /dev/null
+++ b/tools/testing/libos/dce-test.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -e
+#set -x
+export LD_LOG=symbol-fail
+#VERBOSE=-v
+VALGRIND=
+FAULT_INJECTION=
+
+if [ "$1" = "-g" ] ; then
+ VALGRIND=-g
+# Not implemented yet.
+#elif [ "$1" = "-f" ] ; then
+# FAULT_INJECTION=-f
+fi
+
+# FIXME
+#export NS_ATTRIBUTE_DEFAULT='ns3::DceManagerHelper::LoaderFactory=ns3::\
+#DlmLoaderFactory[];ns3::TaskManager::FiberManagerType=UcontextFiberManager'
+
+cd buildtop/source/ns-3-dce
+LD_LIBRARY_PATH=${srctree} ./test.py -n ${VALGRIND} ${FAULT_INJECTION}\
+  ${VERBOSE} ${ADD_PARAM}
diff --git a/tools/testing/libos/nuse-test.sh b/tools/testing/libos/nuse-test.sh
new file mode 100755
index 000..198e7e4
--- /dev/null
+++ b/tools/testing/libos/nuse-test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash -e
+
+LIBOS_TOOLS=arch/lib/tools
+
+IFNAME=`ip route |grep default | awk '{print $5}'`
+GW=`ip route |grep default | awk '{print $3}'`
+#XXX
+IPADDR=`echo $GW | sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.)([0-9]+)$/\1\`expr \2 + 10\`/"`
+
+# ip route
+# ip address
+# ip link
+
+NUSE_CONF=/tmp/nuse.conf
+
+cat > ${NUSE_CONF} << ENDCONF
+
+interface ${IFNAME}
+   address ${IPADDR}
+   netmask 255.255.255.0
+   macaddr 00:01:01:01:01:02
+   viftype RAW
+
+route
+   network 0.0.0.0
+   netmask 0.0.0.0
+   gateway ${GW}
+
+ENDCONF
+
+cd ${LIBOS_TOOLS}
+sudo NUSECONF

[RFC PATCH v3 09/10] lib: libos build scripts and documentation

2015-04-19 Thread Hajime Tazaki
Documentation and build scripts for the libos architecture.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 Documentation/virtual/libos-howto.txt | 144 
 MAINTAINERS   |   9 +
 arch/lib/.gitignore   |   8 +
 arch/lib/Kconfig  | 124 +++
 arch/lib/Makefile | 251 +
 arch/lib/Makefile.print   |  45 +++
 arch/lib/defconfig| 653 ++
 arch/lib/generate-linker-script.py|  50 +++
 arch/lib/processor.mk |   7 +
 9 files changed, 1291 insertions(+)
 create mode 100644 Documentation/virtual/libos-howto.txt
 create mode 100644 arch/lib/.gitignore
 create mode 100644 arch/lib/Kconfig
 create mode 100644 arch/lib/Makefile
 create mode 100644 arch/lib/Makefile.print
 create mode 100644 arch/lib/defconfig
 create mode 100755 arch/lib/generate-linker-script.py
 create mode 100644 arch/lib/processor.mk

diff --git a/Documentation/virtual/libos-howto.txt b/Documentation/virtual/libos-howto.txt
new file mode 100644
index 000..fbf7946
--- /dev/null
+++ b/Documentation/virtual/libos-howto.txt
@@ -0,0 +1,144 @@
+Library operating system (libos) version of Linux
+=================================================
+
+* Overview
+
+The new hardware-independent architecture 'arch/lib', enabled by
+CONFIG_LIB, provides two features.
+
+- network stack in userspace (NUSE)
+  NUSE gives each application a personalized network stack
+  without replacing the host operating system.
+
+- network simulator integration, called Direct Code Execution (DCE)
+  DCE provides a network simulation environment with the Linux network
+  stack, to investigate the detailed behavior of protocol implementations
+  under a flexible network configuration. It is also useful for testing.
+
+(- a more abstract implementation of the underlying platform is a future
+   direction (e.g., rump hypercall))
+
+In both features, the Linux kernel network stack runs on top of a
+userspace application as a linked or dynamically loaded library.
+
+Each has its own network stack, isolated from the host operating system,
+so it is configured with its own IP addresses, as with other
+virtualization methods.
+
+
+* How is it different from other approaches?
+
+- User-mode Linux (UML)
+
+UML is a way to execute Linux kernel code as a userspace
+application. It is completely isolated from the host kernel but can host
+arbitrary userspace applications on top of UML.
+
+- namespace / container
+
+Container technologies based on namespaces provide process-level isolation
+to host multiple network entities but share the kernel among
+processes, which prevents introducing new features implemented in
+kernel space.
+
+
+* How to build it?
+
+Configuration of arch/lib follows the standard kernel configuration.
+
+ make defconfig ARCH=lib
+
+or
+
+ make menuconfig ARCH=lib
+
+Then you can build a set of libraries for libos.
+
+ make library ARCH=lib
+
+This will give you a shared library file liblinux-$(KERNELVERSION).so
+in the top directory.
+
+* Hello world
+
+You may first need to create a configuration file, named
+'nuse.conf', so that the library version of the network stack knows what
+IP configuration to use. There is an example file
+at arch/lib/nuse.conf.sample: you may copy and modify it for your purpose.
+
+ sudo NUSECONF=nuse.conf ./nuse ping www.google.com
+
+
+
+* Example use cases
+- regression test with Direct Code Execution (DCE)
+
+'make test' by DCE gives a test platform for networking code, with the
+help of network simulator facilities like link delay/bandwidth/drop
+configurations, large network topology with userspace routing protocol
+daemons, etc.
+
+An interesting feature is the determinism of test execution. A
+test script always gives the same results in every execution as long as
+the code under test is not modified.
+
+As a first step, you need to obtain the network simulator
+environment. 'make testbin' does all the preparation.
+
+% make testbin -C tools/testing/libos
+
+Then, you can 'make test' for your code.
+
+% make test ARCH=lib
+
+ PASS: TestSuite netlink-socket
+ PASS: TestSuite process-manager
+ PASS: TestSuite dce-cradle
+ PASS: TestSuite dce-mptcp
+ PASS: TestSuite dce-umip
+ PASS: TestSuite dce-quagga
+ PASS: Example dce-tcp-simple
+ PASS: Example dce-udp-simple
+
+
+- userspace network stack (NUSE)
+
+An application can use its own network stack, distinct from the host network
+stack, in order to tailor any network feature to that application.
+The 'nuse' wrapper script, based on the LD_PRELOAD technique, replaces the
+socket API and redirects system calls to the network stack library provided by
+this framework.
+
+The network stack can be used with any raw-socket-like
+technology such as Intel DPDK, netmap, etc.
+
+
+
+* Files / External

[RFC PATCH v3 05/10] lib: context and scheduling functions (kernel glue code) for libos

2015-04-19 Thread Hajime Tazaki
Context primitives of the kernel, such as soft interrupts, scheduling,
and tasklets, are implemented for libos. These functions eventually call
the functions registered through the lib_init() API as well.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 arch/lib/sched.c | 406 +++
 arch/lib/softirq.c   | 108 ++
 arch/lib/tasklet.c   |  76 ++
 arch/lib/workqueue.c | 242 ++
 4 files changed, 832 insertions(+)
 create mode 100644 arch/lib/sched.c
 create mode 100644 arch/lib/softirq.c
 create mode 100644 arch/lib/tasklet.c
 create mode 100644 arch/lib/workqueue.c

diff --git a/arch/lib/sched.c b/arch/lib/sched.c
new file mode 100644
index 000..98a568a
--- /dev/null
+++ b/arch/lib/sched.c
@@ -0,0 +1,406 @@
+/*
+ * glue code for library version of Linux kernel
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+#include <linux/hash.h>
+#include <net/net_namespace.h>
+#include "lib.h"
+#include "sim.h"
+#include "sim-assert.h"
+
+/**
+   called by wait_event macro:
+   - prepare_to_wait
+   - schedule
+   - finish_wait
+ */
+
+struct SimTask *lib_task_create(void *private, unsigned long pid)
+{
+   struct SimTask *task = lib_malloc(sizeof(struct SimTask));
+   struct cred *cred;
+   struct nsproxy *ns;
+   struct user_struct *user;
+   struct thread_info *info;
+   struct pid *kpid;
+
+   if (!task)
+       return NULL;
+   memset(task, 0, sizeof(struct SimTask));
+   cred = lib_malloc(sizeof(struct cred));
+   if (!cred)
+       return NULL;
+   /* XXX: we could optimize away this allocation by sharing it
+      for all tasks */
+   ns = lib_malloc(sizeof(struct nsproxy));
+   if (!ns)
+       return NULL;
+   user = lib_malloc(sizeof(struct user_struct));
+   if (!user)
+       return NULL;
+   info = alloc_thread_info(&task->kernel_task);
+   if (!info)
+       return NULL;
+   kpid = lib_malloc(sizeof(struct pid));
+   if (!kpid)
+       return NULL;
+   kpid->numbers[0].nr = pid;
+   cred->fsuid = make_kuid(current_user_ns(), 0);
+   cred->fsgid = make_kgid(current_user_ns(), 0);
+   cred->user = user;
+   atomic_set(&cred->usage, 1);
+   info->task = &task->kernel_task;
+   info->preempt_count = 0;
+   info->flags = 0;
+   atomic_set(&ns->count, 1);
+   ns->uts_ns = 0;
+   ns->ipc_ns = 0;
+   ns->mnt_ns = 0;
+   ns->pid_ns_for_children = 0;
+   ns->net_ns = &init_net;
+   task->kernel_task.cred = cred;
+   task->kernel_task.pid = pid;
+   task->kernel_task.pids[PIDTYPE_PID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_PGID].pid = kpid;
+   task->kernel_task.pids[PIDTYPE_SID].pid = kpid;
+   task->kernel_task.nsproxy = ns;
+   task->kernel_task.stack = info;
+   /* this is a hack. */
+   task->kernel_task.group_leader = &task->kernel_task;
+   task->private = private;
+   return task;
+}
+void lib_task_destroy(struct SimTask *task)
+{
+   lib_free((void *)task->kernel_task.nsproxy);
+   lib_free((void *)task->kernel_task.cred->user);
+   lib_free((void *)task->kernel_task.cred);
+   free_thread_info(task->kernel_task.stack);
+   lib_free(task);
+}
+void *lib_task_get_private(struct SimTask *task)
+{
+   return task->private;
+}
+
+int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+   struct SimTask *task = lib_task_start((void (*)(void *))fn, arg);
+
+   return task->kernel_task.pid;
+}
+
+struct task_struct *get_current(void)
+{
+   struct SimTask *lib_task = lib_task_current();
+
+   return &lib_task->kernel_task;
+}
+
+struct thread_info *current_thread_info(void)
+{
+   return task_thread_info(get_current());
+}
+struct thread_info *alloc_thread_info(struct task_struct *task)
+{
+   return lib_malloc(sizeof(struct thread_info));
+}
+void free_thread_info(struct thread_info *ti)
+{
+   lib_free(ti);
+}
+
+
+void __put_task_struct(struct task_struct *t)
+{
+   lib_free(t);
+}
+
+void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   list_add(&wait->task_list, &q->task_list);
+}
+void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   list_add_tail(&wait->task_list, &q->task_list);
+}
+void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   if (wait->task_list.prev != LIST_POISON2)
+       list_del(&wait->task_list);
+}
+void
+prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+   wait->flags |= WQ_FLAG_EXCLUSIVE;
+   if (list_empty(&wait->task_list))
+   list_add_tail(wait
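
The wait-queue helpers in sched.c above put non-exclusive waiters at the head of the queue (list_add) and exclusive waiters at the tail (list_add_tail). A minimal sketch of that ordering follows, assuming a small hand-rolled circular list instead of the kernel's list_head; every sketch_* name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FLAG_EXCLUSIVE 0x01	/* stand-in for WQ_FLAG_EXCLUSIVE */

struct sketch_waiter {
	unsigned int flags;
	int id;
	struct sketch_waiter *prev, *next;
};

/* Circular doubly linked queue with a sentinel head, like list_head. */
struct sketch_queue {
	struct sketch_waiter head;
};

static void sketch_queue_init(struct sketch_queue *q)
{
	q->head.prev = q->head.next = &q->head;
}

/* Insert w between two known neighbours. */
static void sketch_insert(struct sketch_waiter *w,
			  struct sketch_waiter *prev,
			  struct sketch_waiter *next)
{
	w->prev = prev;
	w->next = next;
	prev->next = w;
	next->prev = w;
}

/* Non-exclusive waiters go to the front (cf. list_add). */
static void sketch_add_wait(struct sketch_queue *q, struct sketch_waiter *w)
{
	w->flags &= ~SKETCH_FLAG_EXCLUSIVE;
	sketch_insert(w, &q->head, q->head.next);
}

/* Exclusive waiters go to the back (cf. list_add_tail). */
static void sketch_add_wait_exclusive(struct sketch_queue *q,
				      struct sketch_waiter *w)
{
	w->flags |= SKETCH_FLAG_EXCLUSIVE;
	sketch_insert(w, q->head.prev, &q->head);
}

/* Return the id of the n-th waiter (0-based), or -1 if out of range. */
static int sketch_nth_id(const struct sketch_queue *q, size_t n)
{
	const struct sketch_waiter *w;

	for (w = q->head.next; w != &q->head; w = w->next)
		if (n-- == 0)
			return w->id;
	return -1;
}
```

Because a wake-up walks the queue from the head, this ordering means all non-exclusive waiters are visited before the first exclusive one.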

[RFC PATCH v3 03/10] lib: public headers and API implementations for userspace programs

2015-04-19 Thread Hajime Tazaki
Userspace programs that use libos access it through a public API, lib_init(),
which takes struct SimImported and struct SimExported as arguments.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
Signed-off-by: Ryo Nakamura u...@haeena.net
---
 arch/lib/include/sim-assert.h |  23 +++
 arch/lib/include/sim-init.h   | 134 ++
 arch/lib/include/sim-printf.h |  13 ++
 arch/lib/include/sim-types.h  |  53 ++
 arch/lib/include/sim.h|  51 ++
 arch/lib/lib-device.c | 187 +++
 arch/lib/lib-socket.c | 410 ++
 arch/lib/lib.c| 294 ++
 arch/lib/lib.h|  21 +++
 9 files changed, 1186 insertions(+)
 create mode 100644 arch/lib/include/sim-assert.h
 create mode 100644 arch/lib/include/sim-init.h
 create mode 100644 arch/lib/include/sim-printf.h
 create mode 100644 arch/lib/include/sim-types.h
 create mode 100644 arch/lib/include/sim.h
 create mode 100644 arch/lib/lib-device.c
 create mode 100644 arch/lib/lib-socket.c
 create mode 100644 arch/lib/lib.c
 create mode 100644 arch/lib/lib.h

diff --git a/arch/lib/include/sim-assert.h b/arch/lib/include/sim-assert.h
new file mode 100644
index 000..974122c
--- /dev/null
+++ b/arch/lib/include/sim-assert.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_ASSERT_H
+#define SIM_ASSERT_H
+
+#include "sim-printf.h"
+
+#define lib_assert(v) {                                        \
+       while (!(v)) {                                          \
+           lib_printf("Assert failed %s:%u \"" #v "\"\n",      \
+               __FILE__, __LINE__);                            \
+           char *p = 0;                                        \
+           *p = 1;                                             \
+       }                                                       \
+   }
+
+
+#endif /* SIM_ASSERT_H */
diff --git a/arch/lib/include/sim-init.h b/arch/lib/include/sim-init.h
new file mode 100644
index 000..e871a59
--- /dev/null
+++ b/arch/lib/include/sim-init.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 INRIA, Hajime Tazaki
+ *
+ * Author: Mathieu Lacage mathieu.lac...@gmail.com
+ * Hajime Tazaki taz...@sfc.wide.ad.jp
+ */
+
+#ifndef SIM_INIT_H
+#define SIM_INIT_H
+
+#include <linux/socket.h>
+#include "sim-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
+struct SimExported {
+   struct SimTask *(*task_create)(void *priv, unsigned long pid);
+   void (*task_destroy)(struct SimTask *task);
+   void *(*task_get_private)(struct SimTask *task);
+
+   int (*sock_socket)(int domain, int type, int protocol,
+   struct SimSocket **socket);
+   int (*sock_close)(struct SimSocket *socket);
+   ssize_t (*sock_recvmsg)(struct SimSocket *socket, struct msghdr *msg,
+   int flags);
+   ssize_t (*sock_sendmsg)(struct SimSocket *socket,
+   const struct msghdr *msg, int flags);
+   int (*sock_getsockname)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_getpeername)(struct SimSocket *socket,
+   struct sockaddr *name, int *namelen);
+   int (*sock_bind)(struct SimSocket *socket, const struct sockaddr *name,
+   int namelen);
+   int (*sock_connect)(struct SimSocket *socket,
+   const struct sockaddr *name, int namelen,
+   int flags);
+   int (*sock_listen)(struct SimSocket *socket, int backlog);
+   int (*sock_shutdown)(struct SimSocket *socket, int how);
+   int (*sock_accept)(struct SimSocket *socket,
+   struct SimSocket **newSocket, int flags);
+   int (*sock_ioctl)(struct SimSocket *socket, int request, char *argp);
+   int (*sock_setsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   const void *optval, int optlen);
+   int (*sock_getsockopt)(struct SimSocket *socket, int level,
+   int optname,
+   void *optval, int *optlen);
+
+   void (*sock_poll)(struct SimSocket *socket, void *ret);
+   void (*sock_pollfreewait)(void *polltable);
+
+   struct SimDevice *(*dev_create)(const char *ifname, void *priv,
+   enum SimDevFlags flags);
+   void (*dev_destroy)(struct SimDevice *dev);
+   void *(*dev_get_private)(struct SimDevice *task);
+   void (*dev_set_address)(struct SimDevice *dev,
+   unsigned char buffer[6]);
+   void (*dev_set_mtu)(struct SimDevice *dev, int mtu);
+   struct
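
As the description of this patch says, the host program and the library exchange two tables of function pointers at initialization: the host supplies the services it provides (struct SimImported) and receives the entry points the library offers (struct SimExported). A minimal sketch of that pattern, assuming much smaller hypothetical tables; the sketch_* names and members are illustrative only and are not the definitions from sim-init.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Services the host gives to the library (cf. struct SimImported). */
struct sketch_imported {
	void *(*mem_alloc)(size_t size);
	void (*mem_free)(void *ptr);
};

struct sketch_task {
	void *priv;
	unsigned long pid;
};

/* Entry points the library gives back (cf. struct SimExported). */
struct sketch_exported {
	struct sketch_task *(*task_create)(void *priv, unsigned long pid);
	void (*task_destroy)(struct sketch_task *task);
	void *(*task_get_private)(struct sketch_task *task);
};

/* Library-side copy of the host services, filled in by sketch_init(). */
static struct sketch_imported g_imported;

static struct sketch_task *sketch_task_create(void *priv, unsigned long pid)
{
	struct sketch_task *task = g_imported.mem_alloc(sizeof(*task));

	if (!task)
		return NULL;
	task->priv = priv;
	task->pid = pid;
	return task;
}

static void sketch_task_destroy(struct sketch_task *task)
{
	g_imported.mem_free(task);
}

static void *sketch_task_get_private(struct sketch_task *task)
{
	return task->priv;
}

/* Single entry point: store the imports, publish the exports. */
static void sketch_init(struct sketch_exported *exported,
			const struct sketch_imported *imported)
{
	g_imported = *imported;
	exported->task_create = sketch_task_create;
	exported->task_destroy = sketch_task_destroy;
	exported->task_get_private = sketch_task_get_private;
}
```

A host would fill a struct sketch_imported with its own allocator (for example malloc and free), call sketch_init() once, and from then on reach the library only through the returned table. Keeping the boundary to two tables is what lets the same library be driven by different hosts.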

[RFC PATCH v3 01/10] sysctl: make some functions unstatic to access by arch/lib

2015-04-19 Thread Hajime Tazaki
libos (arch/lib) emulates a sysctl-like interface through function calls
from userspace by enumerating the sysctl tree from sysctl_table_root. This
requires that symbol and the related functions to be publicly accessible.

Signed-off-by: Hajime Tazaki taz...@sfc.wide.ad.jp
---
 fs/proc/proc_sysctl.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f92d5dd..56feec7 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -35,7 +35,7 @@ static struct ctl_table root_table[] = {
},
{ }
 };
-static struct ctl_table_root sysctl_table_root = {
+struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
{{.count = 1,
  .nreg = 1,
@@ -77,8 +77,9 @@ static int namecmp(const char *name1, int len1, const char *name2, int len2)
 }
 
 /* Called under sysctl_lock */
-static struct ctl_table *find_entry(struct ctl_table_header **phead,
-   struct ctl_dir *dir, const char *name, int namelen)
+struct ctl_table *ctl_table_find_entry(struct ctl_table_header **phead,
+  struct ctl_dir *dir, const char *name,
+  int namelen)
 {
struct ctl_table_header *head;
struct ctl_table *entry;
@@ -300,7 +301,7 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
    struct ctl_table *entry;
 
    spin_lock(&sysctl_lock);
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
    if (entry && use_table(head))
        *phead = head;
    else
@@ -321,7 +322,7 @@ static struct ctl_node *first_usable_entry(struct rb_node *node)
return NULL;
 }
 
-static void first_entry(struct ctl_dir *dir,
+void ctl_table_first_entry(struct ctl_dir *dir,
struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = NULL;
@@ -339,7 +340,7 @@ static void first_entry(struct ctl_dir *dir,
*pentry = entry;
 }
 
-static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
+void ctl_table_next_entry(struct ctl_table_header **phead, struct ctl_table **pentry)
 {
struct ctl_table_header *head = *phead;
struct ctl_table *entry = *pentry;
@@ -670,7 +671,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 
    pos = 2;
 
-   for (first_entry(ctl_dir, &h, &entry); h; next_entry(&h, &entry)) {
+   for (ctl_table_first_entry(ctl_dir, &h, &entry); h;
+        ctl_table_next_entry(&h, &entry)) {
        if (!scan(h, entry, &pos, file, ctx)) {
            sysctl_head_finish(h);
            break;
@@ -828,7 +830,7 @@ static struct ctl_dir *find_subdir(struct ctl_dir *dir,
struct ctl_table_header *head;
struct ctl_table *entry;
 
-   entry = find_entry(&head, dir, name, namelen);
+   entry = ctl_table_find_entry(&head, dir, name, namelen);
    if (!entry)
        return ERR_PTR(-ENOENT);
    if (!S_ISDIR(entry->mode))
@@ -924,13 +926,13 @@ failed:
return subdir;
 }
 
-static struct ctl_dir *xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
+struct ctl_dir *ctl_table_xlate_dir(struct ctl_table_set *set, struct ctl_dir *dir)
 {
    struct ctl_dir *parent;
    const char *procname;
    if (!dir->header.parent)
        return &set->dir;
-   parent = xlate_dir(set, dir->header.parent);
+   parent = ctl_table_xlate_dir(set, dir->header.parent);
    if (IS_ERR(parent))
        return parent;
    procname = dir->header.ctl_table[0].procname;
@@ -951,13 +953,13 @@ static int sysctl_follow_link(struct ctl_table_header **phead,
    spin_lock(&sysctl_lock);
    root = (*pentry)->data;
    set = lookup_header_set(root, namespaces);
-   dir = xlate_dir(set, (*phead)->parent);
+   dir = ctl_table_xlate_dir(set, (*phead)->parent);
    if (IS_ERR(dir))
        ret = PTR_ERR(dir);
    else {
        const char *procname = (*pentry)->procname;
        head = NULL;
-       entry = find_entry(&head, dir, procname, strlen(procname));
+       entry = ctl_table_find_entry(&head, dir, procname, strlen(procname));
        ret = -ENOENT;
        if (entry && use_table(head)) {
            unuse_table(*phead);
@@ -1069,7 +1071,7 @@ static bool get_links(struct ctl_dir *dir,
    /* Are there links available for every entry in table? */
    for (entry = table; entry->procname; entry++) {
        const char *procname = entry->procname;
-       link = find_entry(&head, dir, procname, strlen(procname));
+       link = ctl_table_find_entry(&head, dir, procname, strlen(procname));
        if (!link)
return false
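
For context on why the iterators are exported: the commit message says arch/lib enumerates the sysctl tree from outside proc_sysctl.c, which needs a first/next cursor pair like the one used in proc_sys_readdir(). A minimal sketch of that cursor idiom, assuming a flat NULL-terminated table instead of the kernel's rb-tree of headers; the sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened stand-in for a sysctl directory. */
struct sketch_entry {
	const char *procname;	/* NULL terminates the table */
	int value;
};

/* Position the cursor on the first entry, or NULL if the table is empty. */
static void sketch_first_entry(struct sketch_entry *table,
			       struct sketch_entry **pentry)
{
	*pentry = (table && table->procname) ? table : NULL;
}

/* Advance the cursor; it becomes NULL after the last entry. */
static void sketch_next_entry(struct sketch_entry **pentry)
{
	struct sketch_entry *next = *pentry + 1;

	*pentry = next->procname ? next : NULL;
}

/* Look up an entry by name by walking the table with the iterators. */
static struct sketch_entry *sketch_find_entry(struct sketch_entry *table,
					      const char *name)
{
	struct sketch_entry *entry;

	for (sketch_first_entry(table, &entry); entry;
	     sketch_next_entry(&entry))
		if (strcmp(entry->procname, name) == 0)
			return entry;
	return NULL;
}
```

The loop has the same shape as the for statement in proc_sys_readdir() touched by this patch: the cursor is passed by address so that the iterator can set it to NULL at the end.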
