Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
Hi Vineet,

On Mon, 13 Mar 2017, Vineet Gupta wrote:
> I've not looked at the patches closely (or read the references paper fully 
> yet),
> but at first glance it seems on ARC architecture, we can can potentially
> use/leverage this mechanism to implement the shared TLB entries. Before anyone
> shouts these are not same as the IA64/x86 protection keys which allow TLB 
> entries
> with different protection bits across processes etc. These TLB entries are
> actually *shared* by processes.
> 
> Conceptually there's shared address spaces, independent of processes. e.g. 
> ldso
> code is shared address space #1, libc (code) #2  System can support a 
> limited
> number of shared addr spaces (say 64, enough for typical embedded sys).
> 
> While Normal TLB entries are tagged with ASID (Addr space ID) to keep them 
> unique
> across processes, Shared TLB entries are tagged with Shared address space ID.
> 
> A process MMU context consists of ASID (a single number) and a SASID bitmap 
> (to
> allow "subscription" to multiple Shared spaces. The subscriptions are set up 
> bu
> userspace ld.so which knows about the libs process wants to map.
> 
> The restriction ofcourse is that the spaces are mapped at *same* vaddr is all
> participating processes. I know this goes against whole security, address 
> space
> randomization - but it gives much better real time performance. Why does each
> process need to take a MMU exception for libc code...
> 
> So long story short - it seems there can be multiple uses of this 
> infrastructure !

During the development of this code, we also looked at shared TLB entries, but
the other way around. We wanted to use them to prevent flushing of TLB entries 
of
shared memory regions when switching between multiple ASes. Unfortunately, we 
never
finished this part of the code.

However, we also investigated into a different use-case for first class virtual
address spaces that is related to what you propose if I didn't understand 
something
wrong. The idea is to move shared libraries into their own first class virtual
address space and only load some small trampoline code in the application AS. 
This
trampoline code performs the VAS switch in the libraries AS and execute the 
requested
function there. If we combine this architecture with tagged TLB entries to 
prevent
TLB flushes during the switch operation, it can also reach an acceptable 
performance.
A side effect of moving the shared library into its own AS is that it can not 
be used
by ROP-attacks because it is not accessible in the application's AS.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces

2017-03-13 Thread Greg Kroah-Hartman
On Mon, Mar 13, 2017 at 03:14:12PM -0700, Till Smejkal wrote:

There's no way with that many cc: lists and people that this is really
making it through very many people's filters and actually on a mailing
list.  Please trim them down.

Minor sysfs questions/issues:

> +struct vas {
> + struct kobject kobj;/* < the internal kobject that we use *
> +  *   for reference counting and sysfs *
> +  *   handling.*/
> +
> + int id; /* < ID   */
> + char name[VAS_MAX_NAME_LENGTH]; /* < name */

The kobject has a name, why not use that?

> +
> + struct mutex mtx;   /* < lock for parallel access.*/
> +
> + struct mm_struct *mm;   /* < a partial memory map containing  *
> +  *   all mappings of this VAS.*/
> +
> + struct list_head link;  /* < the link in the global VAS list. */
> + struct rcu_head rcu;/* < the RCU helper used for  *
> +  *   asynchronous VAS deletion.   */
> +
> + u16 refcount;   /* < how often is the VAS attached.   */

The kobject has a refcount, use that?  Don't have 2 refcounts in the
same structure, that way lies madness.  And bugs, lots of bugs...

And if this really is a refcount (hint, I don't think it is), you should
use the refcount_t type.

> +/**
> + * The sysfs structure we need to handle attributes of a VAS.
> + **/
> +struct vas_sysfs_attr {
> + struct attribute attr;
> + ssize_t (*show)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> + char *buf);
> + ssize_t (*store)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> +  const char *buf, size_t count);
> +};
> +
> +#define VAS_SYSFS_ATTR(NAME, MODE, SHOW, STORE)  
> \
> +static struct vas_sysfs_attr vas_sysfs_attr_##NAME = \
> + __ATTR(NAME, MODE, SHOW, STORE)

__ATTR_RO and __ATTR_RW should work better for you.  If you really need
this.

Oh, and where is the Documentation/ABI/ updates to try to describe the
sysfs structure and files?  Did I miss that in the series?

> +static ssize_t __show_vas_name(struct vas *vas, struct vas_sysfs_attr 
> *vsattr,
> +char *buf)
> +{
> + return scnprintf(buf, PAGE_SIZE, "%s", vas->name);

It's a page size, just use sprintf() and be done with it.  No need to
ever check, you "know" it will be correct.

Also, what about a trailing '\n' for these attributes?

Oh wait, why have a name when the kobject name is already there in the
directory itself?  Do you really need this?

> +/**
> + * The ktype data structure representing a VAS.
> + **/
> +static struct kobj_type vas_ktype = {
> + .sysfs_ops = _sysfs_ops,
> + .release = __vas_release,

Why the odd __vas* naming?  What's wrong with vas_release?


> + .default_attrs = vas_default_attr,
> +};
> +
> +
> +/***
> + * Internally visible functions
> + ***/
> +
> +/**
> + * Working with the global VAS list.
> + **/
> +static inline void vas_remove(struct vas *vas)



You have a ton of inline functions, for no good reason.  Make them all
"real" functions please.  Unless you can measure the size/speed
differences?  If so, please say so.


thanks,

greg k-h

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


[RFC PATCH 10/13] mm: Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
Introduce a different type of address spaces which are first class citizens
in the OS. That means that the kernel now handles two types of AS, those
which are closely coupled with a process and those which aren't. While the
former ones are created and destroyed together with the process by the
kernel and are the default type of AS in the Linux kernel, the latter ones
have to be managed explicitly by the user and are the newly introduced
type.

Accordingly, a first class AS (also called VAS == virtual address space)
can exist in the OS independently from any process. A user has to
explicitly create and destroy them in the system. Processes and VAS can be
combined by attaching a previously created VAS to a process which basically
adds an additional AS to the process that the process' threads are able to
execute in. Hence, VAS allow a process to have different views onto the
main memory of the system (its original AS and the attached VAS) between
which its threads can switch arbitrarily during their lifetime.

The functionality made available through first class virtual address spaces
can be used in various different ways. One possible way to utilize VAS is
to compartmentalize a process for security reasons. Another possible usage
is to improve the performance of data-centric applications by being able to
manage different sets of data in memory without the need to map or unmap
them.

Furthermore, first class virtual address spaces can be attached to
different processes at the same time if the underlying memory is only
readable. This mechanism allows sharing of whole address spaces between
multiple processes that can both execute in them using the contained
memory.

Signed-off-by: Till Smejkal 
Signed-off-by: Marco Benatto 
---
 MAINTAINERS|   10 +
 arch/x86/entry/syscalls/syscall_32.tbl |9 +
 arch/x86/entry/syscalls/syscall_64.tbl |9 +
 fs/exec.c  |3 +
 include/linux/mm_types.h   |8 +
 include/linux/sched.h  |   17 +
 include/linux/syscalls.h   |   11 +
 include/linux/vas.h|  182 +++
 include/linux/vas_types.h  |   88 ++
 include/uapi/asm-generic/unistd.h  |   20 +-
 include/uapi/linux/Kbuild  |1 +
 include/uapi/linux/vas.h   |   16 +
 init/main.c|2 +
 kernel/exit.c  |2 +
 kernel/fork.c  |   28 +-
 kernel/sys_ni.c|   11 +
 mm/Kconfig |   20 +
 mm/Makefile|1 +
 mm/internal.h  |8 +
 mm/memory.c|3 +
 mm/mmap.c  |   22 +
 mm/vas.c   | 2188 
 22 files changed, 2657 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vas.h
 create mode 100644 include/linux/vas_types.h
 create mode 100644 include/uapi/linux/vas.h
 create mode 100644 mm/vas.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 527d13759ecc..060b1c64e67a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5040,6 +5040,16 @@ F:   Documentation/firmware_class/
 F: drivers/base/firmware*.c
 F: include/linux/firmware.h
 
+FIRST CLASS VIRTUAL ADDRESS SPACES
+M: Till Smejkal 
+L: linux-ker...@vger.kernel.org
+L: linux...@kvack.org
+S: Maintained
+F: include/linux/vas_types.h
+F: include/linux/vas.h
+F: include/uapi/linux/vas.h
+F: mm/vas.c
+
 FLASH ADAPTER DRIVER (IBM Flash Adapter 900GB Full Height PCI Flash Card)
 M: Joshua Morris 
 M: Philip Kelleher 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 2b3618542544..8c553eef8c44 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -389,3 +389,12 @@
 380i386pkey_mprotect   sys_pkey_mprotect
 381i386pkey_alloc  sys_pkey_alloc
 382i386pkey_free   sys_pkey_free
+383i386vas_create  sys_vas_create
+384i386vas_delete  sys_vas_delete
+385i386vas_findsys_vas_find
+386i386vas_attach  sys_vas_attach
+387i386vas_detach  sys_vas_detach
+388i386vas_switch  sys_vas_switch
+389i386active_vas  sys_active_vas
+390i386vas_getattr sys_vas_getattr
+391i386vas_setattr sys_vas_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index e93ef0b38db8..72f1f0495710 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -338,6 +338,15 @@
 329

Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
Hi Greg,

First of all thanks for your reply.

On Tue, 14 Mar 2017, Greg Kroah-Hartman wrote:
> On Mon, Mar 13, 2017 at 03:14:12PM -0700, Till Smejkal wrote:
> 
> There's no way with that many cc: lists and people that this is really
> making it through very many people's filters and actually on a mailing
> list.  Please trim them down.

I am sorry that the patch's cc-list is too big. This was the list of people 
that the
get_maintainers.pl script produced. I already recognized that it was a huge 
number of
people, but I didn't want to remove anyone from the list because I wasn't sure 
who
would be interested in this patch set. Do you have any suggestion who to remove 
from
the list? I don't want to annoy anyone with useless emails.

> Minor sysfs questions/issues:
> 
> > +struct vas {
> > +   struct kobject kobj;/* < the internal kobject that we use *
> > +*   for reference counting and sysfs *
> > +*   handling.*/
> > +
> > +   int id; /* < ID   */
> > +   char name[VAS_MAX_NAME_LENGTH]; /* < name */
> 
> The kobject has a name, why not use that?

The reason why I don't use the kobject's name is that I don't restrict the 
names that
are used for VAS/VAS segments. Accordingly, it would be allowed to use a name 
like
"foo/bar/xyz" as VAS name. However, I am not sure what would happen in the 
sysfs if I
would use such a name for the kobject. Especially, since one could think of 
another
VAS with the name "foo/bar" whose name would conflict with the first one 
although it
not necessarily has any connection with it.

> > +
> > +   struct mutex mtx;   /* < lock for parallel access.*/
> > +
> > +   struct mm_struct *mm;   /* < a partial memory map containing  *
> > +*   all mappings of this VAS.*/
> > +
> > +   struct list_head link;  /* < the link in the global VAS list. */
> > +   struct rcu_head rcu;/* < the RCU helper used for  *
> > +*   asynchronous VAS deletion.   */
> > +
> > +   u16 refcount;   /* < how often is the VAS attached.   */
> 
> The kobject has a refcount, use that?  Don't have 2 refcounts in the
> same structure, that way lies madness.  And bugs, lots of bugs...
> 
> And if this really is a refcount (hint, I don't think it is), you should
> use the refcount_t type.

I actually use both the internal kobject refcount to keep track of how often a
VAS/VAS segment is referenced and this 'refcount' variable to keep track how 
often
the VAS is actually attached to a task. They not necessarily must be related to 
each
other. I can rename this variable to attach_count. Or if preferred I can
alternatively only use the kobject reference counter and remove this variable
completely though I would loose information about how often the VAS is attached 
to a
task because the kobject reference counter is also used to keep track of other
variables referencing the VAS.

> > +/**
> > + * The sysfs structure we need to handle attributes of a VAS.
> > + **/
> > +struct vas_sysfs_attr {
> > +   struct attribute attr;
> > +   ssize_t (*show)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> > +   char *buf);
> > +   ssize_t (*store)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> > +const char *buf, size_t count);
> > +};
> > +
> > +#define VAS_SYSFS_ATTR(NAME, MODE, SHOW, STORE)
> > \
> > +static struct vas_sysfs_attr vas_sysfs_attr_##NAME =   
> > \
> > +   __ATTR(NAME, MODE, SHOW, STORE)
> 
> __ATTR_RO and __ATTR_RW should work better for you.  If you really need
> this.

Thank you. I will have a look at these functions.

> Oh, and where is the Documentation/ABI/ updates to try to describe the
> sysfs structure and files?  Did I miss that in the series?

Oh sorry, I forgot to add this file. I will add the ABI descriptions for future
submissions.

> > +static ssize_t __show_vas_name(struct vas *vas, struct vas_sysfs_attr 
> > *vsattr,
> > +  char *buf)
> > +{
> > +   return scnprintf(buf, PAGE_SIZE, "%s", vas->name);
> 
> It's a page size, just use sprintf() and be done with it.  No need to
> ever check, you "know" it will be correct.

OK. I was following the sysfs example in the documentation that used scnprintf, 
but
if sprintf is preferred, I can change this.

> Also, what about a trailing '\n' for these attributes?

I will change this.

> Oh wait, why have a name when the kobject name is already there in the
> directory itself?  Do you really need this?

See above.

> > +/**
> > + * The ktype data structure representing a VAS.
> > + **/
> > +static struct kobj_type vas_ktype = {
> > +   .sysfs_ops = _sysfs_ops,
> > +   .release = __vas_release,
> 
> Why the odd