Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> On Thu, 16 Mar 2017, Till Smejkal wrote:
> > On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > > Why do we need yet another mechanism to represent something which looks
> > > like a file instead of simply using existing mechanisms and extend them?
> > 
> > You are right. I also recognized during the discussion with Andy, Chris,
> > Matthew, Luck, Rich and the others that there are already other
> > techniques in the Linux kernel that can achieve the same functionality
> > when combined. As I said also to the others, I will drop the VAS segments
> > for future versions. The first class virtual address space feature was
> > the more interesting part of the patchset anyways.
> 
> While you are at it, could you please drop this 'first class' marketing as
> well? It has zero technical value, really.

Yes of course. I am sorry for the trouble that I caused already.

Thanks
Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Thomas Gleixner
On Thu, 16 Mar 2017, Till Smejkal wrote:
> On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > Why do we need yet another mechanism to represent something which looks
> > like a file instead of simply using existing mechanisms and extend them?
> 
> You are right. I also recognized during the discussion with Andy, Chris,
> Matthew, Luck, Rich and the others that there are already other
> techniques in the Linux kernel that can achieve the same functionality
> when combined. As I said also to the others, I will drop the VAS segments
> for future versions. The first class virtual address space feature was
> the more interesting part of the patchset anyways.

While you are at it, could you please drop this 'first class' marketing as
well? It has zero technical value, really.

Thanks,

tglx

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> Why do we need yet another mechanism to represent something which looks
> like a file instead of simply using existing mechanisms and extend them?

You are right. I also recognized during the discussion with Andy, Chris, 
Matthew,
Luck, Rich and the others that there are already other techniques in the Linux 
kernel
that can achieve the same functionality when combined. As I said also to the 
others,
I will drop the VAS segments for future versions. The first class virtual 
address
space feature was the more interesting part of the patchset anyways.

Thanks
Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Luck, Tony wrote:
> On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> > I don't agree here. VAS segments are basically in-memory files that are 
> > handled by
> > the kernel directly without using a file system. Hence, if an application 
> > uses a VAS
> > segment to store data the same rules apply as if it uses a file. Everything 
> > that it
> > saves in the VAS segment might be accessible by other applications. An 
> > application
> > using VAS segments should be aware of this fact. In addition, the resources 
> > that are
> > represented by a VAS segment are not leaked. As I said, VAS segments are 
> > much like
> > files. Hence, if you don't want to use them any more, delete them. But as 
> > with files,
> > the kernel will not delete them for you (although something like this can 
> > be added).
> 
> So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?
> 
> Apart from VAS having better names, instead of silly "key_t key" ones.

Unfortunately, I have to admit that the VAS segments don't differ from shm* a 
lot.
The implementation is differently, but the functionality that you can achieve 
with it
is very similar. I am sorry. We should have looked more closely at the whole
functionality that is provided by the shmem subsystem before working on VAS 
segments.

However, VAS segments are not the key part of this patch set. The more 
interesting
functionality in our opinion is the introduction of first class virtual address
spaces and what they can be used for. VAS segments were just another logical 
step for
us (from first class virtual address spaces to first class virtual address space
segments) but since their functionality can be achieved with various other 
already
existing features of the Linux kernel, I will probably drop them in future 
versions
of the patchset.

Thanks
Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Thomas Gleixner
On Wed, 15 Mar 2017, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> > > VAS segments on the other side would provide a functionality to
> > > achieve the same without the need of any mounted filesystem. However,
> > > I agree, that this is just a small advantage compared to what can
> > > already be achieved with the existing functionality provided by the
> > > Linux kernel.
> > 
> > I see this "small advantage" as "resource leak and security problem".
> 
> I don't agree here. VAS segments are basically in-memory files that are
> handled by the kernel directly without using a file system. Hence, if an

Why do we need yet another mechanism to represent something which looks
like a file instead of simply using existing mechanisms and extend them?

Thanks,

tglx

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Andy Lutomirski
On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal
 wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
>>  wrote:
>> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> >> This sounds rather complicated.  Getting TLB flushing right seems
>> >> tricky.  Why not just map the same thing into multiple mms?
>> >
>> > This is exactly what happens at the end. The memory region that is 
>> > described by the
>> > VAS segment will be mapped in the ASes that use the segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by user 
> programs
> which means that VAS segments can be shared by applications that not 
> necessarily have
> to be related. If I am not mistaken, MAP_SHARED of pure in memory data will 
> only work
> if the tasks that share the memory region are related (aka. have a common 
> parent that
> initialized the shared mapping). Otherwise, the shared mapping have to be 
> backed by a
> file.

What's wrong with memfd_create()?

> VAS segments on the other side allow sharing of pure in memory data by
> arbitrary related tasks without the need of a file. This becomes especially
> interesting if one combines VAS segments with non-volatile memory since one 
> can keep
> data structures in the NVM and still be able to share them between multiple 
> tasks.

What's wrong with regular mmap?

>
>> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> >> and not make it look magically different depending on which process
>> >> maps it?  If you need a trampoline (which you do, of course), just
>> >> write a trampoline in regular user code and map it manually.
>> >
>> > Did I understand you correctly that you are proposing that the switching 
>> > thread
>> > should make sure by itself that its code, stack, … memory regions are 
>> > properly setup
>> > in the new AS before/after switching into it? I think, this would make 
>> > using first
>> > class virtual address spaces much more difficult for user applications to 
>> > the extend
>> > that I am not even sure if they can be used at all. At the moment, 
>> > switching into a
>> > VAS is a very simple operation for an application because the kernel will 
>> > just simply
>> > do the right thing.
>>
>> Yes.  I think that having the same mm_struct look different from
>> different tasks is problematic.  Getting it right in the arch code is
>> going to be nasty.  The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread?  What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task now can have when it uses first class virtual 
> address
> spaces are not realized in the kernel by using only one mm_struct per task 
> that just
> looks differently but by using multiple mm_structs - one for each AS that the 
> task
> can execute in. When a task attaches a first class virtual address space to 
> itself to
> be able to use another AS, the kernel adds a temporary mm_struct to this task 
> that
> contains the mappings of the first class virtual address space and the one 
> shared
> with the task's original AS. If a thread now wants to switch into this 
> attached first
> class virtual address space the kernel only changes the 'mm' and 'active_mm' 
> pointers
> in the task_struct of the thread to the temporary mm_struct and performs the
> corresponding mm_switch operation. The original mm_struct of the thread will 
> not be
> changed.
>
> Accordingly, I do not magically make mm_structs look differently depending on 
> the
> task that uses it, but create temporary mm_structs that only contain mappings 
> to the
> same memory regions.

This sounds complicated and fragile.  What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Luck, Tony
On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> I don't agree here. VAS segments are basically in-memory files that are 
> handled by
> the kernel directly without using a file system. Hence, if an application 
> uses a VAS
> segment to store data the same rules apply as if it uses a file. Everything 
> that it
> saves in the VAS segment might be accessible by other applications. An 
> application
> using VAS segments should be aware of this fact. In addition, the resources 
> that are
> represented by a VAS segment are not leaked. As I said, VAS segments are much 
> like
> files. Hence, if you don't want to use them any more, delete them. But as 
> with files,
> the kernel will not delete them for you (although something like this can be 
> added).

So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?

Apart from VAS having better names, instead of silly "key_t key" ones.

-Tony

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > One advantage of VAS segments is that they can be globally queried by user 
> > programs
> > which means that VAS segments can be shared by applications that not 
> > necessarily have
> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will 
> > only work
> > if the tasks that share the memory region are related (aka. have a common 
> > parent that
> > initialized the shared mapping). Otherwise, the shared mapping have to be 
> > backed by a
> > file.
> 
> What's wrong with memfd_create()?
> 
> > VAS segments on the other side allow sharing of pure in memory data by
> > arbitrary related tasks without the need of a file. This becomes especially
> > interesting if one combines VAS segments with non-volatile memory since one 
> > can keep
> > data structures in the NVM and still be able to share them between multiple 
> > tasks.
> 
> What's wrong with regular mmap?

I never wanted to say that there is something wrong with regular mmap. We just
figured that with VAS segments you could remove the need to mmap your shared 
data but
instead can keep everything purely in memory.

Unfortunately, I am not at full speed with memfds. Is my understanding correct 
that
if the last user of such a file descriptor closes it, the corresponding memory 
is
freed? Accordingly, memfd cannot be used to keep data in memory while no 
program is
currently using it, can it? To be able to do this you need again some 
representation
of the data in a file? Yes, you can use a tmpfs to keep the file content in 
memory as
well, or some DAX filesystem to keep the file content in NVM, but this always
requires that such filesystems are mounted in the system that the application is
currently running on. VAS segments on the other side would provide a 
functionality to
achieve the same without the need of any mounted filesystem. However, I agree, 
that
this is just a small advantage compared to what can already be achieved with the
existing functionality provided by the Linux kernel. I probably need to revisit 
the
whole idea of first class virtual address space segments before continuing with 
this
pacthset. Thank you very much for the great feedback.

> >> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> >> and not make it look magically different depending on which process
> >> >> maps it?  If you need a trampoline (which you do, of course), just
> >> >> write a trampoline in regular user code and map it manually.
> >> >
> >> > Did I understand you correctly that you are proposing that the switching 
> >> > thread
> >> > should make sure by itself that its code, stack, … memory regions are 
> >> > properly setup
> >> > in the new AS before/after switching into it? I think, this would make 
> >> > using first
> >> > class virtual address spaces much more difficult for user applications 
> >> > to the extend
> >> > that I am not even sure if they can be used at all. At the moment, 
> >> > switching into a
> >> > VAS is a very simple operation for an application because the kernel 
> >> > will just simply
> >> > do the right thing.
> >>
> >> Yes.  I think that having the same mm_struct look different from
> >> different tasks is problematic.  Getting it right in the arch code is
> >> going to be nasty.  The heuristics of what to share are also tough --
> >> why would text + data + stack or whatever you're doing be adequate?
> >> What if you're in a thread?  What if two tasks have their stacks in
> >> the same place?
> >
> > The different ASes that a task now can have when it uses first class 
> > virtual address
> > spaces are not realized in the kernel by using only one mm_struct per task 
> > that just
> > looks differently but by using multiple mm_structs - one for each AS that 
> > the task
> > can execute in. When a task attaches a first class virtual address space to 
> > itself to
> > be able to use another AS, the kernel adds a temporary mm_struct to this 
> > task that
> > contains the mappings of the first class virtual address space and the one 
> > shared
> > with the task's original AS. If a thread now wants to switch into this 
> > attached first
> > class virtual address space the kernel only changes the 'mm' and 
> > 'active_mm' pointers
> > in the task_struct of the thread to the temporary mm_struct and performs the
> > corresponding mm_switch operation. The original mm_struct of the thread 
> > will not be
> > changed.
> >
> > Accordingly, I do not magically make mm_structs look differently depending 
> > on the
> > task that uses it, but create temporary mm_structs that only contain 
> > mappings to the
> > same memory regions.
> 
> This sounds complicated and fragile.  What happens if a heuristically
> shared region coincides with a region in the "first class address
> space" being selected?

If such a conflict happens, the task cannot use the first class address space 
and the
corresponding system call will return an error. 

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Till Smejkal
On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal
>  wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> >> > One advantage of VAS segments is that they can be globally queried by 
> >> > user programs
> >> > which means that VAS segments can be shared by applications that not 
> >> > necessarily have
> >> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data 
> >> > will only work
> >> > if the tasks that share the memory region are related (aka. have a 
> >> > common parent that
> >> > initialized the shared mapping). Otherwise, the shared mapping have to 
> >> > be backed by a
> >> > file.
> >>
> >> What's wrong with memfd_create()?
> >>
> >> > VAS segments on the other side allow sharing of pure in memory data by
> >> > arbitrary related tasks without the need of a file. This becomes 
> >> > especially
> >> > interesting if one combines VAS segments with non-volatile memory since 
> >> > one can keep
> >> > data structures in the NVM and still be able to share them between 
> >> > multiple tasks.
> >>
> >> What's wrong with regular mmap?
> >
> > I never wanted to say that there is something wrong with regular mmap. We 
> > just
> > figured that with VAS segments you could remove the need to mmap your 
> > shared data but
> > instead can keep everything purely in memory.
> 
> memfd does that.

Yes, that's right. Thanks for giving me the pointer to this. I should have 
researched
more carefully before starting to work at VAS segments.

> > VAS segments on the other side would provide a functionality to
> > achieve the same without the need of any mounted filesystem. However, I 
> > agree, that
> > this is just a small advantage compared to what can already be achieved 
> > with the
> > existing functionality provided by the Linux kernel.
> 
> I see this "small advantage" as "resource leak and security problem".

I don't agree here. VAS segments are basically in-memory files that are handled 
by
the kernel directly without using a file system. Hence, if an application uses 
a VAS
segment to store data the same rules apply as if it uses a file. Everything 
that it
saves in the VAS segment might be accessible by other applications. An 
application
using VAS segments should be aware of this fact. In addition, the resources 
that are
represented by a VAS segment are not leaked. As I said, VAS segments are much 
like
files. Hence, if you don't want to use them any more, delete them. But as with 
files,
the kernel will not delete them for you (although something like this can be 
added).

> >> This sounds complicated and fragile.  What happens if a heuristically
> >> shared region coincides with a region in the "first class address
> >> space" being selected?
> >
> > If such a conflict happens, the task cannot use the first class address 
> > space and the
> > corresponding system call will return an error. However, with the current 
> > available
> > virtual address space size that programs can use, such conflicts are 
> > probably rare.
> 
> A bug that hits 1% of the time is often worse than one that hits 100%
> of the time because debugging it is miserable.

I don't agree that this is a bug at all. If there is a conflict in the memory 
layout
of the ASes the application simply cannot use this first class virtual address 
space.
Every application that wants to use first class virtual address spaces should 
check
for error return values and handle them.

This situation is similar to mapping a file at some special address in memory 
because
the file contains pointer based data structures and the application wants to use
them, but the kernel cannot map the file at this particular position in the
application's AS because there is already a different conflicting mapping. If an
application wants to do such things, it should also handle all the errors that 
can
occur.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-16 Thread Andy Lutomirski
On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal
 wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:
>> > One advantage of VAS segments is that they can be globally queried by user 
>> > programs
>> > which means that VAS segments can be shared by applications that not 
>> > necessarily have
>> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data 
>> > will only work
>> > if the tasks that share the memory region are related (aka. have a common 
>> > parent that
>> > initialized the shared mapping). Otherwise, the shared mapping have to be 
>> > backed by a
>> > file.
>>
>> What's wrong with memfd_create()?
>>
>> > VAS segments on the other side allow sharing of pure in memory data by
>> > arbitrary related tasks without the need of a file. This becomes especially
>> > interesting if one combines VAS segments with non-volatile memory since 
>> > one can keep
>> > data structures in the NVM and still be able to share them between 
>> > multiple tasks.
>>
>> What's wrong with regular mmap?
>
> I never wanted to say that there is something wrong with regular mmap. We just
> figured that with VAS segments you could remove the need to mmap your shared 
> data but
> instead can keep everything purely in memory.

memfd does that.

>
> Unfortunately, I am not at full speed with memfds. Is my understanding 
> correct that
> if the last user of such a file descriptor closes it, the corresponding 
> memory is
> freed? Accordingly, memfd cannot be used to keep data in memory while no 
> program is
> currently using it, can it?

No, stop right here.  If you want to have a bunch of memory that
outlives the program that allocates it, use a filesystem (tmpfs,
hugetlbfs, ext4, whatever).  Don't create new persistent kernel
things.

> VAS segments on the other side would provide a functionality to
> achieve the same without the need of any mounted filesystem. However, I 
> agree, that
> this is just a small advantage compared to what can already be achieved with 
> the
> existing functionality provided by the Linux kernel.

I see this "small advantage" as "resource leak and security problem".

>> This sounds complicated and fragile.  What happens if a heuristically
>> shared region coincides with a region in the "first class address
>> space" being selected?
>
> If such a conflict happens, the task cannot use the first class address space 
> and the
> corresponding system call will return an error. However, with the current 
> available
> virtual address space size that programs can use, such conflicts are probably 
> rare.

A bug that hits 1% of the time is often worse than one that hits 100%
of the time because debugging it is miserable.

--Andy

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Chris Metcalf

On 3/14/2017 12:12 PM, Till Smejkal wrote:

On Mon, 13 Mar 2017, Andy Lutomirski wrote:

On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
 wrote:

On Mon, 13 Mar 2017, Andy Lutomirski wrote:

This sounds rather complicated.  Getting TLB flushing right seems
tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens at the end. The memory region that is described by 
the
VAS segment will be mapped in the ASes that use the segment.

So why is this kernel feature better than just doing MAP_SHARED
manually in userspace?

One advantage of VAS segments is that they can be globally queried by user 
programs
which means that VAS segments can be shared by applications that not 
necessarily have
to be related. If I am not mistaken, MAP_SHARED of pure in memory data will 
only work
if the tasks that share the memory region are related (aka. have a common 
parent that
initialized the shared mapping). Otherwise, the shared mapping have to be 
backed by a
file.


True, but why is this bad?  The shared mapping will be memory resident
regardless, even if backed by a file (unless swapped out under heavy
memory pressure, but arguably that's a feature anyway).  More importantly,
having a file name is a simple and consistent way of identifying such
shared memory segments.

With a little work, you can also arrange to map such files into memory
at a fixed address in all participating processes, thus making internal
pointers work correctly.


VAS segments on the other side allow sharing of pure in memory data by
arbitrary related tasks without the need of a file. This becomes especially
interesting if one combines VAS segments with non-volatile memory since one can 
keep
data structures in the NVM and still be able to share them between multiple 
tasks.


I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
the DAX mode is supposed to allow you to do?  If so, isn't sharing a
mapped file on a DAX filesystem on top of pmem equivalent to what
you're proposing?

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com


___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Tue, 14 Mar 2017, Chris Metcalf wrote:
> On 3/14/2017 12:12 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
> > >  wrote:
> > > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > > This sounds rather complicated.  Getting TLB flushing right seems
> > > > > tricky.  Why not just map the same thing into multiple mms?
> > > > This is exactly what happens at the end. The memory region that is 
> > > > described by the
> > > > VAS segment will be mapped in the ASes that use the segment.
> > > So why is this kernel feature better than just doing MAP_SHARED
> > > manually in userspace?
> > One advantage of VAS segments is that they can be globally queried by user 
> > programs
> > which means that VAS segments can be shared by applications that not 
> > necessarily have
> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will 
> > only work
> > if the tasks that share the memory region are related (aka. have a common 
> > parent that
> > initialized the shared mapping). Otherwise, the shared mapping have to be 
> > backed by a
> > file.
> 
> True, but why is this bad?  The shared mapping will be memory resident
> regardless, even if backed by a file (unless swapped out under heavy
> memory pressure, but arguably that's a feature anyway).  More importantly,
> having a file name is a simple and consistent way of identifying such
> shared memory segments.
> 
> With a little work, you can also arrange to map such files into memory
> at a fixed address in all participating processes, thus making internal
> pointers work correctly.

I don't want to say that the interface provided by MAP_SHARED is bad. I am only
arguing that VAS segments and the interface that they provide have an advantage 
over
the existing ones in my opinion. However, Matthew Wilcox also suggested in some
earlier mail that VAS segments could be exported to user space via a special 
purpose
filesystem. This would enable users of VAS segments to also just use some 
special
files to setup the shared memory regions. But since the VAS segment itself 
already
knows where at has to be mapped in the virtual address space of the process, the
establishing of the shared memory region would be very easy for the user.

> > VAS segments on the other side allow sharing of pure in memory data by
> > arbitrary related tasks without the need of a file. This becomes especially
> > interesting if one combines VAS segments with non-volatile memory since one 
> > can keep
> > data structures in the NVM and still be able to share them between multiple 
> > tasks.
> 
> I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
> the DAX mode is supposed to allow you to do?  If so, isn't sharing a
> mapped file on a DAX filesystem on top of pmem equivalent to what
> you're proposing?

If I read the documentation to DAX filesystems correctly, it is indeed possible 
to us
them to create files that life purely in NVM. I wasn't fully aware of this 
feature.
Thanks for the pointer.

However, the main contribution of this patchset is actually the idea of first 
class
virtual address spaces and that they can be used to allow processes to have 
multiple
different views on the system's main memory. For us, VAS segments were another 
logic
step in the same direction (from first class virtual address spaces to first 
class
address space segments). However, if there is already functionality in the Linux
kernel to achieve the exact same behavior, there is no real need to add VAS 
segments.
I will continue thinking about them and either find a different situation where 
the
currently available interface is not sufficient/too complicated or drop VAS 
segments
from future version of the patch set.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
>  wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> >> This sounds rather complicated.  Getting TLB flushing right seems
> >> tricky.  Why not just map the same thing into multiple mms?
> >
> > This is exactly what happens at the end. The memory region that is 
> > described by the
> > VAS segment will be mapped in the ASes that use the segment.
> 
> So why is this kernel feature better than just doing MAP_SHARED
> manually in userspace?

One advantage of VAS segments is that they can be globally queried by user 
programs
which means that VAS segments can be shared by applications that not 
necessarily have
to be related. If I am not mistaken, MAP_SHARED of pure in memory data will 
only work
if the tasks that share the memory region are related (aka. have a common 
parent that
initialized the shared mapping). Otherwise, the shared mapping have to be 
backed by a
file. VAS segments on the other side allow sharing of pure in memory data by
arbitrary related tasks without the need of a file. This becomes especially
interesting if one combines VAS segments with non-volatile memory since one can 
keep
data structures in the NVM and still be able to share them between multiple 
tasks.

> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> and not make it look magically different depending on which process
> >> maps it?  If you need a trampoline (which you do, of course), just
> >> write a trampoline in regular user code and map it manually.
> >
> > Did I understand you correctly that you are proposing that the switching 
> > thread
> > should make sure by itself that its code, stack, … memory regions are 
> > properly setup
> > in the new AS before/after switching into it? I think, this would make 
> > using first
> > class virtual address spaces much more difficult for user applications to 
> > the extend
> > that I am not even sure if they can be used at all. At the moment, 
> > switching into a
> > VAS is a very simple operation for an application because the kernel will 
> > just simply
> > do the right thing.
> 
> Yes.  I think that having the same mm_struct look different from
> different tasks is problematic.  Getting it right in the arch code is
> going to be nasty.  The heuristics of what to share are also tough --
> why would text + data + stack or whatever you're doing be adequate?
> What if you're in a thread?  What if two tasks have their stacks in
> the same place?

The different ASes that a task now can have when it uses first class virtual 
address
spaces are not realized in the kernel by using only one mm_struct per task that 
just
looks differently but by using multiple mm_structs - one for each AS that the 
task
can execute in. When a task attaches a first class virtual address space to 
itself to
be able to use another AS, the kernel adds a temporary mm_struct to this task 
that
contains the mappings of the first class virtual address space and the one 
shared
with the task's original AS. If a thread now wants to switch into this attached 
first
class virtual address space the kernel only changes the 'mm' and 'active_mm' 
pointers
in the task_struct of the thread to the temporary mm_struct and performs the
corresponding mm_switch operation. The original mm_struct of the thread will 
not be
changed.

Accordingly, I do not magically make mm_structs look differently depending on 
the
task that uses it, but create temporary mm_structs that only contain mappings 
to the
same memory regions.

I agree that finding a good heuristics of what to share is difficult. At the 
moment,
all memory regions that are available in the task's original AS will also be
available when a thread switches into an attached first class virtual address 
space
(aka. are shared). That means that VAS can mainly be used to extend the AS of a 
task
in the current state of the implementation. The reason why I implemented the 
sharing
in this way is that I didn't want to break shared libraries. If I only share
code+heap+stack, shared libraries would not work anymore after switching into a 
VAS.

> I could imagine something like a sigaltstack() mode that lets you set
> a signal up to also switch mm could be useful.

This is a very interesting idea. I will keep it in mind for future use cases of
multiple virtual address spaces per task.

Thanks
Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Till Smejkal
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
>  wrote:
> > This patchset extends the kernel memory management subsystem with a new
> > type of address spaces (called VAS) which can be created and destroyed
> > independently of processes by a user in the system. During its lifetime
> > such a VAS can be attached to processes by the user which allows a process
> > to have multiple address spaces and thereby multiple, potentially
> > different, views on the system's main memory. During its execution the
> > threads belonging to the process are able to switch freely between the
> > different attached VAS and the process' original AS enabling them to
> > utilize the different available views on the memory.
> 
> Sounds like the old SKAS feature for UML.

I haven't heard of this feature before, but after shortly looking at the 
description
on the UML website it actually has some similarities with what I am proposing. 
But as
far as I can see this was not merged into the mainline kernel, was it? In 
addition, I
think that first class virtual address spaces goes even one step further by 
allowing
AS to live independently of processes.

> > In addition to the concept of first class virtual address spaces, this
> > patchset introduces yet another feature called VAS segments. VAS segments
> > are memory regions which have a fixed size and position in the virtual
> > address space and can be shared between multiple first class virtual
> > address spaces. Such shareable memory regions are especially useful for
> > in-memory pointer-based data structures or other pure in-memory data.
> 
> This sounds rather complicated.  Getting TLB flushing right seems
> tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens at the end. The memory region that is described by 
the
VAS segment will be mapped in the ASes that use the segment.

> >
> > | VAS |  processes  |
> > -
> > switch  |   468ns |  1944ns |
> 
> The solution here is IMO to fix the scheduler.

IMHO it will be very difficult for the scheduler code to reach the same 
switching
time as the pure VAS switch because switching between VAS does not involve 
saving any
registers or FPU state and does not require selecting the next runnable task. 
VAS
switch is basically a system call that just changes the AS of the current thread
which makes it a very lightweight operation.

> Also, FWIW, I have patches (that need a little work) that will make
> switch_mm() wy faster on x86.

These patches will also improve the speed of the VAS switch operation. We are 
also
using the switch_mm function in the background to perform the actual hardware 
switch
between the two ASes. The main reason why the VAS switch is faster than the task
switch is that it just has to do fewer things.

> > At the current state of the development, first class virtual address spaces
> > have one limitation, that we haven't been able to solve so far. The feature
> > allows, that different threads of the same process can execute in different
> > AS at the same time. This is possible, because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> 
> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> and not make it look magically different depending on which process
> maps it?  If you need a trampoline (which you do, of course), just
> write a trampoline in regular user code and map it manually.

Did I understand you correctly that you are proposing that the switching thread
should make sure by itself that its code, stack, … memory regions are properly 
setup
in the new AS before/after switching into it? I think, this would make using 
first
class virtual address spaces much more difficult for user applications to the 
extend
that I am not even sure if they can be used at all. At the moment, switching 
into a
VAS is a very simple operation for an application because the kernel will just 
simply
do the right thing.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-14 Thread Andy Lutomirski
On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
 wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> This sounds rather complicated.  Getting TLB flushing right seems
>> tricky.  Why not just map the same thing into multiple mms?
>
> This is exactly what happens at the end. The memory region that is described 
> by the
> VAS segment will be mapped in the ASes that use the segment.

So why is this kernel feature better than just doing MAP_SHARED
manually in userspace?


>> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> and not make it look magically different depending on which process
>> maps it?  If you need a trampoline (which you do, of course), just
>> write a trampoline in regular user code and map it manually.
>
> Did I understand you correctly that you are proposing that the switching 
> thread
> should make sure by itself that its code, stack, … memory regions are 
> properly setup
> in the new AS before/after switching into it? I think, this would make using 
> first
> class virtual address spaces much more difficult for user applications to the 
> extend
> that I am not even sure if they can be used at all. At the moment, switching 
> into a
> VAS is a very simple operation for an application because the kernel will 
> just simply
> do the right thing.

Yes.  I think that having the same mm_struct look different from
different tasks is problematic.  Getting it right in the arch code is
going to be nasty.  The heuristics of what to share are also tough --
why would text + data + stack or whatever you're doing be adequate?
What if you're in a thread?  What if two tasks have their stacks in
the same place?

I could imagine something like a sigaltstack() mode that lets you set
a signal up to also switch mm could be useful.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 10:39 AM, Till Smejkal wrote:
> > > Is this an indication that full virtual address spaces are useless?  It
> > > would seem like if you only use virtual address segments then you avoid 
> > > all
> > > of the problems with executing code, active stacks, and brk.
> > 
> > What do you mean with *virtual address segments*? The nice part of first 
> > class
> > virtual address spaces is that one can share/reuse collections of address 
> > space
> > segments easily.
> 
> What do *I* mean?  You introduced the term, didn't you?
> Rereading your original I see you called them "VAS segments".

Oh, I am sorry. I thought that you were referring to some other feature that I 
don't
know.

> Anyway, whatever they are called, it would seem that these segments do not
> require any of the syncing mechanisms that are causing you problems.

Yes, VAS segments provide a possibility to share memory regions between multiple
address spaces without the need to synchronize heap, stack, etc. Unfortunately, 
the
VAS segment feature itself without the whole concept of first class virtual 
address
spaces is not as powerful. With some additional work it can probably be 
represented
with the existing shmem functionality.

The first class virtual address space feature on the other side provides a real
benefit for applications in our opinion namely that an application can switch 
between
different views on its memory which enables various interesting programming 
paradigms
as mentioned in the cover letter.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Richard Henderson

On 03/14/2017 10:39 AM, Till Smejkal wrote:

Is this an indication that full virtual address spaces are useless?  It
would seem like if you only use virtual address segments then you avoid all
of the problems with executing code, active stacks, and brk.


What do you mean with *virtual address segments*? The nice part of first class
virtual address spaces is that one can share/reuse collections of address space
segments easily.


What do *I* mean?  You introduced the term, didn't you?
Rereading your original I see you called them "VAS segments".

Anyway, whatever they are called, it would seem that these segments do not 
require any of the syncing mechanisms that are causing you problems.



r~

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 08:14 AM, Till Smejkal wrote:
> > At the current state of the development, first class virtual address spaces
> > have one limitation, that we haven't been able to solve so far. The feature
> > allows, that different threads of the same process can execute in different
> > AS at the same time. This is possible, because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> > Accordingly, parts of the processes AS (e.g. the code section, data
> > section, heap section and stack sections) exist in multiple AS if the
> > process has a VAS attached to it. Changes to these shared memory regions
> > are synchronized between the address spaces whenever a thread switches
> > between two of them. Unfortunately, in some scenarios the kernel is not
> > able to properly synchronize all these shared memory regions because of
> > conflicting changes. One such example happens if there are two threads, one
> > executing in an attached first class virtual address space, the other in
> > the tasks original address space. If both threads make changes to the heap
> > section that cause expansion of the underlying vm_area_struct, the kernel
> > cannot correctly synchronize these changes, because that would cause parts
> > of the virtual address space to be overwritten with unrelated data. In the
> > current implementation such conflicts are only detected but not resolved
> > and result in an error code being returned by the kernel during the VAS
> > switch operation. Unfortunately, that means for the particular thread that
> > tried to make the switch, that it cannot do this anymore in the future and
> > accordingly has to be killed.
> 
> This sounds like a fairly fundamental problem to me.

Yes I agree. This is a significant limitation of first class virtual address 
spaces.
However, conflict like this can be mitigated by being careful in the application
that uses multiple first class virtual address spaces. If all threads make sure 
that
they never resize shared memory regions when executing inside a VAS such 
conflicts do
not occur. Another possibility that I investigated but not yet finished is that 
such
resizes of shared memory regions have to be synchronized more frequently than 
just at
every switch between VASes. If one for example "forward" memory region resizes 
to all
AS that share this particular memory region during the resize operation, one can
completely eliminate this problem. Unfortunately, this introduces a significant 
cost
and introduces a difficult to handle race condition.

> Is this an indication that full virtual address spaces are useless?  It
> would seem like if you only use virtual address segments then you avoid all
> of the problems with executing code, active stacks, and brk.

What do you mean with *virtual address segments*? The nice part of first class
virtual address spaces is that one can share/reuse collections of address space
segments easily.

Till

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Richard Henderson

On 03/14/2017 08:14 AM, Till Smejkal wrote:

At the current state of the development, first class virtual address spaces
have one limitation, that we haven't been able to solve so far. The feature
allows, that different threads of the same process can execute in different
AS at the same time. This is possible, because the VAS-switch operation
only changes the active mm_struct for the task_struct of the calling
thread. However, when a thread switches into a first class virtual address
space, some parts of its original AS are duplicated into the new one to
allow the thread to continue its execution at its current state.
Accordingly, parts of the processes AS (e.g. the code section, data
section, heap section and stack sections) exist in multiple AS if the
process has a VAS attached to it. Changes to these shared memory regions
are synchronized between the address spaces whenever a thread switches
between two of them. Unfortunately, in some scenarios the kernel is not
able to properly synchronize all these shared memory regions because of
conflicting changes. One such example happens if there are two threads, one
executing in an attached first class virtual address space, the other in
the tasks original address space. If both threads make changes to the heap
section that cause expansion of the underlying vm_area_struct, the kernel
cannot correctly synchronize these changes, because that would cause parts
of the virtual address space to be overwritten with unrelated data. In the
current implementation such conflicts are only detected but not resolved
and result in an error code being returned by the kernel during the VAS
switch operation. Unfortunately, that means for the particular thread that
tried to make the switch, that it cannot do this anymore in the future and
accordingly has to be killed.


This sounds like a fairly fundamental problem to me.

Is this an indication that full virtual address spaces are useless?  It would 
seem like if you only use virtual address segments then you avoid all of the 
problems with executing code, active stacks, and brk.



r~

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Andy Lutomirski
On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
 wrote:
> This patchset extends the kernel memory management subsystem with a new
> type of address spaces (called VAS) which can be created and destroyed
> independently of processes by a user in the system. During its lifetime
> such a VAS can be attached to processes by the user which allows a process
> to have multiple address spaces and thereby multiple, potentially
> different, views on the system's main memory. During its execution the
> threads belonging to the process are able to switch freely between the
> different attached VAS and the process' original AS enabling them to
> utilize the different available views on the memory.

Sounds like the old SKAS feature for UML.

> In addition to the concept of first class virtual address spaces, this
> patchset introduces yet another feature called VAS segments. VAS segments
> are memory regions which have a fixed size and position in the virtual
> address space and can be shared between multiple first class virtual
> address spaces. Such shareable memory regions are especially useful for
> in-memory pointer-based data structures or other pure in-memory data.

This sounds rather complicated.  Getting TLB flushing right seems
tricky.  Why not just map the same thing into multiple mms?

>
> | VAS |  processes  |
> -
> switch  |   468ns |  1944ns |

The solution here is IMO to fix the scheduler.

Also, FWIW, I have patches (that need a little work) that will make
switch_mm() wy faster on x86.

> At the current state of the development, first class virtual address spaces
> have one limitation, that we haven't been able to solve so far. The feature
> allows, that different threads of the same process can execute in different
> AS at the same time. This is possible, because the VAS-switch operation
> only changes the active mm_struct for the task_struct of the calling
> thread. However, when a thread switches into a first class virtual address
> space, some parts of its original AS are duplicated into the new one to
> allow the thread to continue its execution at its current state.

Ick.  Please don't do this.  Can we please keep an mm as just an mm
and not make it look magically different depending on which process
maps it?  If you need a trampoline (which you do, of course), just
write a trampoline in regular user code and map it manually.

--Andy

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


[RFC PATCH 00/13] Introduce first class virtual address spaces

2017-03-13 Thread Till Smejkal
First class virtual address spaces (also called VAS) are a new functionality of
the Linux kernel allowing address spaces to exist independently of processes.
The general idea behind this feature is described in a paper at ASPLOS16 with
the title 'SpaceJMP: Programming with Multiple Virtual Address Spaces' [1].

This patchset extends the kernel memory management subsystem with a new
type of address spaces (called VAS) which can be created and destroyed
independently of processes by a user in the system. During its lifetime
such a VAS can be attached to processes by the user which allows a process
to have multiple address spaces and thereby multiple, potentially
different, views on the system's main memory. During its execution the
threads belonging to the process are able to switch freely between the
different attached VAS and the process' original AS enabling them to
utilize the different available views on the memory. These multiple virtual
address spaces per process and the possibility to switch between them
freely can be used in multiple interesting ways as also outlined in the
mentioned paper. Some of the many possible applications are for example to
compartmentalize a process for security reasons, to improve the performance
of data-centric applications and to introduce new application models [1].

In addition to the concept of first class virtual address spaces, this
patchset introduces yet another feature called VAS segments. VAS segments
are memory regions which have a fixed size and position in the virtual
address space and can be shared between multiple first class virtual
address spaces. Such shareable memory regions are especially useful for
in-memory pointer-based data structures or other pure in-memory data.

First class virtual address spaces have a significant advantage compared to
forking a process and using inter process communication mechanism, namely
that creating and switching between VAS is significant faster than creating
and switching between processes. As it can be seen in the following table,
measured on an Intel Xeon E5620 CPU with 2.40GHz, creating a VAS is about 7
times faster than forking and switching between VAS is up to 4 times faster
than switching between processes.

| VAS |  processes  |
-
switch  |   468ns |  1944ns |
create  | 20003ns |150491ns |

Hence, first class virtual address spaces provide a fast mechanism for
applications to utilize multiple virtual address spaces in parallel with a
higher performance than splitting up the application into multiple
independent processes.

Both VAS and VAS segments have another significant advantage when combined
with non-volatile memory. Because of their independent life cycle from
processes and other kernel data structures, they can be used to save
special memory regions or even whole AS into non-volatile memory making it
possible to reuse them across multiple system reboots.

At the current state of the development, first class virtual address spaces
have one limitation, that we haven't been able to solve so far. The feature
allows, that different threads of the same process can execute in different
AS at the same time. This is possible, because the VAS-switch operation
only changes the active mm_struct for the task_struct of the calling
thread. However, when a thread switches into a first class virtual address
space, some parts of its original AS are duplicated into the new one to
allow the thread to continue its execution at its current state.
Accordingly, parts of the processes AS (e.g. the code section, data
section, heap section and stack sections) exist in multiple AS if the
process has a VAS attached to it. Changes to these shared memory regions
are synchronized between the address spaces whenever a thread switches
between two of them. Unfortunately, in some scenarios the kernel is not
able to properly synchronize all these shared memory regions because of
conflicting changes. One such example happens if there are two threads, one
executing in an attached first class virtual address space, the other in
the tasks original address space. If both threads make changes to the heap
section that cause expansion of the underlying vm_area_struct, the kernel
cannot correctly synchronize these changes, because that would cause parts
of the virtual address space to be overwritten with unrelated data. In the
current implementation such conflicts are only detected but not resolved
and result in an error code being returned by the kernel during the VAS
switch operation. Unfortunately, that means for the particular thread that
tried to make the switch, that it cannot do this anymore in the future and
accordingly has to be killed.

This code was developed during an internship at Hewlett Packard Enterprise.

[1] http://impact.crhc.illinois.edu/shared/Papers/ASPLOS16-SpaceJMP.pdf

Till Smejkal (13):
  mm: Add mm_struct argument to 'mmap_region'
  mm: Add