Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
On 3/14/2017 12:12 PM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > This sounds rather complicated. Getting TLB flushing right seems
> > > > tricky. Why not just map the same thing into multiple mms?
> > > This is exactly what happens at the end. The memory region that is
> > > described by the VAS segment will be mapped in the ASes that use the
> > > segment.
> > So why is this kernel feature better than just doing MAP_SHARED
> > manually in userspace?
> One advantage of VAS segments is that they can be globally queried by user
> programs, which means that VAS segments can be shared by applications that
> do not necessarily have to be related. If I am not mistaken, MAP_SHARED of
> pure in-memory data will only work if the tasks that share the memory
> region are related (aka. have a common parent that initialized the shared
> mapping). Otherwise, the shared mapping has to be backed by a file.

True, but why is this bad?  The shared mapping will be memory resident
regardless, even if backed by a file (unless swapped out under heavy
memory pressure, but arguably that's a feature anyway).  More importantly,
having a file name is a simple and consistent way of identifying such
shared memory segments.

With a little work, you can also arrange to map such files into memory
at a fixed address in all participating processes, thus making internal
pointers work correctly.

> VAS segments on the other side allow sharing of pure in-memory data by
> arbitrary unrelated tasks without the need of a file. This becomes
> especially interesting if one combines VAS segments with non-volatile
> memory, since one can keep data structures in the NVM and still be able
> to share them between multiple tasks.

I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
the DAX mode is supposed to allow you to do?  If so, isn't sharing a
mapped file on a DAX filesystem on top of pmem equivalent to what
you're proposing?
-- Chris Metcalf, Mellanox Technologies http://www.mellanox.com ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'
On Tue, 14 Mar 2017, David Laight wrote:
> From: Linuxppc-dev Till Smejkal
> > Sent: 13 March 2017 22:14
> > The only way until now to create a new memory map was via the exported
> > function 'mm_alloc'. Unfortunately, this function not only allocates a
> > new memory map, but also completely initializes it. However, with the
> > introduction of first class virtual address spaces, some initialization
> > steps done in 'mm_alloc' are not applicable to the memory maps needed
> > for this feature and hence would lead to errors in the kernel code.
> >
> > Instead of introducing a new function that can allocate and initialize
> > memory maps for first class virtual address spaces and potentially
> > duplicate some code, I decided to split the mm_alloc function as well as
> > the 'mm_init' function that it uses.
> >
> > Now there are four functions exported instead of only one. The new
> > 'mm_alloc' function only allocates a new mm_struct and zeros it out. If
> > one wants to have the old behavior of mm_alloc, one can use the newly
> > introduced function 'mm_alloc_and_setup', which not only allocates a new
> > mm_struct but also fully initializes it.
> ...
> That looks like bugs waiting to happen.
> You need unchanged code to fail to compile.

Thank you for this hint. I can give the new mm_alloc function a different
name so that code that uses the *old* mm_alloc function will fail to
compile. I just reused the old name when I wrote the code, because mm_alloc
was only used in very few locations in the kernel (2 times in the whole
kernel source), which made identifying and changing them very easy. I also
don't think that there will be many users of mm_alloc in the kernel in the
future, because it is a relatively low-level data structure. But if it is
better to use a different name for the new function, I am very happy to
change this.

Till
Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
On Tue, 14 Mar 2017, Chris Metcalf wrote:
> On 3/14/2017 12:12 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > > This sounds rather complicated. Getting TLB flushing right seems
> > > > > tricky. Why not just map the same thing into multiple mms?
> > > > This is exactly what happens at the end. The memory region that is
> > > > described by the VAS segment will be mapped in the ASes that use
> > > > the segment.
> > > So why is this kernel feature better than just doing MAP_SHARED
> > > manually in userspace?
> > One advantage of VAS segments is that they can be globally queried by
> > user programs, which means that VAS segments can be shared by
> > applications that do not necessarily have to be related. If I am not
> > mistaken, MAP_SHARED of pure in-memory data will only work if the tasks
> > that share the memory region are related (aka. have a common parent
> > that initialized the shared mapping). Otherwise, the shared mapping has
> > to be backed by a file.
>
> True, but why is this bad?  The shared mapping will be memory resident
> regardless, even if backed by a file (unless swapped out under heavy
> memory pressure, but arguably that's a feature anyway).  More importantly,
> having a file name is a simple and consistent way of identifying such
> shared memory segments.
>
> With a little work, you can also arrange to map such files into memory
> at a fixed address in all participating processes, thus making internal
> pointers work correctly.

I don't want to say that the interface provided by MAP_SHARED is bad. I am
only arguing that VAS segments and the interface that they provide have an
advantage over the existing ones in my opinion. However, Matthew Wilcox
also suggested in some earlier mail that VAS segments could be exported to
user space via a special purpose filesystem. This would enable users of VAS
segments to also just use some special files to set up the shared memory
regions. But since the VAS segment itself already knows where it has to be
mapped in the virtual address space of the process, establishing the shared
memory region would be very easy for the user.

> > VAS segments on the other side allow sharing of pure in-memory data by
> > arbitrary unrelated tasks without the need of a file. This becomes
> > especially interesting if one combines VAS segments with non-volatile
> > memory, since one can keep data structures in the NVM and still be able
> > to share them between multiple tasks.
>
> I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
> the DAX mode is supposed to allow you to do?  If so, isn't sharing a
> mapped file on a DAX filesystem on top of pmem equivalent to what
> you're proposing?

If I read the documentation on DAX filesystems correctly, it is indeed
possible to use them to create files that live purely in NVM. I wasn't
fully aware of this feature. Thanks for the pointer. However, the main
contribution of this patchset is actually the idea of first class virtual
address spaces and that they can be used to allow processes to have
multiple different views on the system's main memory. For us, VAS segments
were another logical step in the same direction (from first class virtual
address spaces to first class address space segments). However, if there is
already functionality in the Linux kernel to achieve the exact same
behavior, there is no real need to add VAS segments. I will continue
thinking about them and either find a different situation where the
currently available interface is not sufficient/too complicated or drop VAS
segments from future versions of the patch set.

Till
Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > This sounds rather complicated. Getting TLB flushing right seems
> > > tricky. Why not just map the same thing into multiple mms?
> >
> > This is exactly what happens at the end. The memory region that is
> > described by the VAS segment will be mapped in the ASes that use the
> > segment.
>
> So why is this kernel feature better than just doing MAP_SHARED
> manually in userspace?

One advantage of VAS segments is that they can be globally queried by user
programs, which means that VAS segments can be shared by applications that
do not necessarily have to be related. If I am not mistaken, MAP_SHARED of
pure in-memory data will only work if the tasks that share the memory
region are related (aka. have a common parent that initialized the shared
mapping). Otherwise, the shared mapping has to be backed by a file. VAS
segments on the other side allow sharing of pure in-memory data by
arbitrary unrelated tasks without the need of a file. This becomes
especially interesting if one combines VAS segments with non-volatile
memory, since one can keep data structures in the NVM and still be able to
share them between multiple tasks.

> > > Ick.  Please don't do this.  Can we please keep an mm as just an mm
> > > and not make it look magically different depending on which process
> > > maps it?  If you need a trampoline (which you do, of course), just
> > > write a trampoline in regular user code and map it manually.
> >
> > Did I understand you correctly that you are proposing that the
> > switching thread should make sure by itself that its code, stack, …
> > memory regions are properly set up in the new AS before/after
> > switching into it? I think this would make using first class virtual
> > address spaces much more difficult for user applications, to the
> > extent that I am not even sure if they can be used at all. At the
> > moment, switching into a VAS is a very simple operation for an
> > application because the kernel will just simply do the right thing.
>
> Yes.  I think that having the same mm_struct look different from
> different tasks is problematic.  Getting it right in the arch code is
> going to be nasty.  The heuristics of what to share are also tough --
> why would text + data + stack or whatever you're doing be adequate?
> What if you're in a thread?  What if two tasks have their stacks in
> the same place?

The different ASes that a task can now have when it uses first class
virtual address spaces are not realized in the kernel by using only one
mm_struct per task that just looks different, but by using multiple
mm_structs - one for each AS that the task can execute in. When a task
attaches a first class virtual address space to itself to be able to use
another AS, the kernel adds a temporary mm_struct to this task that
contains the mappings of the first class virtual address space and the ones
shared with the task's original AS. If a thread now wants to switch into
this attached first class virtual address space, the kernel only changes
the 'mm' and 'active_mm' pointers in the task_struct of the thread to the
temporary mm_struct and performs the corresponding mm_switch operation. The
original mm_struct of the thread will not be changed. Accordingly, I do not
magically make mm_structs look different depending on the task that uses
them, but create temporary mm_structs that only contain mappings to the
same memory regions.

I agree that finding a good heuristic of what to share is difficult. At the
moment, all memory regions that are available in the task's original AS
will also be available when a thread switches into an attached first class
virtual address space (aka. are shared). That means that VAS can mainly be
used to extend the AS of a task in the current state of the implementation.
The reason why I implemented the sharing in this way is that I didn't want
to break shared libraries. If I only shared code+heap+stack, shared
libraries would not work anymore after switching into a VAS.

> I could imagine something like a sigaltstack() mode that lets you set
> a signal up to also switch mm could be useful.

This is a very interesting idea. I will keep it in mind for future use
cases of multiple virtual address spaces per task.

Thanks
Till
Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal wrote:
> > This patchset extends the kernel memory management subsystem with a new
> > type of address spaces (called VAS) which can be created and destroyed
> > independently of processes by a user in the system. During its lifetime
> > such a VAS can be attached to processes by the user which allows a
> > process to have multiple address spaces and thereby multiple,
> > potentially different, views on the system's main memory. During its
> > execution the threads belonging to the process are able to switch
> > freely between the different attached VAS and the process' original AS
> > enabling them to utilize the different available views on the memory.
>
> Sounds like the old SKAS feature for UML.

I haven't heard of this feature before, but after shortly looking at the
description on the UML website it actually has some similarities with what
I am proposing. But as far as I can see this was not merged into the
mainline kernel, was it? In addition, I think that first class virtual
address spaces go even one step further by allowing ASes to live
independently of processes.

> > In addition to the concept of first class virtual address spaces, this
> > patchset introduces yet another feature called VAS segments. VAS
> > segments are memory regions which have a fixed size and position in the
> > virtual address space and can be shared between multiple first class
> > virtual address spaces. Such shareable memory regions are especially
> > useful for in-memory pointer-based data structures or other pure
> > in-memory data.
>
> This sounds rather complicated.  Getting TLB flushing right seems
> tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens at the end. The memory region that is
described by the VAS segment will be mapped in the ASes that use the
segment.

> >          | VAS   | processes |
> > ---------+-------+-----------+
> > switch   | 468ns | 1944ns    |
>
> The solution here is IMO to fix the scheduler.

IMHO it will be very difficult for the scheduler code to reach the same
switching time as the pure VAS switch, because switching between VAS does
not involve saving any registers or FPU state and does not require
selecting the next runnable task. A VAS switch is basically a system call
that just changes the AS of the current thread, which makes it a very
lightweight operation.

> Also, FWIW, I have patches (that need a little work) that will make
> switch_mm() way faster on x86.

These patches will also improve the speed of the VAS switch operation. We
are also using the switch_mm function in the background to perform the
actual hardware switch between the two ASes. The main reason why the VAS
switch is faster than the task switch is that it just has to do fewer
things.

> > At the current state of the development, first class virtual address
> > spaces have one limitation that we haven't been able to solve so far.
> > The feature allows that different threads of the same process can
> > execute in different AS at the same time. This is possible, because
> > the VAS-switch operation only changes the active mm_struct for the
> > task_struct of the calling thread. However, when a thread switches
> > into a first class virtual address space, some parts of its original
> > AS are duplicated into the new one to allow the thread to continue its
> > execution at its current state.
>
> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> and not make it look magically different depending on which process
> maps it?  If you need a trampoline (which you do, of course), just
> write a trampoline in regular user code and map it manually.

Did I understand you correctly that you are proposing that the switching
thread should make sure by itself that its code, stack, … memory regions
are properly set up in the new AS before/after switching into it? I think
this would make using first class virtual address spaces much more
difficult for user applications, to the extent that I am not even sure if
they can be used at all. At the moment, switching into a VAS is a very
simple operation for an application because the kernel will just simply do
the right thing.

Till
Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> This sounds rather complicated.  Getting TLB flushing right seems
>> tricky.  Why not just map the same thing into multiple mms?
>
> This is exactly what happens at the end. The memory region that is
> described by the VAS segment will be mapped in the ASes that use the
> segment.

So why is this kernel feature better than just doing MAP_SHARED
manually in userspace?

>> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> and not make it look magically different depending on which process
>> maps it?  If you need a trampoline (which you do, of course), just
>> write a trampoline in regular user code and map it manually.
>
> Did I understand you correctly that you are proposing that the switching
> thread should make sure by itself that its code, stack, … memory regions
> are properly set up in the new AS before/after switching into it? I
> think this would make using first class virtual address spaces much more
> difficult for user applications, to the extent that I am not even sure
> if they can be used at all. At the moment, switching into a VAS is a
> very simple operation for an application because the kernel will just
> simply do the right thing.

Yes.  I think that having the same mm_struct look different from
different tasks is problematic.  Getting it right in the arch code is
going to be nasty.  The heuristics of what to share are also tough --
why would text + data + stack or whatever you're doing be adequate?
What if you're in a thread?  What if two tasks have their stacks in
the same place?

I could imagine something like a sigaltstack() mode that lets you set
a signal up to also switch mm could be useful.
RE: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'
From: Linuxppc-dev Till Smejkal
> Sent: 13 March 2017 22:14
> The only way until now to create a new memory map was via the exported
> function 'mm_alloc'. Unfortunately, this function not only allocates a
> new memory map, but also completely initializes it. However, with the
> introduction of first class virtual address spaces, some initialization
> steps done in 'mm_alloc' are not applicable to the memory maps needed for
> this feature and hence would lead to errors in the kernel code.
>
> Instead of introducing a new function that can allocate and initialize
> memory maps for first class virtual address spaces and potentially
> duplicate some code, I decided to split the mm_alloc function as well as
> the 'mm_init' function that it uses.
>
> Now there are four functions exported instead of only one. The new
> 'mm_alloc' function only allocates a new mm_struct and zeros it out. If
> one wants to have the old behavior of mm_alloc, one can use the newly
> introduced function 'mm_alloc_and_setup', which not only allocates a new
> mm_struct but also fully initializes it.
...

That looks like bugs waiting to happen.
You need unchanged code to fail to compile.

David
Re: [PATCH] clocksource: add missing line break to error messages
On 09/03/17 09:47, Rafał Miłecki wrote:
> From: Rafał Miłecki
>
> Printing with pr_* functions requires adding the line break manually.
>
> Signed-off-by: Rafał Miłecki

I've had a quick look over and there are no obvious errors. I wonder if
the of_iomap() and related calls should print an error if they fail, as
all the examples here are of the form of:

	ptr = of_iomap(resource);
	if (!ptr) {
		pr_err("cannot remap resource\n");
		...
		return ERR;
	}

Maybe we should look into this post this patch series.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius