Re: Resurrecting the VM_PINNED discussion
On Thu, Mar 05, 2015 at 03:09:42PM -0600, Christoph Lameter wrote:
> On Thu, 5 Mar 2015, Peter Zijlstra wrote:
>
> > > Am I missing something about why it was never merged?
> >
> > Because I got lost in IB code and didn't manage to bribe anyone into
> > fixing that for me.
>
> Well the complexity increased since then with the on demand pinning,
> mmu notifiers etc etc ...

Clearly I've not been paying attention, what? Is this that drug induced stuff benh was babbling about a while back?

> I thought the clear distinction between pinning and mlocking would do the
> trick?

Nah, it still leaves the accounting up shit creek.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Resurrecting the VM_PINNED discussion
On Thu, 5 Mar 2015, Peter Zijlstra wrote:

> > Am I missing something about why it was never merged?
>
> Because I got lost in IB code and didn't manage to bribe anyone into
> fixing that for me.

Well the complexity increased since then with the on demand pinning, mmu notifiers etc etc ...

I thought the clear distinction between pinning and mlocking would do the trick?
Re: Resurrecting the VM_PINNED discussion
On Tue, Mar 03, 2015 at 12:41:05PM -0500, Eric B Munson wrote:
> All,
>
> After LSF/MM last year Peter revived a patch set that would create
> infrastructure for pinning pages as opposed to simply locking them.
> AFAICT, there was no objection to the set, it just needed some help
> from the IB folks.
>
> Am I missing something about why it was never merged?

Because I got lost in IB code and didn't manage to bribe anyone into fixing that for me.
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

> No, you were correct and thanks for the hint. It's only ISOLATE_UNEVICTABLE
> from isolate_migratepages_range(), which is CMA, not regular compaction.
> But I wonder, can we change this even after VM_PINNED is introduced, if
> existing code depends on "no minor faults in mlocked areas", whatever the
> docs say? On the other hand, compaction is not the only source of
> migrations. I wonder what the NUMA balancing does (not) about mlocked
> areas...

My hope was that we could convince those that depend on mlock() preventing minor faults to move to use the mpin() interface that was discussed in the VM_PINNED thread. If that is not acceptable then we really need to update the man page for mlock() and the vm documentation to be very clear that minor faults are also prevented.

Eric
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 10:52 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Eric B Munson wrote:
>
>> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>>
>> > On 03/03/2015 07:45 PM, Eric B Munson wrote:
>> > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>> > >
>> > > Agreed. But as has been discussed in the threads around the VM_PINNED
>> > > work, there are people that are relying on the fact that VM_LOCKED
>> > > promises no minor faults. Which is why the behavior has remained.
>> >
>> > At least in the VM_PINNED thread after last lsf/mm, I don't see this
>> > mentioned. I found no references to mlocking in compaction.c, and in
>> > migrate.c there's just mlock_migrate_page() with comment:
>> >
>> > /*
>> >  * mlock_migrate_page - called only from migrate_page_copy() to
>> >  * migrate the Mlocked page flag; update statistics.
>> >  */
>> >
>> > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing?
>> > Where is this restriction?
>> >
>>
>> I spent quite some time looking for it as well, it is in vmscan.c
>>
>> int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>> {
>> ...
>>         /* Compaction should not handle unevictable pages but CMA can do so */
>>         if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>>                 return ret;
>> ...
>>
>
> And that demonstrates that I haven't spent enough time with this code,
> that isn't the restriction because when this is called from compaction.c
> the mode is set to ISOLATE_UNEVICTABLE. So back to reading the code.

No, you were correct and thanks for the hint. It's only ISOLATE_UNEVICTABLE from isolate_migratepages_range(), which is CMA, not regular compaction.

But I wonder, can we change this even after VM_PINNED is introduced, if existing code depends on "no minor faults in mlocked areas", whatever the docs say? On the other hand, compaction is not the only source of migrations. I wonder what the NUMA balancing does (not) about mlocked areas...
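The check quoted from vmscan.c can be modelled outside the kernel to make the conclusion above concrete. What follows is only a userspace sketch: `struct page_model`, `may_isolate()` and the numeric value chosen for `ISOLATE_UNEVICTABLE` are illustrative stand-ins, not kernel definitions. It shows that a page marked unevictable (which mlocked pages are) is skipped unless the caller passes the `ISOLATE_UNEVICTABLE` mode bit, which per the message above only the CMA path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int isolate_mode_t;

/* Illustrative flag value only; the kernel has its own definition. */
#define ISOLATE_UNEVICTABLE 0x8u

/* Hypothetical stand-in for struct page: just the bit the check reads. */
struct page_model {
	bool unevictable;	/* models PageUnevictable(page) */
};

/*
 * Mirrors the quoted condition: an unevictable page may be isolated for
 * migration only when the caller asked for ISOLATE_UNEVICTABLE.
 */
static bool may_isolate(const struct page_model *page, isolate_mode_t mode)
{
	assert(page != NULL);
	if (page->unevictable && !(mode & ISOLATE_UNEVICTABLE))
		return false;
	return true;
}
```

Under this model, a caller with mode 0 (standing in for regular compaction) never isolates an mlocked page, while a caller passing `ISOLATE_UNEVICTABLE` (standing in for CMA) does.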
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Eric B Munson wrote:

> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>
> > On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > >
> > >> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > >> > All,
> > >> >
> > >> > After LSF/MM last year Peter revived a patch set that would create
> > >> > infrastructure for pinning pages as opposed to simply locking them.
> > >> > AFAICT, there was no objection to the set, it just needed some help
> > >> > from the IB folks.
> > >> >
> > >> > Am I missing something about why it was never merged? I ask because
> > >> > Akamai has bumped into the disconnect between the mlock manpage,
> > >> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > >> > locking. A group working in userspace read those sources and wrote a
> > >> > tool that mmaps many files read only and locked, munmapping them when
> > >> > they are no longer needed. Locking is used because they cannot afford
> > >> > a major fault, but they are fine with minor faults. This tends to
> > >> > fragment memory badly so when they started looking into using hugetlbfs
> > >> > (or anything requiring order > 0 allocations) they found they were not
> > >> > able to allocate the memory. They were confused based on the
> > >> > referenced documentation as to why compaction would continually fail
> > >> > to yield appropriately sized contiguous areas when there was more than
> > >> > enough free memory.
> > >>
> > >> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> > >> compaction to do its job? If that's true, I think it's a bug as it is
> > >> AFAIK supposed to work just fine.
> > >
> > > Agreed. But as has been discussed in the threads around the VM_PINNED
> > > work, there are people that are relying on the fact that VM_LOCKED
> > > promises no minor faults. Which is why the behavior has remained.
> >
> > At least in the VM_PINNED thread after last lsf/mm, I don't see this
> > mentioned. I found no references to mlocking in compaction.c, and in
> > migrate.c there's just mlock_migrate_page() with comment:
> >
> > /*
> >  * mlock_migrate_page - called only from migrate_page_copy() to
> >  * migrate the Mlocked page flag; update statistics.
> >  */
> >
> > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing?
> > Where is this restriction?
> >
>
> I spent quite some time looking for it as well, it is in vmscan.c
>
> int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> {
> ...
>         /* Compaction should not handle unevictable pages but CMA can do so */
>         if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>                 return ret;
> ...
>

And that demonstrates that I haven't spent enough time with this code, that isn't the restriction because when this is called from compaction.c the mode is set to ISOLATE_UNEVICTABLE. So back to reading the code.

Eric
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

> On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> >
> >> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> >> > All,
> >> >
> >> > After LSF/MM last year Peter revived a patch set that would create
> >> > infrastructure for pinning pages as opposed to simply locking them.
> >> > AFAICT, there was no objection to the set, it just needed some help
> >> > from the IB folks.
> >> >
> >> > Am I missing something about why it was never merged? I ask because
> >> > Akamai has bumped into the disconnect between the mlock manpage,
> >> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> >> > locking. A group working in userspace read those sources and wrote a
> >> > tool that mmaps many files read only and locked, munmapping them when
> >> > they are no longer needed. Locking is used because they cannot afford a
> >> > major fault, but they are fine with minor faults. This tends to
> >> > fragment memory badly so when they started looking into using hugetlbfs
> >> > (or anything requiring order > 0 allocations) they found they were not
> >> > able to allocate the memory. They were confused based on the referenced
> >> > documentation as to why compaction would continually fail to yield
> >> > appropriately sized contiguous areas when there was more than enough
> >> > free memory.
> >>
> >> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> >> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> >> supposed to work just fine.
> >
> > Agreed. But as has been discussed in the threads around the VM_PINNED
> > work, there are people that are relying on the fact that VM_LOCKED
> > promises no minor faults. Which is why the behavior has remained.
>
> At least in the VM_PINNED thread after last lsf/mm, I don't see this
> mentioned. I found no references to mlocking in compaction.c, and in
> migrate.c there's just mlock_migrate_page() with comment:
>
> /*
>  * mlock_migrate_page - called only from migrate_page_copy() to
>  * migrate the Mlocked page flag; update statistics.
>  */
>
> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing?
> Where is this restriction?
>

I spent quite some time looking for it as well, it is in vmscan.c

int __isolate_lru_page(struct page *page, isolate_mode_t mode)
{
...
        /* Compaction should not handle unevictable pages but CMA can do so */
        if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
                return ret;
...
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 07:45 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>
>> On 03/03/2015 06:41 PM, Eric B Munson wrote:
>> > All,
>> >
>> > After LSF/MM last year Peter revived a patch set that would create
>> > infrastructure for pinning pages as opposed to simply locking them.
>> > AFAICT, there was no objection to the set, it just needed some help
>> > from the IB folks.
>> >
>> > Am I missing something about why it was never merged? I ask because
>> > Akamai has bumped into the disconnect between the mlock manpage,
>> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
>> > locking. A group working in userspace read those sources and wrote a
>> > tool that mmaps many files read only and locked, munmapping them when
>> > they are no longer needed. Locking is used because they cannot afford a
>> > major fault, but they are fine with minor faults. This tends to
>> > fragment memory badly so when they started looking into using hugetlbfs
>> > (or anything requiring order > 0 allocations) they found they were not
>> > able to allocate the memory. They were confused based on the referenced
>> > documentation as to why compaction would continually fail to yield
>> > appropriately sized contiguous areas when there was more than enough
>> > free memory.
>>
>> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
>> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
>> supposed to work just fine.
>
> Agreed. But as has been discussed in the threads around the VM_PINNED
> work, there are people that are relying on the fact that VM_LOCKED
> promises no minor faults. Which is why the behavior has remained.

At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned. I found no references to mlocking in compaction.c, and in migrate.c there's just mlock_migrate_page() with comment:

/*
 * mlock_migrate_page - called only from migrate_page_copy() to
 * migrate the Mlocked page flag; update statistics.
 */

It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?
Re: Resurrecting the VM_PINNED discussion
On Tue, 3 Mar 2015, Vlastimil Babka wrote:

> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing?
> Where is this restriction?

It's in the defrag code.
Re: Resurrecting the VM_PINNED discussion
On Tue, 3 Mar 2015, Eric B Munson wrote:

> > So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> > compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> > supposed to work just fine.
>
> Agreed. But as has been discussed in the threads around the VM_PINNED
> work, there are people that are relying on the fact that VM_LOCKED
> promises no minor faults. Which is why the behavior has remained.

AFAICT mlocking preventing migration is something that could be taken out. Google removes the restriction. mlocked does not promise no minor faults, only that the page will stay resident. The pinning results in no faults.

> VM_PINNED itself doesn't help us, but it would allow us to make
> VM_LOCKED use only the weaker 'no major fault' semantics while still
> providing a way for anyone that needs the stronger 'no minor fault'
> promise to get the semantics they need.

The semantics for mlock allow migration and therefore defrag as well as thp processing.
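The difference between "stays resident" and "no minor faults" can be observed from userspace with the minor-fault counter that getrusage() reports. The sketch below is only an illustration under stated assumptions: the helper names `minor_faults()` and `faults_while_touching()` are made up for this example, it uses anonymous memory rather than the file mappings discussed in the thread, and mlock() may fail if RLIMIT_MEMLOCK is too small. It cannot show migration itself; it only counts how many minor faults a touch loop takes with and without mlock().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no I/O) page faults charged to this process so far, or -1. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Map `pages` anonymous pages, optionally mlock() them, then write one
 * byte per page. Returns the number of minor faults taken during the
 * touch loop, or -1 if mapping, locking or accounting failed.
 */
static long faults_while_touching(size_t pages, int lock)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len, off;
	long before, after;
	unsigned char *p;

	if (psz <= 0 || pages == 0)
		return -1;
	len = pages * (size_t)psz;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	if (lock && mlock(p, len) != 0) {
		munmap(p, len);
		return -1;
	}

	before = minor_faults();
	for (off = 0; off < len; off += (size_t)psz)
		((volatile unsigned char *)p)[off] = 1;
	after = minor_faults();

	munmap(p, len);
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

Comparing `faults_while_touching(n, 0)` with `faults_while_touching(n, 1)` shows the effect of mlock() faulting the range in up front; whether a locked page can later take a minor fault because it was migrated is exactly the question under discussion here and is not something this sketch settles.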
Re: Resurrecting the VM_PINNED discussion
On Tue, 2015-03-03 at 19:35 +0100, Vlastimil Babka wrote:
> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > All,
> >
> > After LSF/MM last year Peter revived a patch set that would create
> > infrastructure for pinning pages as opposed to simply locking them.
> > AFAICT, there was no objection to the set, it just needed some help
> > from the IB folks.
> >
> > Am I missing something about why it was never merged? I ask because
> > Akamai has bumped into the disconnect between the mlock manpage,
> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > locking. A group working in userspace read those sources and wrote a
> > tool that mmaps many files read only and locked, munmapping them when
> > they are no longer needed. Locking is used because they cannot afford a
> > major fault, but they are fine with minor faults. This tends to
> > fragment memory badly so when they started looking into using hugetlbfs
> > (or anything requiring order > 0 allocations) they found they were not
> > able to allocate the memory. They were confused based on the referenced
> > documentation as to why compaction would continually fail to yield
> > appropriately sized contiguous areas when there was more than enough
> > free memory.
>
> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> supposed to work just fine.
>
> > I would like to see the situation with VM_LOCKED cleared up, ideally the
> > documentation would remain and reality adjusted to match and I think
> > Peter's VM_PINNED set goes in the right direction for this goal. What
> > is missing and how can I help?
>
> I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
> accounting for the kind of locking (pinning) that *does* prevent page
> migration (unlike mlocking)... quoting the patchset cover letter:
>
> "These patches introduce VM_PINNED infrastructure, vma tracking of persistent
> 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
> required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
> popular way to pin pages is through get_user_pages() but that is not
> necessarily the only way."

Yeah, this also makes it pretty clear:

""
Firstly, various subsystems (perf, IB amongst others) 'pin' significant chunks of memory (through holding page refs or custom maps), because this memory is unevictable we must test this against RLIMIT_MEMLOCK.

...

Thirdly, because VM_LOCKED does allow unmapping (and therefore page migration) the -rt people are not pleased and would very much like something stronger.
""
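The cover letter's point about testing pinned memory against RLIMIT_MEMLOCK concerns kernel-side accounting, but the limit itself is visible from userspace. As a rough userspace analogue only (the helper name `fits_memlock_limit()` is invented for this example and nothing like it appears in the patch set), a process can check whether a request of a given size would fit under its soft limit:

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Returns 1 if `bytes` fits under the soft RLIMIT_MEMLOCK of the calling
 * process, 0 if it does not, and -1 if the limit could not be read.
 * This ignores memory that is already locked or pinned, which is the
 * accounting gap the thread is about.
 */
static int fits_memlock_limit(size_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	return (rlim_t)bytes <= rl.rlim_cur ? 1 : 0;
}
```

A real check would also have to subtract what the process has already locked or pinned, and a userspace helper has no way of knowing what perf or IB pinned on its behalf.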
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > All,
> >
> > After LSF/MM last year Peter revived a patch set that would create
> > infrastructure for pinning pages as opposed to simply locking them.
> > AFAICT, there was no objection to the set, it just needed some help
> > from the IB folks.
> >
> > Am I missing something about why it was never merged? I ask because
> > Akamai has bumped into the disconnect between the mlock manpage,
> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > locking. A group working in userspace read those sources and wrote a
> > tool that mmaps many files read only and locked, munmapping them when
> > they are no longer needed. Locking is used because they cannot afford a
> > major fault, but they are fine with minor faults. This tends to
> > fragment memory badly so when they started looking into using hugetlbfs
> > (or anything requiring order > 0 allocations) they found they were not
> > able to allocate the memory. They were confused based on the referenced
> > documentation as to why compaction would continually fail to yield
> > appropriately sized contiguous areas when there was more than enough
> > free memory.
>
> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> supposed to work just fine.

Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.

> > I would like to see the situation with VM_LOCKED cleared up, ideally the
> > documentation would remain and reality adjusted to match and I think
> > Peter's VM_PINNED set goes in the right direction for this goal. What
> > is missing and how can I help?
>
> I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
> accounting for the kind of locking (pinning) that *does* prevent page
> migration (unlike mlocking)... quoting the patchset cover letter:

VM_PINNED itself doesn't help us, but it would allow us to make VM_LOCKED use only the weaker 'no major fault' semantics while still providing a way for anyone that needs the stronger 'no minor fault' promise to get the semantics they need.

> "These patches introduce VM_PINNED infrastructure, vma tracking of persistent
> 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
> required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
> popular way to pin pages is through get_user_pages() but that is not
> necessarily the only way."
>
> > Thanks,
> > Eric
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 06:41 PM, Eric B Munson wrote:
> All,
>
> After LSF/MM last year Peter revived a patch set that would create
> infrastructure for pinning pages as opposed to simply locking them.
> AFAICT, there was no objection to the set, it just needed some help
> from the IB folks.
>
> Am I missing something about why it was never merged? I ask because
> Akamai has bumped into the disconnect between the mlock manpage,
> Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> locking. A group working in userspace read those sources and wrote a
> tool that mmaps many files read only and locked, munmapping them when
> they are no longer needed. Locking is used because they cannot afford a
> major fault, but they are fine with minor faults. This tends to
> fragment memory badly so when they started looking into using hugetlbfs
> (or anything requiring order > 0 allocations) they found they were not
> able to allocate the memory. They were confused based on the referenced
> documentation as to why compaction would continually fail to yield
> appropriately sized contiguous areas when there was more than enough
> free memory.

So you are saying that mlocking (VM_LOCKED) prevents migration and thus compaction to do its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.

> I would like to see the situation with VM_LOCKED cleared up, ideally the
> documentation would remain and reality adjusted to match and I think
> Peter's VM_PINNED set goes in the right direction for this goal. What
> is missing and how can I help?

I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves accounting for the kind of locking (pinning) that *does* prevent page migration (unlike mlocking)... quoting the patchset cover letter:

"These patches introduce VM_PINNED infrastructure, vma tracking of persistent 'pinned' page ranges. Pinned is anything that has a fixed phys address (as required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One popular way to pin pages is through get_user_pages() but that is not necessarily the only way."

> Thanks,
> Eric
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Vlastimil Babka wrote: On 03/03/2015 06:41 PM, Eric B Munson wrote: All, After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks. Am I missing something about why it was never merged? I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly so when they started looking into using hugetlbfs (or anything requiring order 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory. So you are saying that mlocking (VM_LOCKED) prevents migration and thus compaction to do its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine. Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavoir has remained. I would like to see the situation with VM_LOCKED cleared up, ideally the documentation would remain and reality adjusted to match and I think Peter's VM_PINNED set goes in the right direction for this goal. What is missing and how can I help? I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves accounting for the kind of locking (pinning) that *does* prevent page migration (unlike mlocking)... 
quoting the patchset cover letter: VM_PINNED itself doesn't help us, but it would allow us to make VM_LOCKED use only the weaker 'no major fault' semantics while still providing a way for anyone that needs the stronger 'no minor fault' promise to get the semantics they need. These patches introduce VM_PINNED infrastructure, vma tracking of persistent 'pinned' page ranges. Pinned is anything that has a fixed phys address (as required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One popular way to pin pages is through get_user_pages() but that not nessecarily the only way. Thanks, Eric signature.asc Description: Digital signature
Re: Resurrecting the VM_PINNED discussion
On Tue, 2015-03-03 at 19:35 +0100, Vlastimil Babka wrote: On 03/03/2015 06:41 PM, Eric B Munson wrote: All, After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks. Am I missing something about why it was never merged? I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly so when they started looking into using hugetlbfs (or anything requiring order 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory. So you are saying that mlocking (VM_LOCKED) prevents migration and thus compaction to do its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine. I would like to see the situation with VM_LOCKED cleared up, ideally the documentation would remain and reality adjusted to match and I think Peter's VM_PINNED set goes in the right direction for this goal. What is missing and how can I help? I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves accounting for the kind of locking (pinning) that *does* prevent page migration (unlike mlocking)... quoting the patchset cover letter: These patches introduce VM_PINNED infrastructure, vma tracking of persistent 'pinned' page ranges. 
Pinned is anything that has a fixed phys address (as required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One popular way to pin pages is through get_user_pages() but that not nessecarily the only way. Yeah, this also makes it pretty clear: Firstly, various subsystems (perf, IB amongst others) 'pin' significant chunks of memory (through holding page refs or custom maps), because this memory is unevictable we must test this against RLIMIT_MEMLOCK. ... Thirdly, because VM_LOCKED does allow unmapping (and therefore page migration) the -rt people are not pleased and would very much like something stronger. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Resurrecting the VM_PINNED discussion
On Tue, 3 Mar 2015, Eric B Munson wrote:
> > So you are saying that mlocking (VM_LOCKED) prevents migration and thus keeps compaction from doing its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.
>
> Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.

AFAICT mlocking preventing migration is something that could be taken out. Google removes the restriction. mlocked does not promise no minor faults, only that the page will stay resident. The pinning results in no faults.

> VM_PINNED itself doesn't help us, but it would allow us to make VM_LOCKED use only the weaker 'no major fault' semantics while still providing a way for anyone that needs the stronger 'no minor fault' promise to get the semantics they need.

The semantics for mlock allow migration and therefore defrag as well as thp processing.
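The difference between "stays resident" and "never faults" can be observed from userspace by comparing fault counters around reads of an mlocked mapping. This is a rough sketch, assuming Linux and `getrusage()`; the function names are hypothetical, and whether any minor faults show up depends on the kernel and on whether migration touched the pages in between, so no expected output is given.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

struct fault_counts {
    long minor;
    long major;
};

/* Snapshot this process's cumulative fault counters; returns 0 on success. */
static int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}

/*
 * Map `path` read-only, mlock() it, read one byte per page, and report how
 * many faults those reads caused. Returns 0 on success, -1 on any failure
 * (including mlock() being refused, e.g. by RLIMIT_MEMLOCK).
 */
static int faults_reading_locked_file(const char *path, struct fault_counts *delta)
{
    struct fault_counts before, after;
    struct stat st;
    volatile unsigned char sink = 0;
    long page = sysconf(_SC_PAGESIZE);
    int rc = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 || page <= 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    if (mlock(map, len) == 0 && read_fault_counts(&before) == 0) {
        for (size_t off = 0; off < len; off += (size_t)page)
            sink ^= map[off];
        if (read_fault_counts(&after) == 0) {
            delta->minor = after.minor - before.minor;
            delta->major = after.major - before.major;
            rc = 0;
        }
    }
    munmap(map, len); /* unmapping also drops the lock */
    return rc;
}
```

Under the semantics described above, `delta->major` should stay at zero for a locked mapping, while `delta->minor` is allowed to be non-zero if pages were migrated; only pinning would rule out both.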
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 06:41 PM, Eric B Munson wrote:
> All,
>
> After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks.
>
> Am I missing something about why it was never merged?
>
> I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly, so when they started looking into using hugetlbfs (or anything requiring order > 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory.

So you are saying that mlocking (VM_LOCKED) prevents migration and thus keeps compaction from doing its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.

> I would like to see the situation with VM_LOCKED cleared up, ideally the documentation would remain and reality adjusted to match and I think Peter's VM_PINNED set goes in the right direction for this goal. What is missing and how can I help?

I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves accounting for the kind of locking (pinning) that *does* prevent page migration (unlike mlocking)... quoting the patchset cover letter:

    These patches introduce VM_PINNED infrastructure, vma tracking of persistent 'pinned' page ranges. Pinned is anything that has a fixed phys address (as required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One popular way to pin pages is through get_user_pages() but that is not necessarily the only way.

> Thanks,
> Eric
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > > On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > > > All,
> > > >
> > > > After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks.
> > > >
> > > > Am I missing something about why it was never merged?
> > > >
> > > > I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly, so when they started looking into using hugetlbfs (or anything requiring order > 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory.
> > >
> > > So you are saying that mlocking (VM_LOCKED) prevents migration and thus keeps compaction from doing its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.
> >
> > Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.
>
> At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned. I found no references to mlocking in compaction.c, and in migrate.c there's just mlock_migrate_page() with comment:
>
>     /*
>      * mlock_migrate_page - called only from migrate_page_copy() to
>      * migrate the Mlocked page flag; update statistics.
>      */
>
> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?

I spent quite some time looking for it as well; it is in vmscan.c:

    int __isolate_lru_page(struct page *page, isolate_mode_t mode)
    {
    ...
        /* Compaction should not handle unevictable pages but CMA can do so */
        if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
            return ret;
    ...
Re: Resurrecting the VM_PINNED discussion
On Tue, 3 Mar 2015, Vlastimil Babka wrote:
> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?

It's in the defrag code.
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 07:45 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > > All,
> > >
> > > After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks.
> > >
> > > Am I missing something about why it was never merged?
> > >
> > > I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly, so when they started looking into using hugetlbfs (or anything requiring order > 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory.
> >
> > So you are saying that mlocking (VM_LOCKED) prevents migration and thus keeps compaction from doing its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.
>
> Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.

At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned. I found no references to mlocking in compaction.c, and in migrate.c there's just mlock_migrate_page() with comment:

    /*
     * mlock_migrate_page - called only from migrate_page_copy() to
     * migrate the Mlocked page flag; update statistics.
     */

It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?
Re: Resurrecting the VM_PINNED discussion
On Tue, 03 Mar 2015, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > > > On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > > > > All,
> > > > >
> > > > > After LSF/MM last year Peter revived a patch set that would create infrastructure for pinning pages as opposed to simply locking them. AFAICT, there was no objection to the set, it just needed some help from the IB folks.
> > > > >
> > > > > Am I missing something about why it was never merged?
> > > > >
> > > > > I ask because Akamai has bumped into the disconnect between the mlock manpage, Documentation/vm/unevictable-lru.txt, and reality WRT compaction and locking. A group working in userspace read those sources and wrote a tool that mmaps many files read only and locked, munmapping them when they are no longer needed. Locking is used because they cannot afford a major fault, but they are fine with minor faults. This tends to fragment memory badly, so when they started looking into using hugetlbfs (or anything requiring order > 0 allocations) they found they were not able to allocate the memory. They were confused based on the referenced documentation as to why compaction would continually fail to yield appropriately sized contiguous areas when there was more than enough free memory.
> > > >
> > > > So you are saying that mlocking (VM_LOCKED) prevents migration and thus keeps compaction from doing its job? If that's true, I think it's a bug as it is AFAIK supposed to work just fine.
> > >
> > > Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.
> >
> > At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned. I found no references to mlocking in compaction.c, and in migrate.c there's just mlock_migrate_page() with comment:
> >
> >     /*
> >      * mlock_migrate_page - called only from migrate_page_copy() to
> >      * migrate the Mlocked page flag; update statistics.
> >      */
> >
> > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?
>
> I spent quite some time looking for it as well; it is in vmscan.c:
>
>     int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>     {
>     ...
>         /* Compaction should not handle unevictable pages but CMA can do so */
>         if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>             return ret;
>     ...

And that demonstrates that I haven't spent enough time with this code: that isn't the restriction, because when this is called from compaction.c the mode is set to ISOLATE_UNEVICTABLE. So back to reading the code.

Eric
Re: Resurrecting the VM_PINNED discussion
On 03/03/2015 10:52 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Eric B Munson wrote:
> > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > > On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > > > Agreed. But as has been discussed in the threads around the VM_PINNED work, there are people that are relying on the fact that VM_LOCKED promises no minor faults. Which is why the behavior has remained.
> > >
> > > At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned. I found no references to mlocking in compaction.c, and in migrate.c there's just mlock_migrate_page() with comment:
> > >
> > >     /*
> > >      * mlock_migrate_page - called only from migrate_page_copy() to
> > >      * migrate the Mlocked page flag; update statistics.
> > >      */
> > >
> > > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where is this restriction?
> >
> > I spent quite some time looking for it as well; it is in vmscan.c:
> >
> >     int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> >     {
> >     ...
> >         /* Compaction should not handle unevictable pages but CMA can do so */
> >         if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
> >             return ret;
> >     ...
>
> And that demonstrates that I haven't spent enough time with this code: that isn't the restriction, because when this is called from compaction.c the mode is set to ISOLATE_UNEVICTABLE. So back to reading the code.

No, you were correct and thanks for the hint. ISOLATE_UNEVICTABLE is only passed from isolate_migratepages_range(), which is CMA, not regular compaction.

But I wonder, can we change this even after VM_PINNED is introduced, if existing code depends on no minor faults in mlocked areas, whatever the docs say?

On the other hand, compaction is not the only source of migrations. I wonder what the NUMA balancing does (not) about mlocked areas...
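The check the thread settles on can be modelled in a few lines of standalone C. This is a sketch under stated assumptions, not kernel code: `struct model_page`, `model_isolate_check()` and `model_demo()` are hypothetical names, the flag value is illustrative, and the quoted snippet only shows `return ret;`, so the `-EINVAL` result is assumed here. It also assumes an mlocked page sits on the unevictable list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for kernel types; the flag value is illustrative only. */
typedef unsigned int isolate_mode_t;
#define ISOLATE_UNEVICTABLE 0x8u

struct model_page {
    bool unevictable; /* models PageUnevictable(page) */
};

/* Models only the single check quoted from __isolate_lru_page(). */
static int model_isolate_check(const struct model_page *page, isolate_mode_t mode)
{
    if (page->unevictable && !(mode & ISOLATE_UNEVICTABLE))
        return -EINVAL; /* assumed value of `ret` */
    return 0;
}

/* Walk through the two callers discussed in the thread. */
static void model_demo(void)
{
    struct model_page mlocked = { .unevictable = true };

    /* Regular compaction: flag not set, so the page is skipped. */
    assert(model_isolate_check(&mlocked, 0) == -EINVAL);

    /* CMA: flag set, so the page may be isolated for migration. */
    assert(model_isolate_check(&mlocked, ISOLATE_UNEVICTABLE) == 0);
}
```

Read this way, the restriction is not in migration itself but in isolation: regular compaction never hands unevictable pages to the migration code, which would explain why searching compaction.c and migrate.c for mlock handling turned up nothing.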