Re: Resurrecting the VM_PINNED discussion

2015-03-05 Thread Peter Zijlstra
On Thu, Mar 05, 2015 at 03:09:42PM -0600, Christoph Lameter wrote:
> On Thu, 5 Mar 2015, Peter Zijlstra wrote:
> 
> > > Am I missing something about why it was never merged?
> >
> > Because I got lost in IB code and didn't manage to bribe anyone into
> > fixing that for me.
> 
> Well the complexity increased since then with the on demand pinning,
> mmu notifiers etc etc ...

Clearly I've not been paying attention, what? Is this that drug induced
stuff benh was babbling about a while back?

> I thought the clear distinction between pinning and mlocking would do the
> trick?

Nah, it still leaves the accounting up shit creek.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Resurrecting the VM_PINNED discussion

2015-03-05 Thread Christoph Lameter
On Thu, 5 Mar 2015, Peter Zijlstra wrote:

> > Am I missing something about why it was never merged?
>
> Because I got lost in IB code and didn't manage to bribe anyone into
> fixing that for me.

Well the complexity increased since then with the on demand pinning,
mmu notifiers etc etc ...

I thought the clear distinction between pinning and mlocking would do the
trick?



Re: Resurrecting the VM_PINNED discussion

2015-03-05 Thread Peter Zijlstra
On Tue, Mar 03, 2015 at 12:41:05PM -0500, Eric B Munson wrote:
> All,
> 
> After LSF/MM last year Peter revived a patch set that would create
> infrastructure for pinning pages as opposed to simply locking them.
> AFAICT, there was no objection to the set, it just needed some help
> from the IB folks.
> 
> Am I missing something about why it was never merged? 

Because I got lost in IB code and didn't manage to bribe anyone into
fixing that for me.



Re: Resurrecting the VM_PINNED discussion

2015-03-04 Thread Eric B Munson
On Tue, 03 Mar 2015, Vlastimil Babka wrote:


> 
> No, you were correct and thanks for the hint. It's only ISOLATE_UNEVICTABLE from
> isolate_migratepages_range(), which is CMA, not regular compaction.
> But I wonder, can we change this even after VM_PINNED is introduced, if existing
> code depends on "no minor faults in mlocked areas", whatever the docs say? On
> the other hand, compaction is not the only source of migrations. I wonder what
> the NUMA balancing does (not) about mlocked areas...

My hope was that we could convince those that depend on mlock()
preventing minor faults to move to use the mpin() interface that was
discussed in the VM_PINNED thread.  If that is not acceptable then we
really need to update the man page for mlock() and the vm documentation
to be very clear that minor faults are also prevented.

Eric


signature.asc
Description: Digital signature


Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 10:52 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Eric B Munson wrote:
> 
>> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>> 
>> > On 03/03/2015 07:45 PM, Eric B Munson wrote:
>> > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
>> > > 
>> > > Agreed.  But as has been discussed in the threads around the VM_PINNED
>> > > work, there are people that are relying on the fact that VM_LOCKED
>> > > promises no minor faults.  Which is why the behavior has remained.
>> > 
>> > At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
>> > I found no references to mlocking in compaction.c, and in migrate.c there's just
>> > mlock_migrate_page() with comment:
>> > 
>> > /*
>> >  * mlock_migrate_page - called only from migrate_page_copy() to
>> >  * migrate the Mlocked page flag; update statistics.
>> >  */
>> > 
>> > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
>> > is this restriction?
>> > 
>> 
>> I spent quite some time looking for it as well, it is in vmscan.c
>> 
>> int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>> {
>> ...
>> 	/* Compaction should not handle unevictable pages but CMA can do so */
>> 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>> 		return ret;
>> ...
>> 
>> 
> 
> And that demonstrates that I haven't spent enough time with this code,
> that isn't the restriction because when this is called from compaction.c
> the mode is set to ISOLATE_UNEVICTABLE.  So back to reading the code.

No, you were correct and thanks for the hint. It's only ISOLATE_UNEVICTABLE from
isolate_migratepages_range(), which is CMA, not regular compaction.
But I wonder, can we change this even after VM_PINNED is introduced, if existing
code depends on "no minor faults in mlocked areas", whatever the docs say? On
the other hand, compaction is not the only source of migrations. I wonder what
the NUMA balancing does (not) about mlocked areas...
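
For reference, the check quoted above can be modelled in userspace. This is
only a minimal sketch, not kernel code: struct model_page, the MODEL_* mode
bits and model_can_isolate() are hypothetical stand-ins for struct page,
isolate_mode_t and the test inside __isolate_lru_page(). It shows why an
mlocked (unevictable) page is skipped unless the caller passes the
unevictable mode bit, as the CMA path is said to do in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's isolate_mode_t bits. */
enum model_isolate_mode {
	MODEL_ISOLATE_NONE        = 0,
	MODEL_ISOLATE_UNEVICTABLE = 1 << 0, /* caller accepts unevictable pages */
};

/* Hypothetical stand-in for struct page; only the flag we care about. */
struct model_page {
	bool unevictable; /* models PageUnevictable(), true for mlocked pages */
};

/* Returns true when the modelled page may be isolated for migration. */
bool model_can_isolate(const struct model_page *page, unsigned int mode)
{
	assert(page != NULL);

	/* Same shape as the quoted test: skip unevictable pages unless allowed. */
	if (page->unevictable && !(mode & MODEL_ISOLATE_UNEVICTABLE))
		return false;
	return true;
}
```

With these assumptions, a caller that does not pass the unevictable bit never
gets to migrate an mlocked page, which matches the fragmentation Eric
describes.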


Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Eric B Munson wrote:

> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> 
> > On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > > 
> > >> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > >> > All,
> > >> >
> > >> > After LSF/MM last year Peter revived a patch set that would create
> > >> > infrastructure for pinning pages as opposed to simply locking them.
> > >> > AFAICT, there was no objection to the set, it just needed some help
> > >> > from the IB folks.
> > >> >
> > >> > Am I missing something about why it was never merged?  I ask because
> > >> > Akamai has bumped into the disconnect between the mlock manpage,
> > >> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > >> > locking.  A group working in userspace read those sources and wrote a
> > >> > tool that mmaps many files read only and locked, munmapping them when
> > >> > they are no longer needed.  Locking is used because they cannot afford a
> > >> > major fault, but they are fine with minor faults.  This tends to
> > >> > fragment memory badly so when they started looking into using hugetlbfs
> > >> > (or anything requiring order > 0 allocations) they found they were not
> > >> > able to allocate the memory.  They were confused based on the referenced
> > >> > documentation as to why compaction would continually fail to yield
> > >> > appropriately sized contiguous areas when there was more than enough
> > >> > free memory.
> > >> 
> > >> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> > >> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> > >> supposed to work just fine.
> > > 
> > > Agreed.  But as has been discussed in the threads around the VM_PINNED
> > > work, there are people that are relying on the fact that VM_LOCKED
> > > promises no minor faults.  Which is why the behavior has remained.
> > 
> > At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
> > I found no references to mlocking in compaction.c, and in migrate.c there's just
> > mlock_migrate_page() with comment:
> > 
> > /*
> >  * mlock_migrate_page - called only from migrate_page_copy() to
> >  * migrate the Mlocked page flag; update statistics.
> >  */
> > 
> > It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
> > is this restriction?
> > 
> 
> I spent quite some time looking for it as well, it is in vmscan.c
> 
> int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> {
> ...
> 	/* Compaction should not handle unevictable pages but CMA can do so */
> 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
> 		return ret;
> ...
> 
> 

And that demonstrates that I haven't spent enough time with this code,
that isn't the restriction because when this is called from compaction.c
the mode is set to ISOLATE_UNEVICTABLE.  So back to reading the code.

Eric




Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

> On 03/03/2015 07:45 PM, Eric B Munson wrote:
> > On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> > 
> >> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> >> > All,
> >> >
> >> > After LSF/MM last year Peter revived a patch set that would create
> >> > infrastructure for pinning pages as opposed to simply locking them.
> >> > AFAICT, there was no objection to the set, it just needed some help
> >> > from the IB folks.
> >> >
> >> > Am I missing something about why it was never merged?  I ask because
> >> > Akamai has bumped into the disconnect between the mlock manpage,
> >> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> >> > locking.  A group working in userspace read those sources and wrote a
> >> > tool that mmaps many files read only and locked, munmapping them when
> >> > they are no longer needed.  Locking is used because they cannot afford a
> >> > major fault, but they are fine with minor faults.  This tends to
> >> > fragment memory badly so when they started looking into using hugetlbfs
> >> > (or anything requiring order > 0 allocations) they found they were not
> >> > able to allocate the memory.  They were confused based on the referenced
> >> > documentation as to why compaction would continually fail to yield
> >> > appropriately sized contiguous areas when there was more than enough
> >> > free memory.
> >> 
> >> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> >> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> >> supposed to work just fine.
> > 
> > Agreed.  But as has been discussed in the threads around the VM_PINNED
> > work, there are people that are relying on the fact that VM_LOCKED
> > promises no minor faults.  Which is why the behavior has remained.
> 
> At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
> I found no references to mlocking in compaction.c, and in migrate.c there's just
> mlock_migrate_page() with comment:
> 
> /*
>  * mlock_migrate_page - called only from migrate_page_copy() to
>  * migrate the Mlocked page flag; update statistics.
>  */
> 
> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
> is this restriction?
> 

I spent quite some time looking for it as well, it is in vmscan.c

int __isolate_lru_page(struct page *page, isolate_mode_t mode)
{
...
	/* Compaction should not handle unevictable pages but CMA can do so */
	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
		return ret;
...






Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 07:45 PM, Eric B Munson wrote:
> On Tue, 03 Mar 2015, Vlastimil Babka wrote:
> 
>> On 03/03/2015 06:41 PM, Eric B Munson wrote:
>> > All,
>> >
>> > After LSF/MM last year Peter revived a patch set that would create
>> > infrastructure for pinning pages as opposed to simply locking them.
>> > AFAICT, there was no objection to the set, it just needed some help
>> > from the IB folks.
>> >
>> > Am I missing something about why it was never merged?  I ask because
>> > Akamai has bumped into the disconnect between the mlock manpage,
>> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
>> > locking.  A group working in userspace read those sources and wrote a
>> > tool that mmaps many files read only and locked, munmapping them when
>> > they are no longer needed.  Locking is used because they cannot afford a
>> > major fault, but they are fine with minor faults.  This tends to
>> > fragment memory badly so when they started looking into using hugetlbfs
>> > (or anything requiring order > 0 allocations) they found they were not
>> > able to allocate the memory.  They were confused based on the referenced
>> > documentation as to why compaction would continually fail to yield
>> > appropriately sized contiguous areas when there was more than enough
>> > free memory.
>> 
>> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
>> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
>> supposed to work just fine.
> 
> Agreed.  But as has been discussed in the threads around the VM_PINNED
> work, there are people that are relying on the fact that VM_LOCKED
> promises no minor faults.  Which is why the behavior has remained.

At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
I found no references to mlocking in compaction.c, and in migrate.c there's just
mlock_migrate_page() with comment:

/*
 * mlock_migrate_page - called only from migrate_page_copy() to
 * migrate the Mlocked page flag; update statistics.
 */

It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
is this restriction?



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Christoph Lameter
On Tue, 3 Mar 2015, Vlastimil Babka wrote:

> It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
> is this restriction?

It's in the defrag code.



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Christoph Lameter
On Tue, 3 Mar 2015, Eric B Munson wrote:

> > So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> > compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> > supposed to work just fine.
>
> Agreed.  But as has been discussed in the threads around the VM_PINNED
> work, there are people that are relying on the fact that VM_LOCKED
> promises no minor faults.  Which is why the behavior has remained.

AFAICT mlocking preventing migration is something that could be taken out.
Google removes the restriction.

mlock does not promise no minor faults, only that the page will stay
resident. Pinning is what results in no faults.

> VM_PINNED itself doesn't help us, but it would allow us to make
> VM_LOCKED use only the weaker 'no major fault' semantics while still
> providing a way for anyone that needs the stronger 'no minor fault'
> promise to get the semantics they need.

The semantics for mlock allow migration and therefore defrag as well as
thp processing.
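
A minimal userspace sketch of that distinction, assuming Linux and, for
brevity, anonymous memory instead of the read-only file mappings described
earlier in the thread. The helper names map_and_try_lock() and
minor_faults_so_far() are made up for illustration; mmap(), mlock() and
getrusage() are the standard calls. mlock() asks for residency, while the
minor fault counter is the thing to watch if pages get migrated underneath
the mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor (soft) faults taken by this process so far, or -1 on error. */
long minor_faults_so_far(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Map len bytes of anonymous memory and try to mlock() it.
 * Returns the mapping, or NULL if mmap() fails. *locked is set to 1 if
 * mlock() succeeded and 0 otherwise (it can fail under RLIMIT_MEMLOCK).
 */
void *map_and_try_lock(size_t len, int *locked)
{
	void *p;

	assert(len > 0 && locked != NULL);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*locked = (mlock(p, len) == 0);
	return p;
}
```

Sampling minor_faults_so_far() before and after touching the range gives a
rough view of soft faults on a locked mapping; it does not by itself say
whether a given fault was caused by migration.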


Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Davidlohr Bueso
On Tue, 2015-03-03 at 19:35 +0100, Vlastimil Babka wrote:
> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > All,
> >
> > After LSF/MM last year Peter revived a patch set that would create
> > infrastructure for pinning pages as opposed to simply locking them.
> > AFAICT, there was no objection to the set, it just needed some help
> > from the IB folks.
> >
> > Am I missing something about why it was never merged?  I ask because
> > Akamai has bumped into the disconnect between the mlock manpage,
> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > locking.  A group working in userspace read those sources and wrote a
> > tool that mmaps many files read only and locked, munmapping them when
> > they are no longer needed.  Locking is used because they cannot afford a
> > major fault, but they are fine with minor faults.  This tends to
> > fragment memory badly so when they started looking into using hugetlbfs
> > (or anything requiring order > 0 allocations) they found they were not
> > able to allocate the memory.  They were confused based on the referenced
> > documentation as to why compaction would continually fail to yield
> > appropriately sized contiguous areas when there was more than enough
> > free memory.
> 
> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> supposed to work just fine.
> 
> > I would like to see the situation with VM_LOCKED cleared up, ideally the
> > documentation would remain and reality adjusted to match and I think
> > Peter's VM_PINNED set goes in the right direction for this goal.  What
> > is missing and how can I help?
> 
> I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
> accounting for the kind of locking (pinning) that *does* prevent page migration
> (unlike mlocking)... quoting the patchset cover letter:
> 
> "These patches introduce VM_PINNED infrastructure, vma tracking of persistent
> 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
> required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
> popular way to pin pages is through get_user_pages() but that is not necessarily
> the only way."

Yeah, this also makes it pretty clear:

""
Firstly, various subsystems (perf, IB amongst others) 'pin'
significant chunks of memory (through holding page refs or custom
maps), because this memory is unevictable we must test this against
RLIMIT_MEMLOCK.

...

Thirdly, because VM_LOCKED does allow unmapping (and therefore page
migration) the -rt people are not pleased and would very much like
something stronger.
""
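
The RLIMIT_MEMLOCK test mentioned in the cover letter can be sketched from
userspace as follows. This is an illustration only: fits_memlock_limit() and
read_memlock_limit() are hypothetical helpers, and the kernel's own
accounting is not shown here. getrlimit() and RLIMIT_MEMLOCK are the
standard interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Would locking or pinning `want` more bytes, on top of `already` bytes,
 * stay within `limit`? Written to avoid overflow in already + want.
 */
bool fits_memlock_limit(size_t already, size_t want, rlim_t limit)
{
	if (limit == RLIM_INFINITY)
		return true;
	if (already > limit)
		return false;
	return want <= limit - already;
}

/* Reads the soft RLIMIT_MEMLOCK of this process; returns 0 on success. */
int read_memlock_limit(rlim_t *out)
{
	struct rlimit rl;

	assert(out != NULL);

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}
```

The point of the quoted paragraph is that pinned memory is just as
unevictable as mlocked memory, so both would need to be counted in `already`
for a check like this to mean anything.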



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

> On 03/03/2015 06:41 PM, Eric B Munson wrote:
> > All,
> >
> > After LSF/MM last year Peter revived a patch set that would create
> > infrastructure for pinning pages as opposed to simply locking them.
> > AFAICT, there was no objection to the set, it just needed some help
> > from the IB folks.
> >
> > Am I missing something about why it was never merged?  I ask because
> > Akamai has bumped into the disconnect between the mlock manpage,
> > Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> > locking.  A group working in userspace read those sources and wrote a
> > tool that mmaps many files read only and locked, munmapping them when
> > they are no longer needed.  Locking is used because they cannot afford a
> > major fault, but they are fine with minor faults.  This tends to
> > fragment memory badly so when they started looking into using hugetlbfs
> > (or anything requiring order > 0 allocations) they found they were not
> > able to allocate the memory.  They were confused based on the referenced
> > documentation as to why compaction would continually fail to yield
> > appropriately sized contiguous areas when there was more than enough
> > free memory.
> 
> So you are saying that mlocking (VM_LOCKED) prevents migration and thus
> compaction to do its job? If that's true, I think it's a bug as it is AFAIK
> supposed to work just fine.

Agreed.  But as has been discussed in the threads around the VM_PINNED
work, there are people that are relying on the fact that VM_LOCKED
promises no minor faults.  Which is why the behavior has remained.

> 
> > I would like to see the situation with VM_LOCKED cleared up, ideally the
> > documentation would remain and reality adjusted to match and I think
> > Peter's VM_PINNED set goes in the right direction for this goal.  What
> > is missing and how can I help?
> 
> I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
> accounting for the kind of locking (pinning) that *does* prevent page migration
> (unlike mlocking)... quoting the patchset cover letter:

VM_PINNED itself doesn't help us, but it would allow us to make
VM_LOCKED use only the weaker 'no major fault' semantics while still
providing a way for anyone that needs the stronger 'no minor fault'
promise to get the semantics they need.

> 
> "These patches introduce VM_PINNED infrastructure, vma tracking of persistent
> 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
> required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
> popular way to pin pages is through get_user_pages() but that is not necessarily
> the only way."
> 
> > Thanks,
> > Eric
> >
> 




Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 06:41 PM, Eric B Munson wrote:
> All,
>
> After LSF/MM last year Peter revived a patch set that would create
> infrastructure for pinning pages as opposed to simply locking them.
> AFAICT, there was no objection to the set, it just needed some help
> from the IB folks.
>
> Am I missing something about why it was never merged?  I ask because
> Akamai has bumped into the disconnect between the mlock manpage,
> Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
> locking.  A group working in userspace read those sources and wrote a
> tool that mmaps many files read only and locked, munmapping them when
> they are no longer needed.  Locking is used because they cannot afford a
> major fault, but they are fine with minor faults.  This tends to
> fragment memory badly so when they started looking into using hugetlbfs
> (or anything requiring order > 0 allocations) they found they were not
> able to allocate the memory.  They were confused based on the referenced
> documentation as to why compaction would continually fail to yield
> appropriately sized contiguous areas when there was more than enough
> free memory.

So you are saying that mlocking (VM_LOCKED) prevents migration and thus
compaction to do its job? If that's true, I think it's a bug as it is AFAIK
supposed to work just fine.

> I would like to see the situation with VM_LOCKED cleared up, ideally the
> documentation would remain and reality adjusted to match and I think
> Peter's VM_PINNED set goes in the right direction for this goal.  What
> is missing and how can I help?

I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
accounting for the kind of locking (pinning) that *does* prevent page migration
(unlike mlocking)... quoting the patchset cover letter:

"These patches introduce VM_PINNED infrastructure, vma tracking of persistent
'pinned' page ranges. Pinned is anything that has a fixed phys address (as
required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
popular way to pin pages is through get_user_pages() but that is not necessarily
the only way."

> Thanks,
> Eric
>



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

 On 03/03/2015 06:41 PM, Eric B Munson wrote: All,
 
  After LSF/MM last year Peter revived a patch set that would create
  infrastructure for pinning pages as opposed to simply locking them.
  AFAICT, there was no objection to the set, it just needed some help
  from the IB folks.
 
  Am I missing something about why it was never merged?  I ask because
  Akamai has bumped into the disconnect between the mlock manpage,
  Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
  locking.  A group working in userspace read those sources and wrote a
  tool that mmaps many files read only and locked, munmapping them when
  they are no longer needed.  Locking is used because they cannot afford a
  major fault, but they are fine with minor faults.  This tends to
  fragment memory badly so when they started looking into using hugetlbfs
  (or anything requiring order  0 allocations) they found they were not
  able to allocate the memory.  They were confused based on the referenced
  documentation as to why compaction would continually fail to yield
  appropriately sized contiguous areas when there was more than enough
  free memory.
 
 So you are saying that mlocking (VM_LOCKED) prevents migration and thus
 compaction to do its job? If that's true, I think it's a bug as it is AFAIK
 supposed to work just fine.

Agreed.  But as has been discussed in the threads around the VM_PINNED
work, there are people that are relying on the fact that VM_LOCKED
promises no minor faults.  Which is why the behavoir has remained.

 
  I would like to see the situation with VM_LOCKED cleared up, ideally the
  documentation would remain and reality adjusted to match and I think
  Peter's VM_PINNED set goes in the right direction for this goal.  What
  is missing and how can I help?
 
 I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
 accounting for the kind of locking (pinning) that *does* prevent page 
 migration
 (unlike mlocking)... quoting the patchset cover letter:

VM_PINNED itself doesn't help us, but it would allow us to make
VM_LOCKED use only the weaker 'no major fault' semantics while still
providing a way for anyone that needs the stronger 'no minor fault'
promise to get the semantics they need.

 
 These patches introduce VM_PINNED infrastructure, vma tracking of persistent
 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
 required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
 popular way to pin pages is through get_user_pages() but that not nessecarily
 the only way.
 
  Thanks,
  Eric
 
 


signature.asc
Description: Digital signature


Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Davidlohr Bueso
On Tue, 2015-03-03 at 19:35 +0100, Vlastimil Babka wrote:
 On 03/03/2015 06:41 PM, Eric B Munson wrote: All,
 
  After LSF/MM last year Peter revived a patch set that would create
  infrastructure for pinning pages as opposed to simply locking them.
  AFAICT, there was no objection to the set, it just needed some help
  from the IB folks.
 
  Am I missing something about why it was never merged?  I ask because
  Akamai has bumped into the disconnect between the mlock manpage,
  Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
  locking.  A group working in userspace read those sources and wrote a
  tool that mmaps many files read only and locked, munmapping them when
  they are no longer needed.  Locking is used because they cannot afford a
  major fault, but they are fine with minor faults.  This tends to
  fragment memory badly so when they started looking into using hugetlbfs
  (or anything requiring order  0 allocations) they found they were not
  able to allocate the memory.  They were confused based on the referenced
  documentation as to why compaction would continually fail to yield
  appropriately sized contiguous areas when there was more than enough
  free memory.
 
 So you are saying that mlocking (VM_LOCKED) prevents migration and thus
 compaction to do its job? If that's true, I think it's a bug as it is AFAIK
 supposed to work just fine.
 
  I would like to see the situation with VM_LOCKED cleared up, ideally the
  documentation would remain and reality adjusted to match and I think
  Peter's VM_PINNED set goes in the right direction for this goal.  What
  is missing and how can I help?
 
 I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
 accounting for the kind of locking (pinning) that *does* prevent page migration
 (unlike mlocking)... quoting the patchset cover letter:
 
 These patches introduce VM_PINNED infrastructure, vma tracking of persistent
 'pinned' page ranges. Pinned is anything that has a fixed phys address (as
 required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
 popular way to pin pages is through get_user_pages() but that is not necessarily
 the only way.

Yeah, this also makes it pretty clear:


Firstly, various subsystems (perf, IB amongst others) 'pin'
significant chunks of memory (through holding page refs or custom
maps), because this memory is unevictable we must test this against
RLIMIT_MEMLOCK.

...

Thirdly, because VM_LOCKED does allow unmapping (and therefore page
migration) the -rt people are not pleased and would very much like
something stronger.
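
For reference, the limit that the first point says pinned memory should be
tested against can be read from userspace with getrlimit(). A minimal sketch,
assuming the caller keeps its own count of bytes already locked (the function
name and that bookkeeping are hypothetical, not part of the patch set):

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: would locking `bytes` more memory stay within the
 * soft RLIMIT_MEMLOCK, given `already_locked` bytes tracked by the caller?
 * Returns 1 if it fits, 0 if it does not, -1 if the limit cannot be read.
 */
int fits_memlock_limit(size_t already_locked, size_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;

	/* No limit configured: anything fits. */
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;

	/* Already over the limit (e.g. the limit was lowered afterwards). */
	if (already_locked > rl.rlim_cur)
		return 0;

	/* Compare against the remaining headroom to avoid overflow. */
	return bytes <= rl.rlim_cur - already_locked ? 1 : 0;
}
```

Privileged tasks (CAP_IPC_LOCK) are not bound by this limit, so a check like
this is advisory only.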




Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Christoph Lameter
On Tue, 3 Mar 2015, Eric B Munson wrote:

  So you are saying that mlocking (VM_LOCKED) prevents migration and thus
  keeps compaction from doing its job? If that's true, I think it's a bug,
  as AFAIK it is supposed to work just fine.

 Agreed.  But as has been discussed in the threads around the VM_PINNED
 work, there are people that are relying on the fact that VM_LOCKED
 promises no minor faults.  Which is why the behavior has remained.

AFAICT mlocking preventing migration is something that could be taken out.
Google removes the restriction.

mlock does not promise no minor faults, only that the page will stay
resident. Pinning is what results in no faults.

 VM_PINNED itself doesn't help us, but it would allow us to make
 VM_LOCKED use only the weaker 'no major fault' semantics while still
 providing a way for anyone that needs the stronger 'no minor fault'
 promise to get the semantics they need.

The semantics for mlock allow migration and therefore defrag as well as
thp processing.
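
The resident-versus-fault-free distinction can be observed from userspace, at
least roughly. The sketch below maps a file read-only, mlock()s it, touches
one byte per page, and reports the minor and major fault counts that
getrusage() attributes to those touches. It is an illustration only: the
function name is made up here, the file is assumed to be at least `len` bytes
long, and the code does nothing to provoke migration itself.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `path` read-only, lock the mapping, read one
 * byte from every page, and store the number of minor and major faults
 * taken during the reads. Returns 0 on success, -1 on any failure
 * (including mlock() being refused by RLIMIT_MEMLOCK).
 * The file must be at least `len` bytes long.
 */
int touch_locked_mapping(const char *path, size_t len,
			 long *minflt, long *majflt)
{
	struct rusage before, after;
	volatile unsigned char sink = 0;
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t off;
	int fd;

	if (page <= 0 || len == 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	/* mlock() faults the pages in and marks them unevictable. */
	if (mlock(p, len) != 0) {
		munmap(p, len);
		return -1;
	}

	getrusage(RUSAGE_SELF, &before);
	for (off = 0; off < len; off += (size_t)page)
		sink += p[off];
	getrusage(RUSAGE_SELF, &after);
	(void)sink;

	*minflt = after.ru_minflt - before.ru_minflt;
	*majflt = after.ru_majflt - before.ru_majflt;

	munmap(p, len);	/* unmapping also drops the lock */
	return 0;
}
```

Since the locked pages stay resident, the major count should remain zero; a
nonzero minor count on a locked range is the case this thread is arguing
about.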


Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 06:41 PM, Eric B Munson wrote:
 All,

 After LSF/MM last year Peter revived a patch set that would create
 infrastructure for pinning pages as opposed to simply locking them.
 AFAICT, there was no objection to the set, it just needed some help
 from the IB folks.

 Am I missing something about why it was never merged?  I ask because
 Akamai has bumped into the disconnect between the mlock manpage,
 Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
 locking.  A group working in userspace read those sources and wrote a
 tool that mmaps many files read only and locked, munmapping them when
 they are no longer needed.  Locking is used because they cannot afford a
 major fault, but they are fine with minor faults.  This tends to
 fragment memory badly so when they started looking into using hugetlbfs
 (or anything requiring order > 0 allocations) they found they were not
 able to allocate the memory.  They were confused based on the referenced
 documentation as to why compaction would continually fail to yield
 appropriately sized contiguous areas when there was more than enough
 free memory.

So you are saying that mlocking (VM_LOCKED) prevents migration and thus
keeps compaction from doing its job? If that's true, I think it's a bug,
as AFAIK it is supposed to work just fine.

 I would like to see the situation with VM_LOCKED cleared up, ideally the
 documentation would remain and reality adjusted to match and I think
 Peter's VM_PINNED set goes in the right direction for this goal.  What
 is missing and how can I help?

I don't think VM_PINNED would help you. In fact it is VM_PINNED that improves
accounting for the kind of locking (pinning) that *does* prevent page migration
(unlike mlocking)... quoting the patchset cover letter:

These patches introduce VM_PINNED infrastructure, vma tracking of persistent
'pinned' page ranges. Pinned is anything that has a fixed phys address (as
required for say IO DMA engines) and thus cannot use the weaker VM_LOCKED. One
popular way to pin pages is through get_user_pages() but that is not necessarily
the only way.

 Thanks,
 Eric




Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Vlastimil Babka wrote:

 On 03/03/2015 07:45 PM, Eric B Munson wrote:
  On Tue, 03 Mar 2015, Vlastimil Babka wrote:
  
  On 03/03/2015 06:41 PM, Eric B Munson wrote:
   All,
  
   After LSF/MM last year Peter revived a patch set that would create
   infrastructure for pinning pages as opposed to simply locking them.
   AFAICT, there was no objection to the set, it just needed some help
   from the IB folks.
  
   Am I missing something about why it was never merged?  I ask because
   Akamai has bumped into the disconnect between the mlock manpage,
   Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
   locking.  A group working in userspace read those sources and wrote a
   tool that mmaps many files read only and locked, munmapping them when
   they are no longer needed.  Locking is used because they cannot afford a
   major fault, but they are fine with minor faults.  This tends to
   fragment memory badly so when they started looking into using hugetlbfs
   (or anything requiring order > 0 allocations) they found they were not
   able to allocate the memory.  They were confused based on the referenced
   documentation as to why compaction would continually fail to yield
   appropriately sized contiguous areas when there was more than enough
   free memory.
  
  So you are saying that mlocking (VM_LOCKED) prevents migration and thus
  keeps compaction from doing its job? If that's true, I think it's a bug,
  as AFAIK it is supposed to work just fine.
  
  Agreed.  But as has been discussed in the threads around the VM_PINNED
  work, there are people that are relying on the fact that VM_LOCKED
  promises no minor faults.  Which is why the behavior has remained.
 
 At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
 I found no references to mlocking in compaction.c, and in migrate.c there's just
 mlock_migrate_page() with comment:
 
 /*
  * mlock_migrate_page - called only from migrate_page_copy() to
  * migrate the Mlocked page flag; update statistics.
  */
 
 It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
 is this restriction?
 

I spent quite some time looking for it as well, it is in vmscan.c

int __isolate_lru_page(struct page *page, isolate_mode_t mode)
{
...
	/* Compaction should not handle unevictable pages but CMA can do so */
	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
		return ret;
...






Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Christoph Lameter
On Tue, 3 Mar 2015, Vlastimil Babka wrote:

 It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
 is this restriction?

It's in the defrag code.



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 07:45 PM, Eric B Munson wrote:
 On Tue, 03 Mar 2015, Vlastimil Babka wrote:
 
 On 03/03/2015 06:41 PM, Eric B Munson wrote:
  All,
 
  After LSF/MM last year Peter revived a patch set that would create
  infrastructure for pinning pages as opposed to simply locking them.
  AFAICT, there was no objection to the set, it just needed some help
  from the IB folks.
 
  Am I missing something about why it was never merged?  I ask because
  Akamai has bumped into the disconnect between the mlock manpage,
  Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
  locking.  A group working in userspace read those sources and wrote a
  tool that mmaps many files read only and locked, munmapping them when
  they are no longer needed.  Locking is used because they cannot afford a
  major fault, but they are fine with minor faults.  This tends to
  fragment memory badly so when they started looking into using hugetlbfs
  (or anything requiring order > 0 allocations) they found they were not
  able to allocate the memory.  They were confused based on the referenced
  documentation as to why compaction would continually fail to yield
  appropriately sized contiguous areas when there was more than enough
  free memory.
 
 So you are saying that mlocking (VM_LOCKED) prevents migration and thus
 keeps compaction from doing its job? If that's true, I think it's a bug,
 as AFAIK it is supposed to work just fine.
 
 Agreed.  But as has been discussed in the threads around the VM_PINNED
 work, there are people that are relying on the fact that VM_LOCKED
 promises no minor faults.  Which is why the behavior has remained.

At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
I found no references to mlocking in compaction.c, and in migrate.c there's just
mlock_migrate_page() with comment:

/*
 * mlock_migrate_page - called only from migrate_page_copy() to
 * migrate the Mlocked page flag; update statistics.
 */

It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
is this restriction?



Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Eric B Munson
On Tue, 03 Mar 2015, Eric B Munson wrote:

 On Tue, 03 Mar 2015, Vlastimil Babka wrote:
 
  On 03/03/2015 07:45 PM, Eric B Munson wrote:
   On Tue, 03 Mar 2015, Vlastimil Babka wrote:
   
   On 03/03/2015 06:41 PM, Eric B Munson wrote:
    All,
   
    After LSF/MM last year Peter revived a patch set that would create
    infrastructure for pinning pages as opposed to simply locking them.
    AFAICT, there was no objection to the set, it just needed some help
    from the IB folks.
   
    Am I missing something about why it was never merged?  I ask because
    Akamai has bumped into the disconnect between the mlock manpage,
    Documentation/vm/unevictable-lru.txt, and reality WRT compaction and
    locking.  A group working in userspace read those sources and wrote a
    tool that mmaps many files read only and locked, munmapping them when
    they are no longer needed.  Locking is used because they cannot afford a
    major fault, but they are fine with minor faults.  This tends to
    fragment memory badly so when they started looking into using hugetlbfs
    (or anything requiring order > 0 allocations) they found they were not
    able to allocate the memory.  They were confused based on the referenced
    documentation as to why compaction would continually fail to yield
    appropriately sized contiguous areas when there was more than enough
    free memory.
   
   So you are saying that mlocking (VM_LOCKED) prevents migration and thus
   keeps compaction from doing its job? If that's true, I think it's a bug,
   as AFAIK it is supposed to work just fine.
   
   Agreed.  But as has been discussed in the threads around the VM_PINNED
   work, there are people that are relying on the fact that VM_LOCKED
   promises no minor faults.  Which is why the behavior has remained.
  
  At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
  I found no references to mlocking in compaction.c, and in migrate.c there's just
  mlock_migrate_page() with comment:
  
  /*
   * mlock_migrate_page - called only from migrate_page_copy() to
   * migrate the Mlocked page flag; update statistics.
   */
  
  It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
  is this restriction?
  
 
 I spent quite some time looking for it as well, it is in vmscan.c
 
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
 ...
 	/* Compaction should not handle unevictable pages but CMA can do so */
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
 ...
 
 

And that demonstrates that I haven't spent enough time with this code:
that isn't the restriction, because when this is called from compaction.c
the mode is set to ISOLATE_UNEVICTABLE.  So back to reading the code.

Eric




Re: Resurrecting the VM_PINNED discussion

2015-03-03 Thread Vlastimil Babka
On 03/03/2015 10:52 PM, Eric B Munson wrote:
 On Tue, 03 Mar 2015, Eric B Munson wrote:
 
 On Tue, 03 Mar 2015, Vlastimil Babka wrote:
 
  On 03/03/2015 07:45 PM, Eric B Munson wrote:
   On Tue, 03 Mar 2015, Vlastimil Babka wrote:
   
   Agreed.  But as has been discussed in the threads around the VM_PINNED
   work, there are people that are relying on the fact that VM_LOCKED
   promises no minor faults.  Which is why the behavior has remained.
  
  At least in the VM_PINNED thread after last lsf/mm, I don't see this mentioned.
  I found no references to mlocking in compaction.c, and in migrate.c there's just
  mlock_migrate_page() with comment:
  
  /*
   * mlock_migrate_page - called only from migrate_page_copy() to
   * migrate the Mlocked page flag; update statistics.
   */
  
  It also passes TTU_IGNORE_MLOCK to try_to_unmap(). So what am I missing? Where
  is this restriction?
  
 
 I spent quite some time looking for it as well, it is in vmscan.c
 
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
 ...
 	/* Compaction should not handle unevictable pages but CMA can do so */
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
 ...
 
 
 
 And that demonstrates that I haven't spent enough time with this code,
 that isn't the restriction because when this is called from compaction.c
 the mode is set to ISOLATE_UNEVICTABLE.  So back to reading the code.

No, you were correct and thanks for the hint. The mode is only ISOLATE_UNEVICTABLE
when called from isolate_migratepages_range(), which is CMA, not regular compaction.
But I wonder, can we change this even after VM_PINNED is introduced, if existing
code depends on no minor faults in mlocked areas, whatever the docs say? On
the other hand, compaction is not the only source of migrations. I wonder what
the NUMA balancing does (not) about mlocked areas...
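
The check Eric quoted boils down to a single mask test. Below is a
self-contained userspace model of it, for illustration only: struct
page_model, isolation_allowed() and the flag value are stand-ins invented
here, not the kernel's definitions.

```c
#include <stdbool.h>

/* Stand-ins for the kernel types; values are assumed for illustration. */
typedef unsigned int isolate_mode_t;
#define ISOLATE_UNEVICTABLE 0x8u

/* Minimal model of the one page property the quoted check looks at. */
struct page_model {
	bool unevictable;	/* true for mlocked (VM_LOCKED) pages */
};

/*
 * Mirrors the quoted condition: an unevictable page is refused unless
 * the caller explicitly asked for unevictable pages in `mode`.
 */
bool isolation_allowed(const struct page_model *page, isolate_mode_t mode)
{
	if (page->unevictable && !(mode & ISOLATE_UNEVICTABLE))
		return false;
	return true;
}
```

Under this model a caller that does not set the flag (regular compaction,
per the message above) skips mlocked pages, while a caller that sets it (the
CMA path) can still isolate them.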