Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-29 Thread Vlastimil Babka
On 9/28/22 04:28, Suren Baghdasaryan wrote:
> On Sun, Sep 11, 2022 at 2:35 AM Vlastimil Babka  wrote:
>>
>> On 9/2/22 01:26, Suren Baghdasaryan wrote:
>> >
>> >>
>> >> Two complaints so far:
>> >>  - I don't like the vma_mark_locked() name. To me it says that the caller
>> >>already took or is taking the lock and this function is just marking 
>> >> that
>> >>we're holding the lock, but it's really taking a different type of 
>> >> lock. But
>> >>this function can block, it really is taking a lock, so it should say 
>> >> that.
>> >>
>> >>This is AFAIK a new concept, not sure I'm going to have anything good 
>> >> either,
>> >>but perhaps vma_lock_multiple()?
>> >
>> > I'm open to name suggestions but vma_lock_multiple() is a bit
>> > confusing to me. Will wait for more suggestions.
>>
>> Well, it does act like a vma_write_lock(), no? So why not that name. The
>> checking function for it is even called vma_assert_write_locked().
>>
>> We just don't provide a single vma_write_unlock(), but a
>> vma_mark_unlocked_all(), that could be instead named e.g.
>> vma_write_unlock_all().
>> But it's called on a mm, so maybe e.g. mm_vma_write_unlock_all()?
> 
> Thank you for your suggestions, Vlastimil! vma_write_lock() sounds
> good to me. For vma_mark_unlocked_all() replacement, I would prefer
> vma_write_unlock_all() which keeps the vma_write_XXX naming pattern to

OK.

> indicate that these are operating on the same locks. If the fact that
> it accepts mm_struct as a parameter is an issue then maybe
> vma_write_unlock_mm() ?

Sounds good!

>>
>>



Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-27 Thread Suren Baghdasaryan
On Sun, Sep 11, 2022 at 2:35 AM Vlastimil Babka  wrote:
>
> On 9/2/22 01:26, Suren Baghdasaryan wrote:
> > On Thu, Sep 1, 2022 at 1:58 PM Kent Overstreet
> >  wrote:
> >>
> >> On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
> >> > Resending to fix the issue with the In-Reply-To tag in the original
> >> > submission at [4].
> >> >
> >> > This is a proof of concept for per-vma locks idea that was discussed
> >> > during SPF [1] discussion at LSF/MM this year [2], which concluded with
> >> > suggestion that “a reader/writer semaphore could be put into the VMA
> >> > itself; that would have the effect of using the VMA as a sort of range
> >> > lock. There would still be contention at the VMA level, but it would be 
> >> > an
> >> > improvement.” This patchset implements this suggested approach.
> >> >
> >> > When handling page faults we lookup the VMA that contains the faulting
> >> > page under RCU protection and try to acquire its lock. If that fails we
> >> > fall back to using mmap_lock, similar to how SPF handled this situation.
> >> >
> >> > One notable way the implementation deviates from the proposal is the way
> >> > VMAs are marked as locked. Because during some of mm updates multiple
> >> > VMAs need to be locked until the end of the update (e.g. vma_merge,
> >> > split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks
> >> > and other complications would make the code more complex. Therefore we
> >> > provide a way to "mark" VMAs as locked and then unmark all locked VMAs
> >> > all at once. This is done using two sequence numbers - one in the
> >> > vm_area_struct and one in the mm_struct. VMA is considered locked when
> >> > these sequence numbers are equal. To mark a VMA as locked we set the
> >> > sequence number in vm_area_struct to be equal to the sequence number
> >> > in mm_struct. To unlock all VMAs we increment mm_struct's seq number.
> >> > This allows for an efficient way to track locked VMAs and to drop the
> >> > locks on all VMAs at the end of the update.
> >>
> >> I like it - the sequence numbers are a stroke of genuius. For what it's 
> >> doing
> >> the patchset seems almost small.
> >
> > Thanks for reviewing it!
> >
> >>
> >> Two complaints so far:
> >>  - I don't like the vma_mark_locked() name. To me it says that the caller
> >>already took or is taking the lock and this function is just marking 
> >> that
> >>we're holding the lock, but it's really taking a different type of 
> >> lock. But
> >>this function can block, it really is taking a lock, so it should say 
> >> that.
> >>
> >>This is AFAIK a new concept, not sure I'm going to have anything good 
> >> either,
> >>but perhaps vma_lock_multiple()?
> >
> > I'm open to name suggestions but vma_lock_multiple() is a bit
> > confusing to me. Will wait for more suggestions.
>
> Well, it does act like a vma_write_lock(), no? So why not that name. The
> checking function for it is even called vma_assert_write_locked().
>
> We just don't provide a single vma_write_unlock(), but a
> vma_mark_unlocked_all(), that could be instead named e.g.
> vma_write_unlock_all().
> But it's called on a mm, so maybe e.g. mm_vma_write_unlock_all()?

Thank you for your suggestions, Vlastimil! vma_write_lock() sounds
good to me. For vma_mark_unlocked_all() replacement, I would prefer
vma_write_unlock_all() which keeps the vma_write_XXX naming pattern to
indicate that these are operating on the same locks. If the fact that
it accepts mm_struct as a parameter is an issue then maybe
vma_write_unlock_mm() ?

>
>


Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-11 Thread Vlastimil Babka
On 9/2/22 01:26, Suren Baghdasaryan wrote:
> On Thu, Sep 1, 2022 at 1:58 PM Kent Overstreet
>  wrote:
>>
>> On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
>> > Resending to fix the issue with the In-Reply-To tag in the original
>> > submission at [4].
>> >
>> > This is a proof of concept for per-vma locks idea that was discussed
>> > during SPF [1] discussion at LSF/MM this year [2], which concluded with
>> > suggestion that “a reader/writer semaphore could be put into the VMA
>> > itself; that would have the effect of using the VMA as a sort of range
>> > lock. There would still be contention at the VMA level, but it would be an
>> > improvement.” This patchset implements this suggested approach.
>> >
>> > When handling page faults we lookup the VMA that contains the faulting
>> > page under RCU protection and try to acquire its lock. If that fails we
>> > fall back to using mmap_lock, similar to how SPF handled this situation.
>> >
>> > One notable way the implementation deviates from the proposal is the way
>> > VMAs are marked as locked. Because during some of mm updates multiple
>> > VMAs need to be locked until the end of the update (e.g. vma_merge,
>> > split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks
>> > and other complications would make the code more complex. Therefore we
>> > provide a way to "mark" VMAs as locked and then unmark all locked VMAs
>> > all at once. This is done using two sequence numbers - one in the
>> > vm_area_struct and one in the mm_struct. VMA is considered locked when
>> > these sequence numbers are equal. To mark a VMA as locked we set the
>> > sequence number in vm_area_struct to be equal to the sequence number
>> > in mm_struct. To unlock all VMAs we increment mm_struct's seq number.
>> > This allows for an efficient way to track locked VMAs and to drop the
>> > locks on all VMAs at the end of the update.
>>
>> I like it - the sequence numbers are a stroke of genuius. For what it's doing
>> the patchset seems almost small.
> 
> Thanks for reviewing it!
> 
>>
>> Two complaints so far:
>>  - I don't like the vma_mark_locked() name. To me it says that the caller
>>already took or is taking the lock and this function is just marking that
>>we're holding the lock, but it's really taking a different type of lock. 
>> But
>>this function can block, it really is taking a lock, so it should say 
>> that.
>>
>>This is AFAIK a new concept, not sure I'm going to have anything good 
>> either,
>>but perhaps vma_lock_multiple()?
> 
> I'm open to name suggestions but vma_lock_multiple() is a bit
> confusing to me. Will wait for more suggestions.

Well, it does act like a vma_write_lock(), no? So why not that name. The
checking function for it is even called vma_assert_write_locked().

We just don't provide a single vma_write_unlock(), but a
vma_mark_unlocked_all(), that could be instead named e.g.
vma_write_unlock_all().
But it's called on a mm, so maybe e.g. mm_vma_write_unlock_all()?




Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-05 Thread Kent Overstreet
On Mon, Sep 05, 2022 at 11:32:48AM -0700, Suren Baghdasaryan wrote:
> On Mon, Sep 5, 2022 at 5:32 AM 'Michal Hocko' via kernel-team
>  wrote:
> >
> > Unless I am missing something, this is not based on the Maple tree
> > rewrite, right? Does the change in the data structure makes any
> > difference to the approach? I remember discussions at LSFMM where it has
> > been pointed out that some issues with the vma tree are considerably
> > simpler to handle with the maple tree.
> 
> Correct, this does not use the Maple tree yet but once Maple tree
> transition happens and it supports RCU-safe lookups, my code in
> find_vma_under_rcu() becomes really simple.
> 
> >
> > On Thu 01-09-22 10:34:48, Suren Baghdasaryan wrote:
> > [...]
> > > One notable way the implementation deviates from the proposal is the way
> > > VMAs are marked as locked. Because during some of mm updates multiple
> > > VMAs need to be locked until the end of the update (e.g. vma_merge,
> > > split_vma, etc).
> >
> > I think it would be really helpful to spell out those issues in a greater
> > detail. Not everybody is aware of those vma related subtleties.
> 
> Ack. I'll expand the description of the cases when multiple VMAs need
> to be locked in the same update. The main difficulties are:
> 1. Multiple VMAs might need to be locked within one
> mmap_write_lock/mmap_write_unlock session (will call it an update
> transaction).
> 2. Figuring out when it's safe to unlock a previously locked VMA is
> tricky because that might be happening in different functions and at
> different call levels.
> 
> So, instead of the usual lock/unlock pattern, the proposed solution
> marks a VMA as locked and provides an efficient way to:
> 1. Identify locked VMAs.
> 2. Unlock all locked VMAs in bulk.
> 
> We also postpone unlocking the locked VMAs until the end of the update
> transaction, when we do mmap_write_unlock. Potentially this keeps a
> VMA locked for longer than is absolutely necessary but it results in a
> big reduction of code complexity.

Correct me if I'm wrong, but it looks like any time multiple VMAs need to be
locked we need mmap_lock anyways, which is what makes your approach so sweet.

If however we ever want to lock multiple VMAs without taking mmap_lock, then
deadlock avoidance algorithms aren't that bad - there's the ww_mutex approach,
which is simple and works well when there isn't much expected contention (the
advantage of the ww_mutex approach is that it doesn't have to track all held
locks). I've also written full cycle detection; that approcah gets you fewer
restarts, at the cost of needing a list of all currently held locks.


Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-02 Thread Peter Zijlstra
On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
> This is a proof of concept for per-vma locks idea that was discussed
> during SPF [1] discussion at LSF/MM this year [2], which concluded with
> suggestion that “a reader/writer semaphore could be put into the VMA
> itself; that would have the effect of using the VMA as a sort of range
> lock. There would still be contention at the VMA level, but it would be an
> improvement.” This patchset implements this suggested approach.

The whole reason I started the SPF thing waay back when was because one
of the primary reporters at the time had very large VMAs and a per-vma
lock wouldn't actually help anything at all.

IIRC it was either scientific code initializing a huge matrix or a
database with a giant table; I'm sure the archives have better memory
than me.



Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-01 Thread Kent Overstreet
On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
> Resending to fix the issue with the In-Reply-To tag in the original
> submission at [4].
> 
> This is a proof of concept for per-vma locks idea that was discussed
> during SPF [1] discussion at LSF/MM this year [2], which concluded with
> suggestion that “a reader/writer semaphore could be put into the VMA
> itself; that would have the effect of using the VMA as a sort of range
> lock. There would still be contention at the VMA level, but it would be an
> improvement.” This patchset implements this suggested approach.
> 
> When handling page faults we lookup the VMA that contains the faulting
> page under RCU protection and try to acquire its lock. If that fails we
> fall back to using mmap_lock, similar to how SPF handled this situation.
> 
> One notable way the implementation deviates from the proposal is the way
> VMAs are marked as locked. Because during some of mm updates multiple
> VMAs need to be locked until the end of the update (e.g. vma_merge,
> split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks
> and other complications would make the code more complex. Therefore we
> provide a way to "mark" VMAs as locked and then unmark all locked VMAs
> all at once. This is done using two sequence numbers - one in the
> vm_area_struct and one in the mm_struct. VMA is considered locked when
> these sequence numbers are equal. To mark a VMA as locked we set the
> sequence number in vm_area_struct to be equal to the sequence number
> in mm_struct. To unlock all VMAs we increment mm_struct's seq number.
> This allows for an efficient way to track locked VMAs and to drop the
> locks on all VMAs at the end of the update.

I like it - the sequence numbers are a stroke of genuius. For what it's doing
the patchset seems almost small.

Two complaints so far:
 - I don't like the vma_mark_locked() name. To me it says that the caller
   already took or is taking the lock and this function is just marking that
   we're holding the lock, but it's really taking a different type of lock. But
   this function can block, it really is taking a lock, so it should say that.
   
   This is AFAIK a new concept, not sure I'm going to have anything good either,
   but perhaps vma_lock_multiple()?

 - I don't like the #ifdef and the separate fallback path in the fault handlers.

   Can we make find_and_lock_anon_vma() do the right thing, and not fail unless
   e.g. there isn't a vma at that address? Just have it wait for vm_lock_seq to
   change and then retry if needed.