Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-10-01 Thread Daniel Jordan
On Sat, Sep 29, 2018 at 08:50:29AM +0800, Huang, Ying wrote:
> Daniel Jordan  writes:
> > The error handling in __swap_duplicate (before this series) still leaves
> > something to be desired IMHO.  Why all the different returns when callers
> > ignore them or only specifically check for -ENOMEM or -EEXIST?  Could maybe
> > stand a cleanup, but outside this series.
> 
> Yes.  Maybe.  I guess you will work on this?

Sure, I'll see how it turns out.


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-28 Thread Huang, Ying
Daniel Jordan  writes:

> On Fri, Sep 28, 2018 at 04:19:03PM +0800, Huang, Ying wrote:
>> Daniel Jordan  writes:
>> > One way is to change
>> > copy_one_pte's return to int so we can just pass the error code back to
>> > copy_pte_range so it knows whether to try adding the continuation.
>> 
>> There may be even more problems.  After add_swap_count_continuation(),
>> copy_one_pte() will be retried, and the CPU may hang in an endless loop.
>
> That's true, it would do that.
>
>> But before the changes in this patchset, the behavior is that
>> __swap_duplicate() returns an error that isn't -ENOMEM, such as -EEXIST.
>> Then copy_one_pte() would think the operation had completed
>> successfully and go on to call set_pte_at().  This would leave the system
>> state inconsistent, and the system may panic or hang somewhere
>> later.
>> 
>> So per my understanding, if we think page table corruption isn't a
>> real problem (that is, __swap_duplicate() will never return e.g. -EEXIST
>> when reached indirectly from copy_one_pte()), both the original and the new
>> code should be OK.
>> 
>> If we think it is a real problem, we need to fix the original code and
>> keep it fixed in the new code.  Do you agree?
>
> Yes, if it was a real problem, which seems less and less the case the more I
> stare at this.
>
>> There are several ways to fix the problem.  But the page table shouldn't
>> be corrupted in practice, unless there's some programming error.  So I
>> suggest keeping it as simple as possible by adding
>> 
>> VM_BUG_ON(error != -ENOMEM);
>> 
>> in swap_duplicate().
>> 
>> Do you agree?
>
> Yes, I'm ok with that, adding in -ENOTDIR along with it.

Sure.  I will do this.
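
For illustration only, here is a rough userspace sketch of the kind of check agreed on above. It is not the posted patch. MODEL_VM_BUG_ON, swap_dup_result_expected() and checked_swap_dup_result() are hypothetical names, the macro is just assert() standing in for the kernel's VM_BUG_ON(), and treating 0 as acceptable next to -ENOMEM and -ENOTDIR is an assumption, since the thread only names the two error codes.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for the kernel's VM_BUG_ON(); hypothetical name. */
#define MODEL_VM_BUG_ON(cond) assert(!(cond))

/*
 * Hypothetical predicate: results the callers are prepared to see.
 * Accepting 0 (success) is an assumption; the thread names only
 * -ENOMEM and -ENOTDIR.
 */
int swap_dup_result_expected(int err)
{
	return err == 0 || err == -ENOMEM || err == -ENOTDIR;
}

/* Pass the result through, trapping anything unexpected (e.g. -EEXIST). */
int checked_swap_dup_result(int err)
{
	MODEL_VM_BUG_ON(!swap_dup_result_expected(err));
	return err;
}
```

In this sketch an unexpected code such as -EEXIST trips the assertion instead of being returned to a caller that would misread it.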

> The error handling in __swap_duplicate (before this series) still leaves
> something to be desired IMHO.  Why all the different returns when callers
> ignore them or only specifically check for -ENOMEM or -EEXIST?  Could maybe
> stand a cleanup, but outside this series.

Yes.  Maybe.  I guess you will work on this?

Best Regards,
Huang, Ying


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-28 Thread Daniel Jordan
On Fri, Sep 28, 2018 at 04:19:03PM +0800, Huang, Ying wrote:
> Daniel Jordan  writes:
> > One way is to change
> > copy_one_pte's return to int so we can just pass the error code back to
> > copy_pte_range so it knows whether to try adding the continuation.
> 
> There may be even more problems.  After add_swap_count_continuation(),
> copy_one_pte() will be retried, and the CPU may hang in an endless loop.

That's true, it would do that.

> But before the changes in this patchset, the behavior is that
> __swap_duplicate() returns an error that isn't -ENOMEM, such as -EEXIST.
> Then copy_one_pte() would think the operation had completed
> successfully and go on to call set_pte_at().  This would leave the system
> state inconsistent, and the system may panic or hang somewhere
> later.
> 
> So per my understanding, if we think page table corruption isn't a
> real problem (that is, __swap_duplicate() will never return e.g. -EEXIST
> when reached indirectly from copy_one_pte()), both the original and the new
> code should be OK.
> 
> If we think it is a real problem, we need to fix the original code and
> keep it fixed in the new code.  Do you agree?

Yes, if it was a real problem, which seems less and less the case the more I
stare at this.

> There are several ways to fix the problem.  But the page table shouldn't
> be corrupted in practice, unless there's some programming error.  So I
> suggest keeping it as simple as possible by adding
> 
> VM_BUG_ON(error != -ENOMEM);
> 
> in swap_duplicate().
> 
> Do you agree?

Yes, I'm ok with that, adding in -ENOTDIR along with it.

The error handling in __swap_duplicate (before this series) still leaves
something to be desired IMHO.  Why all the different returns when callers
ignore them or only specifically check for -ENOMEM or -EEXIST?  Could maybe
stand a cleanup, but outside this series.


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-28 Thread Huang, Ying
Daniel Jordan  writes:

> On Thu, Sep 27, 2018 at 09:34:36AM +0800, Huang, Ying wrote:
>> Daniel Jordan  writes:
>> > On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
>> >> Daniel Jordan  writes:
>> >> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
>> >> >>  /*
>> >> >>   * Increase reference count of swap entry by 1.
>> >> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
>> >> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
>> >> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
>> >> >> - * might occur if a page table entry has got corrupted.
>> >> >> + *
>> >> >> + * Return error code in following case.
>> >> >> + * - success -> 0
>> >> >> + * - swap_count_continuation is required but could not be atomically allocated.
>> >> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
>> >> >> + *   -> ENOMEM
>> >> >> + * - otherwise same as __swap_duplicate()
>> >> >>   */
>> >> >> -int swap_duplicate(swp_entry_t entry)
>> >> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
>> >> >>  {
>> >> >>  	int err = 0;
>> >> >>  
>> >> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> >> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> >> >> +	while (!err &&
>> >> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
>> >> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
>> >> >>  	return err;
>> >> >
>> >> > Now we're returning any error we get from __swap_duplicate, apparently to
>> >> > accommodate ENOTDIR later in the series, which is a change from the behavior
>> >> > introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
>> >> > belong in a separate patch given its potential for side effects.
>> >> 
>> >> I have checked all the calls of the function and found there will be no
>> >> bad effect.  Do you have any side effect?
>> >
>> > Before I was just being vaguely concerned about any unintended side effects,
>> > but looking again, yes I do.
>> >
>> > Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
>> > a (potentially nonzero) entry.val, which copy_pte_range interprets
>> > unconditionally as 'try adding a swap count continuation.'  Not what we want
>> > for returns other than -ENOMEM.
>> 
>> Thanks for pointing this out!  Before the change in the patchset, the
>> behavior is:
>> 
>> Something wrong is detected in swap_duplicate(), but the error is
>> ignored.  Then copy_one_pte() will think everything is OK, so it
>> proceeds to call set_pte_at().  The system will be in an inconsistent
>> state and some data may be corrupted!
>
> Yes, the part about page table corruption in the comment above swap_duplicate.
>
>> But this doesn't cause any problem in practice.  Per my understanding,
>> if the other parts of the kernel work correctly, it's impossible for
>> swap_duplicate() to return any error except -ENOMEM before the change in
>> this patchset.
>
> I agree with that, but it's not what I'm trying to explain.  I didn't go into
> enough detail, let me try again.  Hopefully I'm understanding this right.
>
> While running with these patches, say we're at
>
>   copy_pte_range
>     copy_one_pte
>       swap_duplicate
>         __swap_duplicate
>           __swap_duplicate_locked
> 
> And say __swap_duplicate_locked returns an error that isn't -ENOMEM, such as
> -EEXIST.  That means __swap_duplicate and swap_duplicate also return -EEXIST.
> copy_one_pte returns entry.val, which can be and usually is nonzero, so we
> break out of the loop in copy_pte_range and then--erroneously--call
> add_swap_count_continuation.
>
> The add_swap_count_continuation call was added in 570a335b8e22 and relies on
> the assumption that callers can only get -ENOMEM from swap_duplicate.  This
> patch changes that assumption.
>
> Not a big deal: the continuation call just returns early, no harm done, but it
> allocs and frees a page needlessly, so we should fix it.  One way is to change
> copy_one_pte's return to int so we can just pass the error code back to
> copy_pte_range so it knows whether to try adding the continuation.

There may be even more problems.  After add_swap_count_continuation(),
copy_one_pte() will be retried, and the CPU may hang in an endless loop.
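
To make the retry concern concrete, here is a small bounded model in userspace C. It is only a sketch under assumptions: model_retry_on_any_error() is a hypothetical name, the caller is assumed to retry on any nonzero result, and max_iters exists only so the model terminates where the real loop would not.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of a caller that retries on any nonzero result.
 * Returns the number of retries needed, or -1 once max_iters is reached,
 * which stands in for "would spin forever" in this bounded model.
 */
int model_retry_on_any_error(int first_err, int max_iters)
{
	int err = first_err;
	int iters = 0;

	while (err) {
		if (iters >= max_iters)
			return -1;
		iters++;
		/*
		 * Adding a continuation can only cure -ENOMEM; any other
		 * error comes straight back on the next attempt.
		 */
		err = (err == -ENOMEM) ? 0 : first_err;
	}
	return iters;
}

/* Usage: -ENOMEM resolves after one retry, -EEXIST never does. */
void example_retry(void)
{
	assert(model_retry_on_any_error(0, 1000) == 0);
	assert(model_retry_on_any_error(-ENOMEM, 1000) == 1);
	assert(model_retry_on_any_error(-EEXIST, 1000) == -1);
}
```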

But before the changes in this patchset, the behavior is that
__swap_duplicate() returns an error that isn't -ENOMEM, such as -EEXIST.
Then copy_one_pte() would think the operation had completed
successfully and go on to call set_pte_at().  This would leave the system
state inconsistent, and the system may panic or hang somewhere
later.

So per my understanding, if we thought page 

Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-27 Thread Daniel Jordan
On Thu, Sep 27, 2018 at 09:34:36AM +0800, Huang, Ying wrote:
> Daniel Jordan  writes:
> > On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
> >> Daniel Jordan  writes:
> >> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
> >> >>  /*
> >> >>   * Increase reference count of swap entry by 1.
> >> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> >> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> >> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> >> >> - * might occur if a page table entry has got corrupted.
> >> >> + *
> >> >> + * Return error code in following case.
> >> >> + * - success -> 0
> >> >> + * - swap_count_continuation is required but could not be atomically allocated.
> >> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
> >> >> + *   -> ENOMEM
> >> >> + * - otherwise same as __swap_duplicate()
> >> >>   */
> >> >> -int swap_duplicate(swp_entry_t entry)
> >> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
> >> >>  {
> >> >>  	int err = 0;
> >> >>  
> >> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> >> +	while (!err &&
> >> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
> >> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
> >> >>  	return err;
> >> >
> >> > Now we're returning any error we get from __swap_duplicate, apparently to
> >> > accommodate ENOTDIR later in the series, which is a change from the behavior
> >> > introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
> >> > belong in a separate patch given its potential for side effects.
> >> 
> >> I have checked all the calls of the function and found there will be no
> >> bad effect.  Do you have any side effect?
> >
> > Before I was just being vaguely concerned about any unintended side effects,
> > but looking again, yes I do.
> >
> > Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
> > a (potentially nonzero) entry.val, which copy_pte_range interprets
> > unconditionally as 'try adding a swap count continuation.'  Not what we want
> > for returns other than -ENOMEM.
> 
> Thanks for pointing this out!  Before the change in the patchset, the
> behavior is:
> 
> Something wrong is detected in swap_duplicate(), but the error is
> ignored.  Then copy_one_pte() will think everything is OK, so it
> proceeds to call set_pte_at().  The system will be in an inconsistent
> state and some data may be corrupted!

Yes, the part about page table corruption in the comment above swap_duplicate.

> But this doesn't cause any problem in practice.  Per my understanding,
> if the other parts of the kernel work correctly, it's impossible for
> swap_duplicate() to return any error except -ENOMEM before the change in
> this patchset.

I agree with that, but it's not what I'm trying to explain.  I didn't go into
enough detail, let me try again.  Hopefully I'm understanding this right.

While running with these patches, say we're at

  copy_pte_range
    copy_one_pte
      swap_duplicate
        __swap_duplicate
          __swap_duplicate_locked

And say __swap_duplicate_locked returns an error that isn't -ENOMEM, such as
-EEXIST.  That means __swap_duplicate and swap_duplicate also return -EEXIST.
copy_one_pte returns entry.val, which can be and usually is nonzero, so we
break out of the loop in copy_pte_range and then--erroneously--call
add_swap_count_continuation.

The add_swap_count_continuation call was added in 570a335b8e22 and relies on
the assumption that callers can only get -ENOMEM from swap_duplicate.  This
patch changes that assumption.

Not a big deal: the continuation call just returns early, no harm done, but it
allocs and frees a page needlessly, so we should fix it.  One way is to change
copy_one_pte's return to int so we can just pass the error code back to
copy_pte_range so it knows whether to try adding the continuation.
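
A minimal userspace sketch of that idea, for illustration only and not taken from the series: model_copy_one_pte(), model_copy_pte_range() and struct range_stats are hypothetical names, the array of scripted results stands in for what swap_duplicate would return for each entry, and the sketch assumes a retry succeeds once a continuation has been added. What copy_pte_range should really do with a non-ENOMEM error is left open here; the model simply hands it back.

```c
#include <assert.h>
#include <errno.h>

struct range_stats {
	int copied;        /* entries copied successfully */
	int continuations; /* continuation allocations attempted */
};

/* Hypothetical: copy_one_pte() returns the error code, not entry.val. */
int model_copy_one_pte(int dup_result)
{
	return dup_result;
}

/* Walk n scripted results; only -ENOMEM leads to a continuation + retry. */
int model_copy_pte_range(int *dup_results, int n, struct range_stats *st)
{
	int i = 0;

	st->copied = 0;
	st->continuations = 0;
	while (i < n) {
		int err = model_copy_one_pte(dup_results[i]);

		if (err == -ENOMEM) {
			st->continuations++;
			dup_results[i] = 0; /* assume the retry then succeeds */
			continue;
		}
		if (err)
			return err; /* e.g. -EEXIST: no allocation, no retry */
		st->copied++;
		i++;
	}
	return 0;
}

/* Usage: -ENOMEM costs one continuation, -EEXIST costs none. */
void example_copy_range(void)
{
	int ok[] = { 0, -ENOMEM, 0 };
	int bad[] = { 0, -EEXIST, 0 };
	struct range_stats st;

	assert(model_copy_pte_range(ok, 3, &st) == 0);
	assert(st.copied == 3 && st.continuations == 1);
	assert(model_copy_pte_range(bad, 3, &st) == -EEXIST);
	assert(st.copied == 1 && st.continuations == 0);
}
```

With the error code visible to the caller, the needless page alloc and free for codes like -EEXIST goes away in this model.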

The other swap_duplicate caller, try_to_unmap_one, seems ok.

> But the error may be possible during development, and an assertion
> would also serve as a kind of documentation.  So I suggest adding
> 
> VM_BUG_ON(error != -ENOMEM);
> 
> in swap_duplicate().  What do you think about that?

That doesn't seem necessary.

> > So it might make sense to have a separate patch that changes swap_duplicate's
> > return and makes callers handle it.
> 
> Thanks for your help in taking a deep look at this.  I want to try to fix
> all the potential problems first, because the number of callers is
> quite limited.  Do you agree?

Yes, makes sense to me.

Daniel


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-26 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
>> Daniel Jordan  writes:
>> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
>> >>  /*
>> >>   * Increase reference count of swap entry by 1.
>> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
>> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
>> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
>> >> - * might occur if a page table entry has got corrupted.
>> >> + *
>> >> + * Return error code in following case.
>> >> + * - success -> 0
>> >> + * - swap_count_continuation is required but could not be atomically allocated.
>> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
>> >> + *   -> ENOMEM
>> >> + * - otherwise same as __swap_duplicate()
>> >>   */
>> >> -int swap_duplicate(swp_entry_t entry)
>> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
>> >>  {
>> >>  	int err = 0;
>> >>  
>> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> >> +	while (!err &&
>> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
>> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
>> >>  	return err;
>> >
>> > Now we're returning any error we get from __swap_duplicate, apparently to
>> > accommodate ENOTDIR later in the series, which is a change from the behavior
>> > introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
>> > belong in a separate patch given its potential for side effects.
>> 
>> I have checked all the calls of the function and found there will be no
>> bad effect.  Do you have any side effect?
>
> Before I was just being vaguely concerned about any unintended side effects,
> but looking again, yes I do.
>
> Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
> a (potentially nonzero) entry.val, which copy_pte_range interprets
> unconditionally as 'try adding a swap count continuation.'  Not what we want
> for returns other than -ENOMEM.

Thanks for pointing this out!  Before the change in this patchset, the
behavior was:

Something wrong is detected in swap_duplicate(), but the error is
ignored.  Then copy_one_pte() thinks everything is OK and proceeds to
call set_pte_at().  The system is left in an inconsistent state and some
data may be corrupted!

But this doesn't cause any problem in practice.  As I understand it,
that is because, as long as the other parts of the kernel work
correctly, swap_duplicate() cannot return any error except -ENOMEM
before the change in this patchset.  But such an error may be possible
during development, and a check may serve as a kind of documentation
too.  So I suggest adding

VM_BUG_ON(err && err != -ENOMEM);

in swap_duplicate().  What do you think about that?
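
For illustration, a minimal userspace sketch of where such a check would sit.  Everything named model_* is a hypothetical stand-in and not kernel code: model_swap_duplicate_inner() plays the role of __swap_duplicate(), model_add_continuation() plays add_swap_count_continuation(), and assert() plays VM_BUG_ON().  The check has to let the success value 0 through, so it only fires on a nonzero error other than -ENOMEM.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one swap entry. */
struct model_entry {
	int need_continuation;	/* remaining -ENOMEM results before success */
	int count;		/* modelled swap map count */
};

/* Stand-in for __swap_duplicate(): asks for a continuation until one exists. */
static int model_swap_duplicate_inner(struct model_entry *e)
{
	if (e->need_continuation > 0)
		return -ENOMEM;
	e->count++;
	return 0;
}

/* Stand-in for add_swap_count_continuation(); always succeeds in this model. */
static int model_add_continuation(struct model_entry *e)
{
	e->need_continuation--;
	return 0;
}

/* Same loop shape as the patched swap_duplicate(), plus the proposed check. */
static int model_swap_duplicate(struct model_entry *e)
{
	int err = 0;

	while (!err && (err = model_swap_duplicate_inner(e)) == -ENOMEM)
		err = model_add_continuation(e);
	/* assert() stands in for VM_BUG_ON(err && err != -ENOMEM) */
	assert(!err || err == -ENOMEM);
	return err;
}
```

In this model the only way to leave the loop with -ENOMEM is for the continuation allocation itself to fail, which matches the documented return values.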

> So it might make sense to have a separate patch that changes swap_duplicate's
> return and makes callers handle it.

Thanks for taking a close look at this.  I would like to try to fix
all the potential problems first, since the number of callers is
quite limited.  Do you agree?

Best Regards,
Huang, Ying


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-26 Thread Daniel Jordan
On Wed, Sep 26, 2018 at 08:55:59PM +0800, Huang, Ying wrote:
> Daniel Jordan  writes:
> > On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
> >>  /*
> >>   * Increase reference count of swap entry by 1.
> >> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> >> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> >> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> >> - * might occur if a page table entry has got corrupted.
> >> + *
> >> + * Return error code in following case.
> >> + * - success -> 0
> >> + * - swap_count_continuation is required but could not be atomically allocated.
> >> + *   *entry is used to return swap entry to call add_swap_count_continuation().
> >> + * -> ENOMEM
> >> + * - otherwise same as __swap_duplicate()
> >>   */
> >> -int swap_duplicate(swp_entry_t entry)
> >> +int swap_duplicate(swp_entry_t *entry, int entry_size)
> >>  {
> >>  	int err = 0;
> >>  
> >> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> +	while (!err &&
> >> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
> >> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
> >>  	return err;
> >
> > Now we're returning any error we get from __swap_duplicate, apparently to
> > accommodate ENOTDIR later in the series, which is a change from the behavior
> > introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
> > belong in a separate patch given its potential for side effects.
> 
> I have checked all the callers of the function and found no bad
> effects.  Do you see any side effects?

Before I was just being vaguely concerned about any unintended side effects,
but looking again, yes I do.

Now when swap_duplicate returns an error in copy_one_pte, copy_one_pte returns
a (potentially nonzero) entry.val, which copy_pte_range interprets
unconditionally as 'try adding a swap count continuation.'  Not what we want
for returns other than -ENOMEM.

So it might make sense to have a separate patch that changes swap_duplicate's
return and makes callers handle it.
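
For illustration, a small userspace sketch of the difference between the two ways of reporting failure to the caller.  The enum and both helper functions are hypothetical names invented for this sketch; they only model the decision described above and are not the kernel's copy_one_pte() or copy_pte_range().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: what the caller does next with a failed duplicate. */
enum model_action {
	MODEL_PROCEED,			/* count taken, install the pte */
	MODEL_ADD_CONTINUATION,		/* allocate a continuation, then retry */
	MODEL_FAIL,			/* propagate the error, do not retry */
};

/*
 * Reporting failure as a nonzero entry value: the caller cannot tell the
 * errors apart, so every failure looks like "add a continuation".
 */
static enum model_action action_from_entry_val(unsigned long entry_val)
{
	return entry_val ? MODEL_ADD_CONTINUATION : MODEL_PROCEED;
}

/*
 * Reporting failure as an error code: only -ENOMEM leads to a
 * continuation attempt, anything else is passed up.
 */
static enum model_action action_from_errno(int err)
{
	if (!err)
		return MODEL_PROCEED;
	if (err == -ENOMEM)
		return MODEL_ADD_CONTINUATION;
	return MODEL_FAIL;
}
```

With the entry-value convention, an -ENOTDIR failure and an -ENOMEM failure are indistinguishable to the caller; with the error-code convention they are not.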


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-26 Thread Huang, Ying
Daniel Jordan  writes:

> On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
>> @@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
>>  }
>>  
>>  /*
>> - * Verify that a swap entry is valid and increment its swap map count.
>> + * Verify that the swap entries from *entry is valid and increment their
>> + * PMD/PTE swap mapping count.
>>   *
>>   * Returns error code in following case.
>>   * - success -> 0
>>   * - swp_entry is invalid -> EINVAL
>> - * - swp_entry is migration entry -> EINVAL
>
> I'm assuming it wasn't possible to hit this error before this patch, and you're
> just removing it now since you're in the area?

Yes.

>>   * - swap-cache reference is requested but there is already one. -> EEXIST
>>   * - swap-cache reference is requested but the entry is not used. -> ENOENT
>>   * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
>> + * - the huge swap cluster has been split. -> ENOTDIR
>
> Strangely intuitive choice of error code :)

Thanks!  It doesn't match the error exactly, but I have no better choice
for now.  Matthew Wilcox has suggested using a swap-specific enum
instead.  I think that is good in general, but we need only one extra
error code, and we would need to change the interface of several swap
functions.  So I think that should go in a separate patchset if
necessary.
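
For illustration, a sketch of what a swap-specific status enum could look like.  All names here are hypothetical; nothing like this is defined in the series.  The mapping function only restates the errno table from the __swap_duplicate() comment quoted above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical swap-specific status codes, one per documented case. */
enum swap_dup_status {
	SWAP_DUP_OK,			/* success */
	SWAP_DUP_INVALID,		/* swp_entry is invalid */
	SWAP_DUP_CACHE_EXISTS,		/* swap-cache reference already present */
	SWAP_DUP_UNUSED,		/* swap-cache reference on an unused entry */
	SWAP_DUP_NEED_CONTINUATION,	/* needs continued swap count */
	SWAP_DUP_CLUSTER_SPLIT,		/* the huge swap cluster has been split */
};

/* Map each status back to the errno value the current interface uses. */
static int swap_dup_status_to_errno(enum swap_dup_status status)
{
	switch (status) {
	case SWAP_DUP_OK:
		return 0;
	case SWAP_DUP_INVALID:
		return -EINVAL;
	case SWAP_DUP_CACHE_EXISTS:
		return -EEXIST;
	case SWAP_DUP_UNUSED:
		return -ENOENT;
	case SWAP_DUP_NEED_CONTINUATION:
		return -ENOMEM;
	case SWAP_DUP_CLUSTER_SPLIT:
		return -ENOTDIR;
	}
	return -EINVAL;	/* unknown status values are treated as invalid */
}
```

An enum like this would remove the need to borrow -ENOTDIR, at the cost of touching every function that currently returns an errno.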

>>  /*
>>   * Increase reference count of swap entry by 1.
>> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
>> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
>> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
>> - * might occur if a page table entry has got corrupted.
>> + *
>> + * Return error code in following case.
>> + * - success -> 0
>> + * - swap_count_continuation is required but could not be atomically allocated.
>> + *   *entry is used to return swap entry to call add_swap_count_continuation().
>> + * -> ENOMEM
>> + * - otherwise same as __swap_duplicate()
>>   */
>> -int swap_duplicate(swp_entry_t entry)
>> +int swap_duplicate(swp_entry_t *entry, int entry_size)
>>  {
>>  	int err = 0;
>>  
>> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> +	while (!err &&
>> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
>> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
>>  	return err;
>
> Now we're returning any error we get from __swap_duplicate, apparently to
> accommodate ENOTDIR later in the series, which is a change from the behavior
> introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
> belong in a separate patch given its potential for side effects.

I have checked all the callers of the function and found no bad
effects.  Do you see any side effects?

> Although, I don't understand why 570a335b8e22 ignored errors other than -ENOMEM
> when both swap_duplicate callers _seem_ from a quick read to be able to respond
> gracefully to any error.

Before 570a335b8e22, swap_duplicate() ignored all errors (its return
type was void).  If my understanding is correct, no error other than
-ENOMEM was possible before the changes in this patchset, so the
others were ignored.

Best Regards,
Huang, Ying


Re: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()

2018-09-25 Thread Daniel Jordan
On Tue, Sep 25, 2018 at 03:13:30PM +0800, Huang Ying wrote:
> @@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
>  }
>  
>  /*
> - * Verify that a swap entry is valid and increment its swap map count.
> + * Verify that the swap entries from *entry is valid and increment their
> + * PMD/PTE swap mapping count.
>   *
>   * Returns error code in following case.
>   * - success -> 0
>   * - swp_entry is invalid -> EINVAL
> - * - swp_entry is migration entry -> EINVAL

I'm assuming it wasn't possible to hit this error before this patch, and you're
just removing it now since you're in the area?

>   * - swap-cache reference is requested but there is already one. -> EEXIST
>   * - swap-cache reference is requested but the entry is not used. -> ENOENT
>   * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
> + * - the huge swap cluster has been split. -> ENOTDIR

Strangely intuitive choice of error code :)

>  /*
>   * Increase reference count of swap entry by 1.
> - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> - * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> - * might occur if a page table entry has got corrupted.
> + *
> + * Return error code in following case.
> + * - success -> 0
> + * - swap_count_continuation is required but could not be atomically allocated.
> + *   *entry is used to return swap entry to call add_swap_count_continuation().
> + * -> ENOMEM
> + * - otherwise same as __swap_duplicate()
>   */
> -int swap_duplicate(swp_entry_t entry)
> +int swap_duplicate(swp_entry_t *entry, int entry_size)
>  {
>  	int err = 0;
>  
> -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> -		err = add_swap_count_continuation(entry, GFP_ATOMIC);
> +	while (!err &&
> +	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
> +		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
>  	return err;

Now we're returning any error we get from __swap_duplicate, apparently to
accommodate ENOTDIR later in the series, which is a change from the behavior
introduced in 570a335b8e22 ("swap_info: swap count continuations").  This might
belong in a separate patch given its potential for side effects.

Although, I don't understand why 570a335b8e22 ignored errors other than -ENOMEM
when both swap_duplicate callers _seem_ from a quick read to be able to respond
gracefully to any error.
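
For illustration, a userspace sketch of the behavior change in the loop.  The struct and the model_* helpers are hypothetical stubs: model_inner() stands in for __swap_duplicate() and model_add_continuation() for add_swap_count_continuation().  dup_before() copies the shape of the removed loop and dup_after() the shape of the added one.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model state for one call. */
struct model {
	int inner_result;	/* what the modelled __swap_duplicate() returns */
	int cont_result;	/* what the modelled continuation allocation returns */
};

static int model_inner(struct model *m)
{
	return m->inner_result;
}

static int model_add_continuation(struct model *m)
{
	if (!m->cont_result)
		m->inner_result = 0;	/* the retry will now succeed */
	return m->cont_result;
}

/* Old loop: the inner result is compared but never stored in err. */
static int dup_before(struct model *m)
{
	int err = 0;

	while (!err && model_inner(m) == -ENOMEM)
		err = model_add_continuation(m);
	return err;
}

/* New loop: the inner result is stored, so any error reaches the caller. */
static int dup_after(struct model *m)
{
	int err = 0;

	while (!err && (err = model_inner(m)) == -ENOMEM)
		err = model_add_continuation(m);
	return err;
}
```

In this model an inner -ENOENT comes back as 0 from dup_before() and as -ENOENT from dup_after(), while the -ENOMEM paths behave the same in both.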

