Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-06-03 Thread Alexander Aring
Hi,

On Sat, Jun 3, 2023 at 6:03 AM Andreas Gruenbacher  wrote:
>
> On Thu, Jun 1, 2023 at 9:10 PM Alexander Aring  wrote:
> > Hi,
> >
> > On Thu, Jun 1, 2023 at 1:11 PM Andreas Gruenbacher  
> > wrote:
> > >
> > > On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring  
> > > wrote:
> > > > Hi,
> > > >
> > > > On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher 
> > > >  wrote:
> > > > >
> > > > > On Tue, May 30, 2023 at 4:08 PM Alexander Aring  
> > > > > wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring 
> > > > > > >  wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring 
> > > > > > > > >  wrote:
> > > > > > > > > > This patch fixes a possible plock op collisions when using 
> > > > > > > > > > F_SETLKW lock
> > > > > > > > > > requests and fsid, number and owner are not enough to 
> > > > > > > > > > identify a result
> > > > > > > > > > for a pending request. The ltp testcases [0] and [1] are 
> > > > > > > > > > examples when
> > > > > > > > > > this is not enough in case of using classic posix locks 
> > > > > > > > > > with threads and
> > > > > > > > > > open filedescriptor posix locks.
> > > > > > > > > >
> > > > > > > > > > The idea to fix the issue here is to place all lock request 
> > > > > > > > > > in order. In
> > > > > > > > > > case of non F_SETLKW lock request (indicated if wait is set 
> > > > > > > > > > or not) the
> > > > > > > > > > lock requests are ordered inside the recv_list. If a result 
> > > > > > > > > > comes back
> > > > > > > > > > the right plock op can be found by the first plock_op in 
> > > > > > > > > > recv_list which
> > > > > > > > > > has not info.wait set. This can be done only by non 
> > > > > > > > > > F_SETLKW plock ops as
> > > > > > > > > > dlm_controld always reads a specific plock op 
> > > > > > > > > > (list_move_tail() from
> > > > > > > > > > send_list to recv_mlist) and write the result immediately 
> > > > > > > > > > back.
> > > > > > > > > >
> > > > > > > > > > This behaviour is for F_SETLKW not possible as multiple 
> > > > > > > > > > waiters can be
> > > > > > > > > > get a result back in an random order. To avoid a collisions 
> > > > > > > > > > in cases
> > > > > > > > > > like [0] or [1] this patch adds more fields to compare the 
> > > > > > > > > > plock
> > > > > > > > > > operations as the lock request is the same. This is also 
> > > > > > > > > > being made in
> > > > > > > > > > NFS to find an result for an asynchronous F_SETLKW lock 
> > > > > > > > > > request [2][3]. We
> > > > > > > > > > still can't find the exact lock request for a specific 
> > > > > > > > > > result if the
> > > > > > > > > > lock request is the same, but if this is the case we don't 
> > > > > > > > > > care the
> > > > > > > > > > order how the identical lock requests get their result back 
> > > > > > > > > > to grant the
> > > > > > > > > > lock.
> > > > > > > > >
> > > > > > > > > When the recv_list contains multiple indistinguishable 
> > > > > > > > > requests, this
> > > > > > > > > can only be because they originated from multiple threads of 
> > > > > > > > > the same
> > > > > > > > > process. In that case, I agree that it doesn't matter which 
> > > > > > > > > of those
> > > > > > > > > requests we "complete" in dev_write() as long as we only 
> > > > > > > > > complete one
> > > > > > > > > request. We do need to compare the additional request fields 
> > > > > > > > > in
> > > > > > > > > dev_write() to find a suitable request, so that makes sense 
> > > > > > > > > as well.
> > > > > > > > > We need to compare all of the fields that identify a request 
> > > > > > > > > (optype,
> > > > > > > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to 
> > > > > > > > > find the
> > > > > > > > > "right" request (or in case there is more than one identical 
> > > > > > > > > request,
> > > > > > > > > a "suitable" request).
> > > > > > > > >
> > > > > > > >
> > > > > > > > In my "definition" why this works is as you said the "identical
> > > > > > > > request". There is a more deeper definition of "when is a 
> > > > > > > > request
> > > > > > > > identical" and in my opinion it is here as: "A request A is 
> > > > > > > > identical
> > > > > > > > to request B when they get granted under the same 'time'" which 
> > > > > > > > is all
> > > > > > > > the fields you mentioned.
> > > > > > > >
> > > > > > > > Even with cancellation (F_SETLKW only) it does not matter which
> > > > > > > > "identical request" you cancel because the kernel and user
> > > > > > > > (dlm_controld) makes no relation between a lock request 
> > > > > > > > instance. You
> > > > > > > > need to have at least the same amount of "results" coming back 
> > > > > > >

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-06-03 Thread Andreas Gruenbacher
On Thu, Jun 1, 2023 at 9:10 PM Alexander Aring  wrote:
> Hi,
>
> On Thu, Jun 1, 2023 at 1:11 PM Andreas Gruenbacher  
> wrote:
> >
> > On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring  wrote:
> > > Hi,
> > >
> > > On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher  
> > > wrote:
> > > >
> > > > On Tue, May 30, 2023 at 4:08 PM Alexander Aring  
> > > > wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring 
> > > > > >  wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring 
> > > > > > > >  wrote:
> > > > > > > > > This patch fixes a possible plock op collisions when using 
> > > > > > > > > F_SETLKW lock
> > > > > > > > > requests and fsid, number and owner are not enough to 
> > > > > > > > > identify a result
> > > > > > > > > for a pending request. The ltp testcases [0] and [1] are 
> > > > > > > > > examples when
> > > > > > > > > this is not enough in case of using classic posix locks with 
> > > > > > > > > threads and
> > > > > > > > > open filedescriptor posix locks.
> > > > > > > > >
> > > > > > > > > The idea to fix the issue here is to place all lock request 
> > > > > > > > > in order. In
> > > > > > > > > case of non F_SETLKW lock request (indicated if wait is set 
> > > > > > > > > or not) the
> > > > > > > > > lock requests are ordered inside the recv_list. If a result 
> > > > > > > > > comes back
> > > > > > > > > the right plock op can be found by the first plock_op in 
> > > > > > > > > recv_list which
> > > > > > > > > has not info.wait set. This can be done only by non F_SETLKW 
> > > > > > > > > plock ops as
> > > > > > > > > dlm_controld always reads a specific plock op 
> > > > > > > > > (list_move_tail() from
> > > > > > > > > send_list to recv_mlist) and write the result immediately 
> > > > > > > > > back.
> > > > > > > > >
> > > > > > > > > This behaviour is for F_SETLKW not possible as multiple 
> > > > > > > > > waiters can be
> > > > > > > > > get a result back in an random order. To avoid a collisions 
> > > > > > > > > in cases
> > > > > > > > > like [0] or [1] this patch adds more fields to compare the 
> > > > > > > > > plock
> > > > > > > > > operations as the lock request is the same. This is also 
> > > > > > > > > being made in
> > > > > > > > > NFS to find an result for an asynchronous F_SETLKW lock 
> > > > > > > > > request [2][3]. We
> > > > > > > > > still can't find the exact lock request for a specific result 
> > > > > > > > > if the
> > > > > > > > > lock request is the same, but if this is the case we don't 
> > > > > > > > > care the
> > > > > > > > > order how the identical lock requests get their result back 
> > > > > > > > > to grant the
> > > > > > > > > lock.
> > > > > > > >
> > > > > > > > When the recv_list contains multiple indistinguishable 
> > > > > > > > requests, this
> > > > > > > > can only be because they originated from multiple threads of 
> > > > > > > > the same
> > > > > > > > process. In that case, I agree that it doesn't matter which of 
> > > > > > > > those
> > > > > > > > requests we "complete" in dev_write() as long as we only 
> > > > > > > > complete one
> > > > > > > > request. We do need to compare the additional request fields in
> > > > > > > > dev_write() to find a suitable request, so that makes sense as 
> > > > > > > > well.
> > > > > > > > We need to compare all of the fields that identify a request 
> > > > > > > > (optype,
> > > > > > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find 
> > > > > > > > the
> > > > > > > > "right" request (or in case there is more than one identical 
> > > > > > > > request,
> > > > > > > > a "suitable" request).
> > > > > > > >
> > > > > > >
> > > > > > > In my "definition" why this works is as you said the "identical
> > > > > > > request". There is a more deeper definition of "when is a request
> > > > > > > identical" and in my opinion it is here as: "A request A is 
> > > > > > > identical
> > > > > > > to request B when they get granted under the same 'time'" which 
> > > > > > > is all
> > > > > > > the fields you mentioned.
> > > > > > >
> > > > > > > Even with cancellation (F_SETLKW only) it does not matter which
> > > > > > > "identical request" you cancel because the kernel and user
> > > > > > > (dlm_controld) makes no relation between a lock request instance. 
> > > > > > > You
> > > > > > > need to have at least the same amount of "results" coming back 
> > > > > > > from
> > > > > > > user space as the amount you are waiting for a result for the same
> > > > > > > "identical request".
> > > > > >
> > > > > > That's not incorrect per se, but cancellations create an additional
> > > > > > difficulty: they can either succeed or fail. To indicate that a
> > > > > > cancellation ha

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-06-01 Thread Alexander Aring
Hi,

On Thu, Jun 1, 2023 at 1:11 PM Andreas Gruenbacher  wrote:
>
> On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring  wrote:
> > Hi,
> >
> > On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher  
> > wrote:
> > >
> > > On Tue, May 30, 2023 at 4:08 PM Alexander Aring  
> > > wrote:
> > > > Hi,
> > > >
> > > > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher 
> > > >  wrote:
> > > > >
> > > > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring 
> > > > >  wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring 
> > > > > > >  wrote:
> > > > > > > > This patch fixes a possible plock op collisions when using 
> > > > > > > > F_SETLKW lock
> > > > > > > > requests and fsid, number and owner are not enough to identify 
> > > > > > > > a result
> > > > > > > > for a pending request. The ltp testcases [0] and [1] are 
> > > > > > > > examples when
> > > > > > > > this is not enough in case of using classic posix locks with 
> > > > > > > > threads and
> > > > > > > > open filedescriptor posix locks.
> > > > > > > >
> > > > > > > > The idea to fix the issue here is to place all lock request in 
> > > > > > > > order. In
> > > > > > > > case of non F_SETLKW lock request (indicated if wait is set or 
> > > > > > > > not) the
> > > > > > > > lock requests are ordered inside the recv_list. If a result 
> > > > > > > > comes back
> > > > > > > > the right plock op can be found by the first plock_op in 
> > > > > > > > recv_list which
> > > > > > > > has not info.wait set. This can be done only by non F_SETLKW 
> > > > > > > > plock ops as
> > > > > > > > dlm_controld always reads a specific plock op (list_move_tail() 
> > > > > > > > from
> > > > > > > > send_list to recv_mlist) and write the result immediately back.
> > > > > > > >
> > > > > > > > This behaviour is for F_SETLKW not possible as multiple waiters 
> > > > > > > > can be
> > > > > > > > get a result back in an random order. To avoid a collisions in 
> > > > > > > > cases
> > > > > > > > like [0] or [1] this patch adds more fields to compare the plock
> > > > > > > > operations as the lock request is the same. This is also being 
> > > > > > > > made in
> > > > > > > > NFS to find an result for an asynchronous F_SETLKW lock request 
> > > > > > > > [2][3]. We
> > > > > > > > still can't find the exact lock request for a specific result 
> > > > > > > > if the
> > > > > > > > lock request is the same, but if this is the case we don't care 
> > > > > > > > the
> > > > > > > > order how the identical lock requests get their result back to 
> > > > > > > > grant the
> > > > > > > > lock.
> > > > > > >
> > > > > > > When the recv_list contains multiple indistinguishable requests, 
> > > > > > > this
> > > > > > > can only be because they originated from multiple threads of the 
> > > > > > > same
> > > > > > > process. In that case, I agree that it doesn't matter which of 
> > > > > > > those
> > > > > > > requests we "complete" in dev_write() as long as we only complete 
> > > > > > > one
> > > > > > > request. We do need to compare the additional request fields in
> > > > > > > dev_write() to find a suitable request, so that makes sense as 
> > > > > > > well.
> > > > > > > We need to compare all of the fields that identify a request 
> > > > > > > (optype,
> > > > > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find 
> > > > > > > the
> > > > > > > "right" request (or in case there is more than one identical 
> > > > > > > request,
> > > > > > > a "suitable" request).
> > > > > > >
> > > > > >
> > > > > > In my "definition" why this works is as you said the "identical
> > > > > > request". There is a more deeper definition of "when is a request
> > > > > > identical" and in my opinion it is here as: "A request A is 
> > > > > > identical
> > > > > > to request B when they get granted under the same 'time'" which is 
> > > > > > all
> > > > > > the fields you mentioned.
> > > > > >
> > > > > > Even with cancellation (F_SETLKW only) it does not matter which
> > > > > > "identical request" you cancel because the kernel and user
> > > > > > (dlm_controld) makes no relation between a lock request instance. 
> > > > > > You
> > > > > > need to have at least the same amount of "results" coming back from
> > > > > > user space as the amount you are waiting for a result for the same
> > > > > > "identical request".
> > > > >
> > > > > That's not incorrect per se, but cancellations create an additional
> > > > > difficulty: they can either succeed or fail. To indicate that a
> > > > > cancellation has succeeded, a new type of message can be introduced
> > > > > (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> > > > > belong to a locking request that is being cancelled. When cancelling a
> > > > > locking request fails, the kernel will see a "locking request granted"
> > > > > message though, 

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-06-01 Thread Andreas Gruenbacher
On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring  wrote:
> Hi,
>
> On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher  
> wrote:
> >
> > On Tue, May 30, 2023 at 4:08 PM Alexander Aring  wrote:
> > > Hi,
> > >
> > > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher  
> > > wrote:
> > > >
> > > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring  
> > > > wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring 
> > > > > >  wrote:
> > > > > > > This patch fixes a possible plock op collisions when using 
> > > > > > > F_SETLKW lock
> > > > > > > requests and fsid, number and owner are not enough to identify a 
> > > > > > > result
> > > > > > > for a pending request. The ltp testcases [0] and [1] are examples 
> > > > > > > when
> > > > > > > this is not enough in case of using classic posix locks with 
> > > > > > > threads and
> > > > > > > open filedescriptor posix locks.
> > > > > > >
> > > > > > > The idea to fix the issue here is to place all lock request in 
> > > > > > > order. In
> > > > > > > case of non F_SETLKW lock request (indicated if wait is set or 
> > > > > > > not) the
> > > > > > > lock requests are ordered inside the recv_list. If a result comes 
> > > > > > > back
> > > > > > > the right plock op can be found by the first plock_op in 
> > > > > > > recv_list which
> > > > > > > has not info.wait set. This can be done only by non F_SETLKW 
> > > > > > > plock ops as
> > > > > > > dlm_controld always reads a specific plock op (list_move_tail() 
> > > > > > > from
> > > > > > > send_list to recv_mlist) and write the result immediately back.
> > > > > > >
> > > > > > > This behaviour is for F_SETLKW not possible as multiple waiters 
> > > > > > > can be
> > > > > > > get a result back in an random order. To avoid a collisions in 
> > > > > > > cases
> > > > > > > like [0] or [1] this patch adds more fields to compare the plock
> > > > > > > operations as the lock request is the same. This is also being 
> > > > > > > made in
> > > > > > > NFS to find an result for an asynchronous F_SETLKW lock request 
> > > > > > > [2][3]. We
> > > > > > > still can't find the exact lock request for a specific result if 
> > > > > > > the
> > > > > > > lock request is the same, but if this is the case we don't care 
> > > > > > > the
> > > > > > > order how the identical lock requests get their result back to 
> > > > > > > grant the
> > > > > > > lock.
> > > > > >
> > > > > > When the recv_list contains multiple indistinguishable requests, 
> > > > > > this
> > > > > > can only be because they originated from multiple threads of the 
> > > > > > same
> > > > > > process. In that case, I agree that it doesn't matter which of those
> > > > > > requests we "complete" in dev_write() as long as we only complete 
> > > > > > one
> > > > > > request. We do need to compare the additional request fields in
> > > > > > dev_write() to find a suitable request, so that makes sense as well.
> > > > > > We need to compare all of the fields that identify a request 
> > > > > > (optype,
> > > > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > > > > > "right" request (or in case there is more than one identical 
> > > > > > request,
> > > > > > a "suitable" request).
> > > > > >
> > > > >
> > > > > In my "definition" why this works is as you said the "identical
> > > > > request". There is a more deeper definition of "when is a request
> > > > > identical" and in my opinion it is here as: "A request A is identical
> > > > > to request B when they get granted under the same 'time'" which is all
> > > > > the fields you mentioned.
> > > > >
> > > > > Even with cancellation (F_SETLKW only) it does not matter which
> > > > > "identical request" you cancel because the kernel and user
> > > > > (dlm_controld) makes no relation between a lock request instance. You
> > > > > need to have at least the same amount of "results" coming back from
> > > > > user space as the amount you are waiting for a result for the same
> > > > > "identical request".
> > > >
> > > > That's not incorrect per se, but cancellations create an additional
> > > > difficulty: they can either succeed or fail. To indicate that a
> > > > cancellation has succeeded, a new type of message can be introduced
> > > > (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> > > > belong to a locking request that is being cancelled. When cancelling a
> > > > locking request fails, the kernel will see a "locking request granted"
> > > > message though, and when multiple identical locking requests are
> > > > queued and only some of them have been cancelled, it won't be obvious
> > > > which locking request a "locking request granted" message should be
> > > > assigned to anymore. You really don't want to mix things up in that
> > > > case.
> > > >
> > > > This complication makes it a whole lot more di

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-06-01 Thread Alexander Aring
Hi,


On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher  wrote:
>
> On Tue, May 30, 2023 at 4:08 PM Alexander Aring  wrote:
> > Hi,
> >
> > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher  
> > wrote:
> > >
> > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring  
> > > wrote:
> > > > Hi,
> > > >
> > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > >  wrote:
> > > > >
> > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring  
> > > > > wrote:
> > > > > > This patch fixes a possible plock op collisions when using F_SETLKW 
> > > > > > lock
> > > > > > requests and fsid, number and owner are not enough to identify a 
> > > > > > result
> > > > > > for a pending request. The ltp testcases [0] and [1] are examples 
> > > > > > when
> > > > > > this is not enough in case of using classic posix locks with 
> > > > > > threads and
> > > > > > open filedescriptor posix locks.
> > > > > >
> > > > > > The idea to fix the issue here is to place all lock request in 
> > > > > > order. In
> > > > > > case of non F_SETLKW lock request (indicated if wait is set or not) 
> > > > > > the
> > > > > > lock requests are ordered inside the recv_list. If a result comes 
> > > > > > back
> > > > > > the right plock op can be found by the first plock_op in recv_list 
> > > > > > which
> > > > > > has not info.wait set. This can be done only by non F_SETLKW plock 
> > > > > > ops as
> > > > > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > > > > send_list to recv_mlist) and write the result immediately back.
> > > > > >
> > > > > > This behaviour is for F_SETLKW not possible as multiple waiters can 
> > > > > > be
> > > > > > get a result back in an random order. To avoid a collisions in cases
> > > > > > like [0] or [1] this patch adds more fields to compare the plock
> > > > > > operations as the lock request is the same. This is also being made 
> > > > > > in
> > > > > > NFS to find an result for an asynchronous F_SETLKW lock request 
> > > > > > [2][3]. We
> > > > > > still can't find the exact lock request for a specific result if the
> > > > > > lock request is the same, but if this is the case we don't care the
> > > > > > order how the identical lock requests get their result back to 
> > > > > > grant the
> > > > > > lock.
> > > > >
> > > > > When the recv_list contains multiple indistinguishable requests, this
> > > > > can only be because they originated from multiple threads of the same
> > > > > process. In that case, I agree that it doesn't matter which of those
> > > > > requests we "complete" in dev_write() as long as we only complete one
> > > > > request. We do need to compare the additional request fields in
> > > > > dev_write() to find a suitable request, so that makes sense as well.
> > > > > We need to compare all of the fields that identify a request (optype,
> > > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > > > > "right" request (or in case there is more than one identical request,
> > > > > a "suitable" request).
> > > > >
> > > >
> > > > In my "definition" why this works is as you said the "identical
> > > > request". There is a more deeper definition of "when is a request
> > > > identical" and in my opinion it is here as: "A request A is identical
> > > > to request B when they get granted under the same 'time'" which is all
> > > > the fields you mentioned.
> > > >
> > > > Even with cancellation (F_SETLKW only) it does not matter which
> > > > "identical request" you cancel because the kernel and user
> > > > (dlm_controld) makes no relation between a lock request instance. You
> > > > need to have at least the same amount of "results" coming back from
> > > > user space as the amount you are waiting for a result for the same
> > > > "identical request".
> > >
> > > That's not incorrect per se, but cancellations create an additional
> > > difficulty: they can either succeed or fail. To indicate that a
> > > cancellation has succeeded, a new type of message can be introduced
> > > (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> > > belong to a locking request that is being cancelled. When cancelling a
> > > locking request fails, the kernel will see a "locking request granted"
> > > message though, and when multiple identical locking requests are
> > > queued and only some of them have been cancelled, it won't be obvious
> > > which locking request a "locking request granted" message should be
> > > assigned to anymore. You really don't want to mix things up in that
> > > case.
> > >
> > > This complication makes it a whole lot more difficult to reason about
> > > the correctness of the code. All that complexity is avoidable by
> > > sticking with a fixed mapping of requests and replies (i.e., a unique
> > > request identifier).
> > >
> > > To put it differently, you can shoot yourself in the foot and still
> > > hop along on the other leg, but it may not be the best of all possible
> > > ideas.
>

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-30 Thread Andreas Gruenbacher
On Tue, May 30, 2023 at 4:08 PM Alexander Aring  wrote:
> Hi,
>
> On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher  
> wrote:
> >
> > On Tue, May 30, 2023 at 12:19 AM Alexander Aring  
> > wrote:
> > > Hi,
> > >
> > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > >  wrote:
> > > >
> > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring  
> > > > wrote:
> > > > > This patch fixes a possible plock op collisions when using F_SETLKW 
> > > > > lock
> > > > > requests and fsid, number and owner are not enough to identify a 
> > > > > result
> > > > > for a pending request. The ltp testcases [0] and [1] are examples when
> > > > > this is not enough in case of using classic posix locks with threads 
> > > > > and
> > > > > open filedescriptor posix locks.
> > > > >
> > > > > The idea to fix the issue here is to place all lock request in order. 
> > > > > In
> > > > > case of non F_SETLKW lock request (indicated if wait is set or not) 
> > > > > the
> > > > > lock requests are ordered inside the recv_list. If a result comes back
> > > > > the right plock op can be found by the first plock_op in recv_list 
> > > > > which
> > > > > has not info.wait set. This can be done only by non F_SETLKW plock 
> > > > > ops as
> > > > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > > > send_list to recv_mlist) and write the result immediately back.
> > > > >
> > > > > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > > > > get a result back in an random order. To avoid a collisions in cases
> > > > > like [0] or [1] this patch adds more fields to compare the plock
> > > > > operations as the lock request is the same. This is also being made in
> > > > > NFS to find an result for an asynchronous F_SETLKW lock request 
> > > > > [2][3]. We
> > > > > still can't find the exact lock request for a specific result if the
> > > > > lock request is the same, but if this is the case we don't care the
> > > > > order how the identical lock requests get their result back to grant 
> > > > > the
> > > > > lock.
> > > >
> > > > When the recv_list contains multiple indistinguishable requests, this
> > > > can only be because they originated from multiple threads of the same
> > > > process. In that case, I agree that it doesn't matter which of those
> > > > requests we "complete" in dev_write() as long as we only complete one
> > > > request. We do need to compare the additional request fields in
> > > > dev_write() to find a suitable request, so that makes sense as well.
> > > > We need to compare all of the fields that identify a request (optype,
> > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > > > "right" request (or in case there is more than one identical request,
> > > > a "suitable" request).
> > > >
> > >
> > > In my "definition" why this works is as you said the "identical
> > > request". There is a more deeper definition of "when is a request
> > > identical" and in my opinion it is here as: "A request A is identical
> > > to request B when they get granted under the same 'time'" which is all
> > > the fields you mentioned.
> > >
> > > Even with cancellation (F_SETLKW only) it does not matter which
> > > "identical request" you cancel because the kernel and user
> > > (dlm_controld) makes no relation between a lock request instance. You
> > > need to have at least the same amount of "results" coming back from
> > > user space as the amount you are waiting for a result for the same
> > > "identical request".
> >
> > That's not incorrect per se, but cancellations create an additional
> > difficulty: they can either succeed or fail. To indicate that a
> > cancellation has succeeded, a new type of message can be introduced
> > (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> > belong to a locking request that is being cancelled. When cancelling a
> > locking request fails, the kernel will see a "locking request granted"
> > message though, and when multiple identical locking requests are
> > queued and only some of them have been cancelled, it won't be obvious
> > which locking request a "locking request granted" message should be
> > assigned to anymore. You really don't want to mix things up in that
> > case.
> >
> > This complication makes it a whole lot more difficult to reason about
> > the correctness of the code. All that complexity is avoidable by
> > sticking with a fixed mapping of requests and replies (i.e., a unique
> > request identifier).
> >
> > To put it differently, you can shoot yourself in the foot and still
> > hop along on the other leg, but it may not be the best of all possible
> > ideas.
> >
>
> It makes things more complicated, I agree and the reason why this
> works now is because there are a lot of "dependencies". I would love
> to have an unique identifier to make it possible that we can follow an
> instance handle of the original lock request.
>
> * an unique identifier which also wor

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-30 Thread Alexander Aring
Hi,

On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher  wrote:
>
> On Tue, May 30, 2023 at 12:19 AM Alexander Aring  wrote:
> > Hi,
> >
> > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> >  wrote:
> > >
> > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring  
> > > wrote:
> > > > This patch fixes a possible plock op collisions when using F_SETLKW lock
> > > > requests and fsid, number and owner are not enough to identify a result
> > > > for a pending request. The ltp testcases [0] and [1] are examples when
> > > > this is not enough in case of using classic posix locks with threads and
> > > > open filedescriptor posix locks.
> > > >
> > > > The idea to fix the issue here is to place all lock request in order. In
> > > > case of non F_SETLKW lock request (indicated if wait is set or not) the
> > > > lock requests are ordered inside the recv_list. If a result comes back
> > > > the right plock op can be found by the first plock_op in recv_list which
> > > > has not info.wait set. This can be done only by non F_SETLKW plock ops 
> > > > as
> > > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > > send_list to recv_mlist) and write the result immediately back.
> > > >
> > > > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > > > get a result back in an random order. To avoid a collisions in cases
> > > > like [0] or [1] this patch adds more fields to compare the plock
> > > > operations as the lock request is the same. This is also being made in
> > > > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. 
> > > > We
> > > > still can't find the exact lock request for a specific result if the
> > > > lock request is the same, but if this is the case we don't care the
> > > > order how the identical lock requests get their result back to grant the
> > > > lock.
> > >
> > > When the recv_list contains multiple indistinguishable requests, this
> > > can only be because they originated from multiple threads of the same
> > > process. In that case, I agree that it doesn't matter which of those
> > > requests we "complete" in dev_write() as long as we only complete one
> > > request. We do need to compare the additional request fields in
> > > dev_write() to find a suitable request, so that makes sense as well.
> > > We need to compare all of the fields that identify a request (optype,
> > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > > "right" request (or in case there is more than one identical request,
> > > a "suitable" request).
> > >
> >
> > In my "definition" why this works is as you said the "identical
> > request". There is a more deeper definition of "when is a request
> > identical" and in my opinion it is here as: "A request A is identical
> > to request B when they get granted under the same 'time'" which is all
> > the fields you mentioned.
> >
> > Even with cancellation (F_SETLKW only) it does not matter which
> > "identical request" you cancel because the kernel and user
> > (dlm_controld) makes no relation between a lock request instance. You
> > need to have at least the same amount of "results" coming back from
> > user space as the amount you are waiting for a result for the same
> > "identical request".
>
> That's not incorrect per se, but cancellations create an additional
> difficulty: they can either succeed or fail. To indicate that a
> cancellation has succeeded, a new type of message can be introduced
> (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> belong to a locking request that is being cancelled. When cancelling a
> locking request fails, the kernel will see a "locking request granted"
> message though, and when multiple identical locking requests are
> queued and only some of them have been cancelled, it won't be obvious
> which locking request a "locking request granted" message should be
> assigned to anymore. You really don't want to mix things up in that
> case.
>
> This complication makes it a whole lot more difficult to reason about
> the correctness of the code. All that complexity is avoidable by
> sticking with a fixed mapping of requests and replies (i.e., a unique
> request identifier).
>
> To put it differently, you can shoot yourself in the foot and still
> hop along on the other leg, but it may not be the best of all possible
> ideas.
>

It makes things more complicated, I agree and the reason why this
works now is because there are a lot of "dependencies". I would love
to have an unique identifier to make it possible that we can follow an
instance handle of the original lock request.

* an unique identifier which also works with the async lock request of
lockd case.

> > > The above patch description doesn't match the code anymore, and the
> > > code doesn't fully revert the recv_list splitting of the previous
> > > version.
> > >
> >
> > This isn't a revert. Is it a new patch version, I did drop the
> > recv_setlkw_list here, dropping

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-30 Thread Andreas Gruenbacher
On Tue, May 30, 2023 at 12:19 AM Alexander Aring  wrote:
> Hi,
>
> On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
>  wrote:
> >
> > On Wed, May 24, 2023 at 6:02 PM Alexander Aring  wrote:
> > > This patch fixes a possible plock op collisions when using F_SETLKW lock
> > > requests and fsid, number and owner are not enough to identify a result
> > > for a pending request. The ltp testcases [0] and [1] are examples when
> > > this is not enough in case of using classic posix locks with threads and
> > > open filedescriptor posix locks.
> > >
> > > The idea to fix the issue here is to place all lock request in order. In
> > > case of non F_SETLKW lock request (indicated if wait is set or not) the
> > > lock requests are ordered inside the recv_list. If a result comes back
> > > the right plock op can be found by the first plock_op in recv_list which
> > > has not info.wait set. This can be done only by non F_SETLKW plock ops as
> > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > send_list to recv_mlist) and write the result immediately back.
> > >
> > > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > > get a result back in an random order. To avoid a collisions in cases
> > > like [0] or [1] this patch adds more fields to compare the plock
> > > operations as the lock request is the same. This is also being made in
> > > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> > > still can't find the exact lock request for a specific result if the
> > > lock request is the same, but if this is the case we don't care the
> > > order how the identical lock requests get their result back to grant the
> > > lock.
> >
> > When the recv_list contains multiple indistinguishable requests, this
> > can only be because they originated from multiple threads of the same
> > process. In that case, I agree that it doesn't matter which of those
> > requests we "complete" in dev_write() as long as we only complete one
> > request. We do need to compare the additional request fields in
> > dev_write() to find a suitable request, so that makes sense as well.
> > We need to compare all of the fields that identify a request (optype,
> > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > "right" request (or in case there is more than one identical request,
> > a "suitable" request).
> >
>
> In my "definition" why this works is as you said the "identical
> request". There is a more deeper definition of "when is a request
> identical" and in my opinion it is here as: "A request A is identical
> to request B when they get granted under the same 'time'" which is all
> the fields you mentioned.
>
> Even with cancellation (F_SETLKW only) it does not matter which
> "identical request" you cancel because the kernel and user
> (dlm_controld) makes no relation between a lock request instance. You
> need to have at least the same amount of "results" coming back from
> user space as the amount you are waiting for a result for the same
> "identical request".

That's not incorrect per se, but cancellations create an additional
difficulty: they can either succeed or fail. To indicate that a
cancellation has succeeded, a new type of message can be introduced
(say, "CANCELLED"), and it's obvious that a CANCELLED message can only
belong to a locking request that is being cancelled. When cancelling a
locking request fails, the kernel will see a "locking request granted"
message though, and when multiple identical locking requests are
queued and only some of them have been cancelled, it won't be obvious
which locking request a "locking request granted" message should be
assigned to anymore. You really don't want to mix things up in that
case.

This complication makes it a whole lot more difficult to reason about
the correctness of the code. All that complexity is avoidable by
sticking with a fixed mapping of requests and replies (i.e., a unique
request identifier).

To put it differently, you can shoot yourself in the foot and still
hop along on the other leg, but it may not be the best of all possible
ideas.

> > The above patch description doesn't match the code anymore, and the
> > code doesn't fully revert the recv_list splitting of the previous
> > version.
> >
>
> This isn't a revert. Is it a new patch version, I did drop the
> recv_setlkw_list here, dropping in means of removing the
> recv_setlkw_list and handling everything in the recv_list. Although
> there might be a performance impact by splitting the requests in two
> lists as we don't need to jump over all F_SETLKW requests.
>
> > > [0] 
> > > https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> > > [1] 
> > > https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> > > [2] 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-29 Thread Alexander Aring
Hi,

On Mon, May 29, 2023 at 6:18 PM Alexander Aring  wrote:
>
> Hi,
>
> On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
>  wrote:
> >
> > On Wed, May 24, 2023 at 6:02 PM Alexander Aring  wrote:
> > > This patch fixes a possible plock op collisions when using F_SETLKW lock
> > > requests and fsid, number and owner are not enough to identify a result
> > > for a pending request. The ltp testcases [0] and [1] are examples when
> > > this is not enough in case of using classic posix locks with threads and
> > > open filedescriptor posix locks.
> > >
> > > The idea to fix the issue here is to place all lock request in order. In
> > > case of non F_SETLKW lock request (indicated if wait is set or not) the
> > > lock requests are ordered inside the recv_list. If a result comes back
> > > the right plock op can be found by the first plock_op in recv_list which
> > > has not info.wait set. This can be done only by non F_SETLKW plock ops as
> > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > send_list to recv_mlist) and write the result immediately back.
> > >
> > > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > > get a result back in an random order. To avoid a collisions in cases
> > > like [0] or [1] this patch adds more fields to compare the plock
> > > operations as the lock request is the same. This is also being made in
> > > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> > > still can't find the exact lock request for a specific result if the
> > > lock request is the same, but if this is the case we don't care the
> > > order how the identical lock requests get their result back to grant the
> > > lock.
> >
> > When the recv_list contains multiple indistinguishable requests, this
> > can only be because they originated from multiple threads of the same
> > process. In that case, I agree that it doesn't matter which of those
> > requests we "complete" in dev_write() as long as we only complete one
> > request. We do need to compare the additional request fields in
> > dev_write() to find a suitable request, so that makes sense as well.
> > We need to compare all of the fields that identify a request (optype,
> > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > "right" request (or in case there is more than one identical request,
> > a "suitable" request).
> >
>
> In my "definition" why this works is as you said the "identical
> request". There is a more deeper definition of "when is a request
> identical" and in my opinion it is here as: "A request A is identical
> to request B when they get granted under the same 'time'" which is all
> the fields you mentioned.

s/under/at/

at the same 'time' or under the same conditions...

- Alex



Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-29 Thread Alexander Aring
Hi,

On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
 wrote:
>
> On Wed, May 24, 2023 at 6:02 PM Alexander Aring  wrote:
> > This patch fixes a possible plock op collisions when using F_SETLKW lock
> > requests and fsid, number and owner are not enough to identify a result
> > for a pending request. The ltp testcases [0] and [1] are examples when
> > this is not enough in case of using classic posix locks with threads and
> > open filedescriptor posix locks.
> >
> > The idea to fix the issue here is to place all lock request in order. In
> > case of non F_SETLKW lock request (indicated if wait is set or not) the
> > lock requests are ordered inside the recv_list. If a result comes back
> > the right plock op can be found by the first plock_op in recv_list which
> > has not info.wait set. This can be done only by non F_SETLKW plock ops as
> > dlm_controld always reads a specific plock op (list_move_tail() from
> > send_list to recv_mlist) and write the result immediately back.
> >
> > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > get a result back in an random order. To avoid a collisions in cases
> > like [0] or [1] this patch adds more fields to compare the plock
> > operations as the lock request is the same. This is also being made in
> > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> > still can't find the exact lock request for a specific result if the
> > lock request is the same, but if this is the case we don't care the
> > order how the identical lock requests get their result back to grant the
> > lock.
>
> When the recv_list contains multiple indistinguishable requests, this
> can only be because they originated from multiple threads of the same
> process. In that case, I agree that it doesn't matter which of those
> requests we "complete" in dev_write() as long as we only complete one
> request. We do need to compare the additional request fields in
> dev_write() to find a suitable request, so that makes sense as well.
> We need to compare all of the fields that identify a request (optype,
> ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> "right" request (or in case there is more than one identical request,
> a "suitable" request).
>

In my "definition" why this works is as you said the "identical
request". There is a more deeper definition of "when is a request
identical" and in my opinion it is here as: "A request A is identical
to request B when they get granted under the same 'time'" which is all
the fields you mentioned.

Even with cancellation (F_SETLKW only) it does not matter which
"identical request" you cancel because the kernel and user
(dlm_controld) makes no relation between a lock request instance. You
need to have at least the same amount of "results" coming back from
user space as the amount you are waiting for a result for the same
"identical request".

> The above patch description doesn't match the code anymore, and the
> code doesn't fully revert the recv_list splitting of the previous
> version.
>

This isn't a revert. Is it a new patch version, I did drop the
recv_setlkw_list here, dropping in means of removing the
recv_setlkw_list and handling everything in the recv_list. Although
there might be a performance impact by splitting the requests in two
lists as we don't need to jump over all F_SETLKW requests.

> > [0] 
> > https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> > [1] 
> > https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> > [2] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> > [3] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Alexander Aring 
> > ---
> > change since v2:
> >  - don't split recv_list into recv_setlkw_list
> >
> >  fs/dlm/plock.c | 43 ++-
> >  1 file changed, 30 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > index 31bc601ee3d8..53d17dbbb716 100644
> > --- a/fs/dlm/plock.c
> > +++ b/fs/dlm/plock.c
> > @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user 
> > *u, size_t count,
> > if (op->info.flags & DLM_PLOCK_FL_CLOSE)
> > list_del(&op->list);
> > else
> > -   list_move(&op->list, &recv_list);
> > +   list_move_tail(&op->list, &recv_list);
>
> ^ This should be obsolete, but it won't hurt, either.
>

No it is necessary, I tested it and looked deeper into the reason.
dlm_controld handles the lock requests in an ordered way over a
select() mechanism, but it will not always write a result back when
it's read the request out. This is the case for F_SETLKW but also for
al

Re: [Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-25 Thread Andreas Gruenbacher
On Wed, May 24, 2023 at 6:02 PM Alexander Aring  wrote:
> This patch fixes a possible plock op collisions when using F_SETLKW lock
> requests and fsid, number and owner are not enough to identify a result
> for a pending request. The ltp testcases [0] and [1] are examples when
> this is not enough in case of using classic posix locks with threads and
> open filedescriptor posix locks.
>
> The idea to fix the issue here is to place all lock request in order. In
> case of non F_SETLKW lock request (indicated if wait is set or not) the
> lock requests are ordered inside the recv_list. If a result comes back
> the right plock op can be found by the first plock_op in recv_list which
> has not info.wait set. This can be done only by non F_SETLKW plock ops as
> dlm_controld always reads a specific plock op (list_move_tail() from
> send_list to recv_mlist) and write the result immediately back.
>
> This behaviour is for F_SETLKW not possible as multiple waiters can be
> get a result back in an random order. To avoid a collisions in cases
> like [0] or [1] this patch adds more fields to compare the plock
> operations as the lock request is the same. This is also being made in
> NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> still can't find the exact lock request for a specific result if the
> lock request is the same, but if this is the case we don't care the
> order how the identical lock requests get their result back to grant the
> lock.

When the recv_list contains multiple indistinguishable requests, this
can only be because they originated from multiple threads of the same
process. In that case, I agree that it doesn't matter which of those
requests we "complete" in dev_write() as long as we only complete one
request. We do need to compare the additional request fields in
dev_write() to find a suitable request, so that makes sense as well.
We need to compare all of the fields that identify a request (optype,
ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
"right" request (or in case there is more than one identical request,
a "suitable" request).

The above patch description doesn't match the code anymore, and the
code doesn't fully revert the recv_list splitting of the previous
version.

> [0] 
> https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> [1] 
> https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Alexander Aring 
> ---
> change since v2:
>  - don't split recv_list into recv_setlkw_list
>
>  fs/dlm/plock.c | 43 ++-
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> index 31bc601ee3d8..53d17dbbb716 100644
> --- a/fs/dlm/plock.c
> +++ b/fs/dlm/plock.c
> @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user 
> *u, size_t count,
> if (op->info.flags & DLM_PLOCK_FL_CLOSE)
> list_del(&op->list);
> else
> -   list_move(&op->list, &recv_list);
> +   list_move_tail(&op->list, &recv_list);

^ This should be obsolete, but it won't hurt, either.

> memcpy(&info, &op->info, sizeof(info));
> }
> spin_unlock(&ops_lock);
> @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char 
> __user *u, size_t count,
> return -EINVAL;
>
> spin_lock(&ops_lock);
> -   list_for_each_entry(iter, &recv_list, list) {
> -   if (iter->info.fsid == info.fsid &&
> -   iter->info.number == info.number &&
> -   iter->info.owner == info.owner) {
> -   list_del_init(&iter->list);
> -   memcpy(&iter->info, &info, sizeof(info));
> -   if (iter->data)
> -   do_callback = 1;
> -   else
> -   iter->done = 1;
> -   op = iter;
> -   break;
> +   if (info.wait) {

We should be able to use the same list_for_each_entry() loop for
F_SETLKW requests (which have info.wait set) as for all other requests
as far as I can see.

> +   list_for_each_entry(iter, &recv_list, list) {
> +   if (iter->info.fsid == info.fsid &&
> +   iter->info.number == info.number &&
> +   iter->info.owner == info.owner &&
> +   iter->info.pid == info.pid &&
> +   iter->info.start == info.start &&
> +

[Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

2023-05-24 Thread Alexander Aring
This patch fixes a possible plock op collisions when using F_SETLKW lock
requests and fsid, number and owner are not enough to identify a result
for a pending request. The ltp testcases [0] and [1] are examples when
this is not enough in case of using classic posix locks with threads and
open filedescriptor posix locks.

The idea to fix the issue here is to place all lock request in order. In
case of non F_SETLKW lock request (indicated if wait is set or not) the
lock requests are ordered inside the recv_list. If a result comes back
the right plock op can be found by the first plock_op in recv_list which
has not info.wait set. This can be done only by non F_SETLKW plock ops as
dlm_controld always reads a specific plock op (list_move_tail() from
send_list to recv_mlist) and write the result immediately back.

This behaviour is for F_SETLKW not possible as multiple waiters can be
get a result back in an random order. To avoid a collisions in cases
like [0] or [1] this patch adds more fields to compare the plock
operations as the lock request is the same. This is also being made in
NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
still can't find the exact lock request for a specific result if the
lock request is the same, but if this is the case we don't care the
order how the identical lock requests get their result back to grant the
lock.

[0] 
https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
[1] 
https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731

Cc: sta...@vger.kernel.org
Signed-off-by: Alexander Aring 
---
change since v2:
 - don't split recv_list into recv_setlkw_list

 fs/dlm/plock.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 31bc601ee3d8..53d17dbbb716 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, 
size_t count,
if (op->info.flags & DLM_PLOCK_FL_CLOSE)
list_del(&op->list);
else
-   list_move(&op->list, &recv_list);
+   list_move_tail(&op->list, &recv_list);
memcpy(&info, &op->info, sizeof(info));
}
spin_unlock(&ops_lock);
@@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char 
__user *u, size_t count,
return -EINVAL;
 
spin_lock(&ops_lock);
-   list_for_each_entry(iter, &recv_list, list) {
-   if (iter->info.fsid == info.fsid &&
-   iter->info.number == info.number &&
-   iter->info.owner == info.owner) {
-   list_del_init(&iter->list);
-   memcpy(&iter->info, &info, sizeof(info));
-   if (iter->data)
-   do_callback = 1;
-   else
-   iter->done = 1;
-   op = iter;
-   break;
+   if (info.wait) {
+   list_for_each_entry(iter, &recv_list, list) {
+   if (iter->info.fsid == info.fsid &&
+   iter->info.number == info.number &&
+   iter->info.owner == info.owner &&
+   iter->info.pid == info.pid &&
+   iter->info.start == info.start &&
+   iter->info.end == info.end &&
+   iter->info.ex == info.ex &&
+   iter->info.wait) {
+   op = iter;
+   break;
+   }
}
+   } else {
+   list_for_each_entry(iter, &recv_list, list) {
+   if (!iter->info.wait) {
+   op = iter;
+   break;
+   }
+   }
+   }
+
+   if (op) {
+   list_del_init(&op->list);
+   memcpy(&op->info, &info, sizeof(info));
+   if (op->data)
+   do_callback = 1;
+   else
+   op->done = 1;
}
spin_unlock(&ops_lock);
 
-- 
2.31.1