Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-23 Thread NeilBrown
On Wed, Aug 22 2018, NeilBrown wrote:

>
> Oh dear.
> nfs4_alloc_lockdata contains:
>   memcpy(&p->fl, fl, sizeof(p->fl));
>
> so any list_heads that are valid in fl will be invalid in p->fl.
>
> Maybe I should initialize the relevant list_heads at the start of wait
> functions.
> I should look more closely at what filesystems do with locks though.
>
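
The memcpy() hazard quoted above can be reproduced in user space. What follows is only a minimal sketch: struct toy_lock, its blocked field, and the helper names are made up for illustration, and the list type is a stand-in for the kernel's struct list_head, not the real struct file_lock layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal stand-in for the kernel's circular, doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An empty list is one whose head points back at itself. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical toy structure; the real struct file_lock has many more fields. */
struct toy_lock {
	int flags;
	struct list_head blocked;
};

/* Copy with a plain memcpy() and report whether the copy's head looks empty. */
static bool copy_looks_empty(const struct toy_lock *src)
{
	struct toy_lock copy;

	memcpy(&copy, src, sizeof(copy));
	/* copy.blocked.next still holds the address of src->blocked. */
	return list_is_empty(&copy.blocked);
}

/* Copy, then re-initialise the embedded head so it refers to itself again. */
static bool fixed_copy_looks_empty(const struct toy_lock *src)
{
	struct toy_lock copy;

	memcpy(&copy, src, sizeof(copy));
	init_list_head(&copy.blocked);
	return list_is_empty(&copy.blocked);
}

static void memcpy_demo(void)
{
	struct toy_lock orig = { .flags = 0 };

	init_list_head(&orig.blocked);
	assert(list_is_empty(&orig.blocked));    /* valid in the original */
	assert(!copy_looks_empty(&orig));        /* invalid in the raw copy */
	assert(fixed_copy_looks_empty(&orig));   /* valid once re-initialised */
}
```

Calling memcpy_demo() from a main() should run to completion. The raw copy fails the emptiness test because its pointers still refer to the original object, which is why any list operation on the copy would touch the wrong memory.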

I looked, and it's complicated.

Some call posix_lock_file() (which doesn't block, I think).
Some call locks_lock_file_wait() (which can block, if FL_SLEEP is given).
Some call both.

Strangely, vfs_lock_file() either calls posix_lock_file(), which doesn't
block, or filp->f_op->lock() which, I think, can.

I'm confused.  However I think this version of the patch should be
safer.
When I make time to test this, this will be part of what I test.

Thanks,
NeilBrown

From: NeilBrown 
Date: Tue, 21 Aug 2018 15:09:06 +1000
Subject: [PATCH] fs/locks: always delete_block after waiting.

Now that requests can block other requests, we
need to be careful to always clean up those blocked
requests.
Any time that we wait for a request, we might have
other requests attached, and when we stop waiting,
we must clean them up.
If the lock was granted, the requests might have been
moved to the new lock, though when merged with a
pre-existing lock, this might not happen.
In all cases we don't want blocked locks to remain
attached, so we remove them to be safe.

Note that when these locking routines are called without FL_SLEEP set,
the list_head might not be properly initialized.
In that case it is neither safe nor necessary to
call locks_delete_block().

Signed-off-by: NeilBrown 
---
 fs/locks.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index de38bafb7f7b..2af9c657f81f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1276,12 +1276,11 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	if (fl->fl_flags & FL_SLEEP)
+		locks_delete_block(fl);
 	return error;
 }
 
@@ -1971,12 +1970,11 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	if (fl->fl_flags & FL_SLEEP)
+		locks_delete_block(fl);
 	return error;
 }
 
@@ -2250,12 +2248,11 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	if (fl->fl_flags & FL_SLEEP)
+		locks_delete_block(fl);
 
 	return error;
 }
-- 
2.14.0.rc0.dirty
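
The control-flow change made by the patch above can be tried out in user space. This is a rough sketch only, not kernel code: every toy_ name, the TOY_ constants, and the fields of struct toy_request are invented for illustration, and the "wait" is a stub rather than a real wait queue.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FL_SLEEP      0x1  /* stand-in for FL_SLEEP */
#define TOY_LOCK_DEFERRED 1    /* stand-in for FILE_LOCK_DEFERRED */
#define TOY_EINTR         4    /* stand-in for an interrupted wait */

/* Hypothetical request object; not the real struct file_lock. */
struct toy_request {
	unsigned int flags;
	int deferrals_left;  /* attempts that will still report "deferred" */
	bool interrupted;    /* whether the stub wait reports a signal */
	int waiters;         /* other requests blocked behind this one */
	int cleanups;        /* how often toy_delete_block() ran */
};

static int toy_try_lock(struct toy_request *rq)
{
	if ((rq->flags & TOY_FL_SLEEP) && rq->deferrals_left > 0) {
		rq->deferrals_left--;
		return TOY_LOCK_DEFERRED;
	}
	return 0;
}

static int toy_wait(struct toy_request *rq)
{
	return rq->interrupted ? -TOY_EINTR : 0;
}

static void toy_delete_block(struct toy_request *rq)
{
	/* Models the claim that cleanup is only safe for sleeping requests. */
	assert(rq->flags & TOY_FL_SLEEP);
	rq->waiters = 0;
	rq->cleanups++;
}

static int toy_lock_wait(struct toy_request *rq)
{
	int error;

	for (;;) {
		error = toy_try_lock(rq);
		if (error != TOY_LOCK_DEFERRED)
			break;
		error = toy_wait(rq);
		if (error)
			break;
	}
	/* One cleanup point, reached on grant and on interruption alike. */
	if (rq->flags & TOY_FL_SLEEP)
		toy_delete_block(rq);
	return error;
}
```

In this model a sleeping request is cleaned up exactly once whether it is eventually granted or interrupted, while a request without the sleep flag never reaches toy_delete_block() at all.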




Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-21 Thread NeilBrown
On Tue, Aug 21 2018, Jeff Layton wrote:

> On Tue, 2018-08-21 at 15:11 +1000, NeilBrown wrote:
>> On Thu, Aug 16 2018, NeilBrown wrote:
>> 
>> > On Wed, Aug 15 2018, Jeff Layton wrote:
>> > 
>> > > On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
>> > > > Hi,
>> > > > 
>> > > > Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
>> > > > ("fs/locks: allow a lock request to block other requests.") to failure
>> > > > boot of NFSv4 with root on several boards.
>> > > > 
>> > > > Log is here:
>> > > > https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
>> > > > 
>> > > > With several errors:
>> > > > kernel BUG at ../fs/locks.c:336!
>> > > > Unable to handle kernel NULL pointer dereference at virtual address 
>> > > > 0004
>> > > > 
>> > > > Configuration:
>> > > > 1. exynos_defconfig
>> > > > 2. Arch ARM Linux
>> > > > 3. Boards:
>> > > > a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
>> > > > b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
>> > > > 4. Systemd: v236, 238
>> > > > 5. All boards boot from TFTP with NFS root (NFSv4)
>> > > > 
>> > > > On Colibri VF50 I got slightly different errors:
>> > > > [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
>> > > > [   12.455273] Unable to handle kernel NULL pointer dereference at
>> > > > virtual address 0004
>> > > > and only with some specific GCC (v6.3) or with other conditions which
>> > > > I did not bisect yet. Maybe Colibri's failure is unrelated to that
>> > > > commit.
>> > > > 
>> > > > Best regards,
>> > > > Krzysztof
>> > 
>> > Thanks a lot for the report Krzysztof!!
>> > 
>> > > 
>> > > The BUG is due to a lock being freed when the fl_blocked list wasn't
>> > > empty (implying that there were still blocked locks waiting on it).
>> > > 
>> > > There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
>> > > and I don't think the fl_blocked list is being handled properly with all
>> > > of them. It only transplants the blocked locks to a new lock when there
>> > > are surviving locks on the list, and that may not be the case when the
>> > > whole file is being unlocked.
>> > 
>> > locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
>> > locks_wake_up_block() which doesn't only wake_up the blocks, but also
>> > detaches them. When that function completes, ->fl_blocked must be empty.
>> > 
>> > The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
>> > as the problematic call.
>> > This suggests that do_lock_file_wait() exited with ->fl_blocked
>> > non-empty, which it shouldn't.
>> > 
>> > I think we need to insert a call to locks_wake_up_block() in
>> > do_lock_file_wait() before it returns.
>> > I cannot find a sequence that would make this necessary, but
>> > it isn't surprising that there might be one.
>> > 
>> > I'll dig through the code a bit more later and make sure I understand
>> > what is happening.
>> > 
>> 
>> I think this problem is fixed by the following.  It is probably
>> triggered when the owner already has a lock for part of the requested
>> range.  After waiting for some other lock, the pending request gets
>> merged with the existing lock, and blocked requests aren't moved across
>> in that case.
>> 
>> I still haven't done more testing, so this is just FYI, not a
>> submission.
>> 
>> Thanks,
>> NeilBrown
>> 
>> From: NeilBrown 
>> Date: Tue, 21 Aug 2018 15:09:06 +1000
>> Subject: [PATCH] fs/locks: always delete_block after waiting.
>> 
>> Now that requests can block other requests, we
>> need to be careful to always clean up those blocked
>> requests.
>> Any time that we wait for a request, we might have
>> other requests attached, and when we stop waiting,
>> we must clean them up.
>> If the lock was granted, the requests might have been
>> moved to the new lock, though when merged with a
>> pre-existing lock, this might not happen.
>> In all cases we don't want blocked locks to remain
>> attached, so we remove them to be safe.
>> 
>> Signed-off-by: NeilBrown 
>> ---
>>  fs/locks.c | 24 +++++++++---------------
>>  1 file changed, 9 insertions(+), 15 deletions(-)
>> 
>> diff --git a/fs/locks.c b/fs/locks.c
>> index de38bafb7f7b..6b310112cf3b 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>> -		if (!error)
>> -			continue;
>> -
>> -		locks_delete_block(fl);
>> -		break;
>> +		if (error)
>> +			break;
>>  	}
>> +	locks_delete_block(fl);
>>  	return error;
>>  }
>>  
>> @@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>>

Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-21 Thread Jeff Layton
On Tue, 2018-08-21 at 15:11 +1000, NeilBrown wrote:
> On Thu, Aug 16 2018, NeilBrown wrote:
> 
> > On Wed, Aug 15 2018, Jeff Layton wrote:
> > 
> > > On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
> > > > Hi,
> > > > 
> > > > Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
> > > > ("fs/locks: allow a lock request to block other requests.") to failure
> > > > boot of NFSv4 with root on several boards.
> > > > 
> > > > Log is here:
> > > > https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
> > > > 
> > > > With several errors:
> > > > kernel BUG at ../fs/locks.c:336!
> > > > Unable to handle kernel NULL pointer dereference at virtual address 
> > > > 0004
> > > > 
> > > > Configuration:
> > > > 1. exynos_defconfig
> > > > 2. Arch ARM Linux
> > > > 3. Boards:
> > > > a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
> > > > b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
> > > > 4. Systemd: v236, 238
> > > > 5. All boards boot from TFTP with NFS root (NFSv4)
> > > > 
> > > > On Colibri VF50 I got slightly different errors:
> > > > [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
> > > > [   12.455273] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 0004
> > > > and only with some specific GCC (v6.3) or with other conditions which
> > > > I did not bisect yet. Maybe Colibri's failure is unrelated to that
> > > > commit.
> > > > 
> > > > Best regards,
> > > > Krzysztof
> > 
> > Thanks a lot for the report Krzysztof!!
> > 
> > > 
> > > The BUG is due to a lock being freed when the fl_blocked list wasn't
> > > empty (implying that there were still blocked locks waiting on it).
> > > 
> > > There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
> > > and I don't think the fl_blocked list is being handled properly with all
> > > of them. It only transplants the blocked locks to a new lock when there
> > > are surviving locks on the list, and that may not be the case when the
> > > whole file is being unlocked.
> > 
> > locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
> > locks_wake_up_block() which doesn't only wake_up the blocks, but also
> > detaches them. When that function completes, ->fl_blocked must be empty.
> > 
> > The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
> > as the problematic call.
> > This suggests that do_lock_file_wait() exited with ->fl_blocked
> > non-empty, which it shouldn't.
> > 
> > I think we need to insert a call to locks_wake_up_block() in
> > do_lock_file_wait() before it returns.
> > I cannot find a sequence that would make this necessary, but
> > it isn't surprising that there might be one.
> > 
> > I'll dig through the code a bit more later and make sure I understand
> > what is happening.
> > 
> 
> I think this problem is fixed by the following.  It is probably
> triggered when the owner already has a lock for part of the requested
> range.  After waiting for some other lock, the pending request gets
> merged with the existing lock, and blocked requests aren't moved across
> in that case.
> 
> I still haven't done more testing, so this is just FYI, not a
> submission.
> 
> Thanks,
> NeilBrown
> 
> From: NeilBrown 
> Date: Tue, 21 Aug 2018 15:09:06 +1000
> Subject: [PATCH] fs/locks: always delete_block after waiting.
> 
> Now that requests can block other requests, we
> need to be careful to always clean up those blocked
> requests.
> Any time that we wait for a request, we might have
> other requests attached, and when we stop waiting,
> we must clean them up.
> If the lock was granted, the requests might have been
> moved to the new lock, though when merged with a
> pre-existing lock, this might not happen.
> In all cases we don't want blocked locks to remain
> attached, so we remove them to be safe.
> 
> Signed-off-by: NeilBrown 
> ---
>  fs/locks.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index de38bafb7f7b..6b310112cf3b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> -		if (!error)
> -			continue;
> -
> -		locks_delete_block(fl);
> -		break;
> +		if (error)
> +			break;
>  	}
> +	locks_delete_block(fl);
>  	return error;
>  }
>  
> @@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> -		if (!error)
> -			continue;
> -
> - 

Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-20 Thread NeilBrown
On Thu, Aug 16 2018, NeilBrown wrote:

> On Wed, Aug 15 2018, Jeff Layton wrote:
>
>> On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
>>> Hi,
>>> 
>>> Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
>>> ("fs/locks: allow a lock request to block other requests.") to failure
>>> boot of NFSv4 with root on several boards.
>>> 
>>> Log is here:
>>> https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
>>> 
>>> With several errors:
>>> kernel BUG at ../fs/locks.c:336!
>>> Unable to handle kernel NULL pointer dereference at virtual address 0004
>>> 
>>> Configuration:
>>> 1. exynos_defconfig
>>> 2. Arch ARM Linux
>>> 3. Boards:
>>> a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
>>> b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
>>> 4. Systemd: v236, 238
>>> 5. All boards boot from TFTP with NFS root (NFSv4)
>>> 
>>> On Colibri VF50 I got slightly different errors:
>>> [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
>>> [   12.455273] Unable to handle kernel NULL pointer dereference at
>>> virtual address 0004
>>> and only with some specific GCC (v6.3) or with other conditions which
>>> I did not bisect yet. Maybe Colibri's failure is unrelated to that
>>> commit.
>>> 
>>> Best regards,
>>> Krzysztof
>
> Thanks a lot for the report Krzysztof!!
>
>>
>> The BUG is due to a lock being freed when the fl_blocked list wasn't
>> empty (implying that there were still blocked locks waiting on it).
>>
>> There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
>> and I don't think the fl_blocked list is being handled properly with all
>> of them. It only transplants the blocked locks to a new lock when there
>> are surviving locks on the list, and that may not be the case when the
>> whole file is being unlocked.
>
> locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
> locks_wake_up_block() which doesn't only wake_up the blocks, but also
> detaches them. When that function completes, ->fl_blocked must be empty.
>
> The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
> as the problematic call.
> This suggests that do_lock_file_wait() exited with ->fl_blocked
> non-empty, which it shouldn't.
>
> I think we need to insert a call to locks_wake_up_block() in
> do_lock_file_wait() before it returns.
> I cannot find a sequence that would make this necessary, but
> it isn't surprising that there might be one.
>
> I'll dig through the code a bit more later and make sure I understand
> what is happening.
>

I think this problem is fixed by the following.  It is probably
triggered when the owner already has a lock for part of the requested
range.  After waiting for some other lock, the pending request gets
merged with the existing lock, and blocked requests aren't moved across
in that case.
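
The suspected sequence can be illustrated with a toy model. It is only a sketch of the hypothesis described above, not of fs/locks.c itself: struct toy_lock, the nr_blocked counter, and all toy_ functions are invented for illustration, and the real code keeps blocked requests on a list rather than in a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a lock or a pending lock request. */
struct toy_lock {
	int start, end;   /* inclusive byte range */
	int nr_blocked;   /* requests queued behind this lock or request */
};

/* Grant by inserting the request as a new lock: waiters travel with it. */
static void toy_grant_as_new(struct toy_lock *granted, struct toy_lock *request)
{
	granted->start = request->start;
	granted->end = request->end;
	granted->nr_blocked = request->nr_blocked;
	request->nr_blocked = 0;
}

/* Grant by widening a lock the owner already holds. Nothing here moves
 * the waiters, which models the gap suspected in the message above. */
static void toy_grant_by_merge(struct toy_lock *existing,
			       const struct toy_lock *request)
{
	if (request->start < existing->start)
		existing->start = request->start;
	if (request->end > existing->end)
		existing->end = request->end;
}

/* Stand-in for the cleanup done after waiting: detach all waiters. */
static void toy_delete_block(struct toy_lock *request)
{
	request->nr_blocked = 0;
}

static bool toy_safe_to_free(const struct toy_lock *request)
{
	return request->nr_blocked == 0;
}

/* Stand-in for freeing a request; the assert plays the role of the BUG. */
static void toy_free(struct toy_lock *request)
{
	assert(toy_safe_to_free(request));
	request->start = request->end = -1;
}
```

In this model a request granted through toy_grant_as_new() is always safe to free, whereas one granted through toy_grant_by_merge() still has waiters attached and would trip the assertion in toy_free() unless toy_delete_block() runs first.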

I still haven't done more testing, so this is just FYI, not a
submission.

Thanks,
NeilBrown

From: NeilBrown 
Date: Tue, 21 Aug 2018 15:09:06 +1000
Subject: [PATCH] fs/locks: always delete_block after waiting.

Now that requests can block other requests, we
need to be careful to always clean up those blocked
requests.
Any time that we wait for a request, we might have
other requests attached, and when we stop waiting,
we must clean them up.
If the lock was granted, the requests might have been
moved to the new lock, though when merged with a
pre-existing lock, this might not happen.
In all cases we don't want blocked locks to remain
attached, so we remove them to be safe.

Signed-off-by: NeilBrown 
---
 fs/locks.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index de38bafb7f7b..6b310112cf3b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -2250,12 +2246,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		if (error != FILE_LOCK_DEFERRED)
 			break;
error = 

Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-20 Thread NeilBrown
On Thu, Aug 16 2018, NeilBrown wrote:

> On Wed, Aug 15 2018, Jeff Layton wrote:
>
>> On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
>>> Hi,
>>> 
>>> Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
>>> ("fs/locks: allow a lock request to block other requests.") to failure
>>> boot of NFSv4 with root on several boards.
>>> 
>>> Log is here:
>>> https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
>>> 
>>> With several errors:
>>> kernel BUG at ../fs/locks.c:336!
>>> Unable to handle kernel NULL pointer dereference at virtual address 0004
>>> 
>>> Configuration:
>>> 1. exynos_defconfig
>>> 2. Arch ARM Linux
>>> 3. Boards:
>>> a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
>>> b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
>>> 4. Systemd: v236, 238
>>> 5. All boards boot from TFTP with NFS root (NFSv4)
>>> 
>>> On Colibri VF50 I got slightly different errors:
>>> [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
>>> [   12.455273] Unable to handle kernel NULL pointer dereference at
>>> virtual address 0004
>>> and only with some specific GCC (v6.3) or with other conditions which
>>> I did not bisect yet. Maybe Colibri's failure is unrelated to that
>>> commit.
>>> 
>>> Best regards,
>>> Krzysztof
>
> Thanks a lot for the report Krzysztof!!
>
>>
>> The BUG is due to a lock being freed when the fl_blocked list wasn't
>> empty (implying that there were still blocked locks waiting on it).
>>
>> There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
>> and I don't think the fl_blocked list is being handled properly with all
>> of them. It only transplants the blocked locks to a new lock when there
>> are surviving locks on the list, and that may not be the case when the
>> whole file is being unlocked.
>
> locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
> locks_wake_up_block() which doesn't only wake up the blocked requests, but
> also detaches them. When that function completes, ->fl_blocked must be empty.
>
> The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
> as the problematic call.
> This suggests that do_lock_file_wait() exited with ->fl_blocked
> non-empty, which it shouldn't.
>
> I think we need to insert a call to locks_wake_up_block() in
> do_lock_file_wait() before it returns.
> I cannot find a sequence that would make this necessary, but
> it isn't surprising that there might be one.
>
> I'll dig through the code a bit more later and make sure I understand
> what is happening.
>

I think this problem is fixed by the following.  It is probably
triggered when the owner already has a lock for part of the requested
range.  After waiting for some other lock, the pending request gets
merged with the existing lock, and blocked requests aren't moved across
in that case.
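
As a rough illustration of the loop shape I am aiming for, here is a toy
user-space sketch, not kernel code. The names toy_req, toy_try_lock(),
toy_delete_block() and toy_lock_wait() are made up for this example, and the
real wait is replaced by a stored return value. The point is only that the
cleanup runs after the loop on every exit path, whether the lock was granted
or the wait was interrupted.

```c
#include <assert.h>

enum { TOY_DEFERRED = 1 };	/* stand-in for FILE_LOCK_DEFERRED */

/* Hypothetical request: all fields exist only for this illustration. */
struct toy_req {
	int defers_left;	/* how many attempts will still be deferred */
	int waiters;		/* other requests currently attached to this one */
	int wait_error;		/* result of the simulated wait (0 = woken normally) */
};

/* Simulated lock attempt: deferred a few times, then granted. */
static int toy_try_lock(struct toy_req *r)
{
	if (r->defers_left > 0) {
		r->defers_left--;
		r->waiters++;	/* while we wait, another request may block on us */
		return TOY_DEFERRED;
	}
	return 0;		/* granted, possibly by merging with an existing lock */
}

/* Detach everything still attached to this request. */
static void toy_delete_block(struct toy_req *r)
{
	r->waiters = 0;
}

/* Retry loop with the cleanup moved after the loop. */
static int toy_lock_wait(struct toy_req *r)
{
	int error;

	for (;;) {
		error = toy_try_lock(r);
		if (error != TOY_DEFERRED)
			break;
		error = r->wait_error;	/* stands in for the interruptible wait */
		if (error)
			break;
	}
	toy_delete_block(r);	/* runs on success and on error alike */
	return error;
}
```

With the cleanup only on the interrupted path, the granted case in this model
would return with waiters still attached, which is the situation the trace
seems to point at.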

I still haven't done more testing, so this is just FYI, not a
submission.

Thanks,
NeilBrown

From: NeilBrown 
Date: Tue, 21 Aug 2018 15:09:06 +1000
Subject: [PATCH] fs/locks: always delete_block after waiting.

Now that requests can block other requests, we
need to be careful to always clean up those blocked
requests.
Any time that we wait for a request, we might have
other requests attached, and when we stop waiting,
we must clean them up.
If the lock was granted, the requests might have been
moved to the new lock, though when merged with a
pre-existing lock, this might not happen.
In all cases we don't want blocked locks to remain
attached, so we remove them to be safe.

Signed-off-by: NeilBrown 
---
 fs/locks.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index de38bafb7f7b..6b310112cf3b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -2250,12 +2246,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = 

Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-15 Thread NeilBrown
On Wed, Aug 15 2018, Jeff Layton wrote:

> On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
>> Hi,
>> 
>> Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
>> ("fs/locks: allow a lock request to block other requests.") to failure
>> boot of NFSv4 with root on several boards.
>> 
>> Log is here:
>> https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
>> 
>> With several errors:
>> kernel BUG at ../fs/locks.c:336!
>> Unable to handle kernel NULL pointer dereference at virtual address 0004
>> 
>> Configuration:
>> 1. exynos_defconfig
>> 2. Arch ARM Linux
>> 3. Boards:
>> a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
>> b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
>> 4. Systemd: v236, 238
>> 5. All boards boot from TFTP with NFS root (NFSv4)
>> 
>> On Colibri VF50 I got slightly different errors:
>> [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
>> [   12.455273] Unable to handle kernel NULL pointer dereference at
>> virtual address 0004
>> and only with some specific GCC (v6.3) or with other conditions which
>> I did not bisect yet. Maybe Colibri's failure is unrelated to that
>> commit.
>> 
>> Best regards,
>> Krzysztof

Thanks a lot for the report Krzysztof!!

>
> The BUG is due to a lock being freed when the fl_blocked list wasn't
> empty (implying that there were still blocked locks waiting on it).
>
> There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
> and I don't think the fl_blocked list is being handled properly with all
> of them. It only transplants the blocked locks to a new lock when there
> are surviving locks on the list, and that may not be the case when the
> whole file is being unlocked.

locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
locks_wake_up_block() which doesn't only wake up the blocked requests, but
also detaches them. When that function completes, ->fl_blocked must be empty.
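
To make that invariant concrete, here is a toy user-space sketch, not the
kernel implementation. The struct and the helpers toy_block_on(),
toy_wake_up_blocks() and toy_safe_to_free() are invented for this example and
use a plain singly linked list rather than the kernel's list_head. It only
models the idea that waking the blocked requests also detaches them, so the
list is empty afterwards and the request can be freed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock request; not struct file_lock. */
struct toy_lock {
	struct toy_lock *blocker;	/* request we are waiting on, or NULL */
	struct toy_lock *blocked_head;	/* first request waiting on us */
	struct toy_lock *blocked_next;	/* next sibling in our blocker's list */
};

/* Attach waiter to the list of requests blocked on blocker. */
static void toy_block_on(struct toy_lock *blocker, struct toy_lock *waiter)
{
	waiter->blocker = blocker;
	waiter->blocked_next = blocker->blocked_head;
	blocker->blocked_head = waiter;
}

/* Wake every waiter and detach it; returns how many were detached. */
static int toy_wake_up_blocks(struct toy_lock *blocker)
{
	int woken = 0;

	while (blocker->blocked_head) {
		struct toy_lock *w = blocker->blocked_head;

		blocker->blocked_head = w->blocked_next;
		w->blocked_next = NULL;
		w->blocker = NULL;
		woken++;
	}
	return woken;
}

/* Freeing is only safe once nothing is still blocked on the request. */
static int toy_safe_to_free(const struct toy_lock *fl)
{
	return fl->blocked_head == NULL;
}
```

In this model, a free that happens while toy_safe_to_free() is false
corresponds to the BUG reported at fs/locks.c:336.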

The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
as the problematic call.
This suggests that do_lock_file_wait() exited with ->fl_blocked
non-empty, which it shouldn't.

I think we need to insert a call to locks_wake_up_block() in
do_lock_file_wait() before it returns.
I cannot find a sequence that would make this necessary, but
it isn't surprising that there might be one.

I'll dig through the code a bit more later and make sure I understand
what is happening.

Thanks,
NeilBrown



Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-15 Thread Jeff Layton
On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
> ("fs/locks: allow a lock request to block other requests.") to failure
> boot of NFSv4 with root on several boards.
> 
> Log is here:
> https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
> 
> With several errors:
> kernel BUG at ../fs/locks.c:336!
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> 
> Configuration:
> 1. exynos_defconfig
> 2. Arch ARM Linux
> 3. Boards:
> a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
> b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
> 4. Systemd: v236, 238
> 5. All boards boot from TFTP with NFS root (NFSv4)
> 
> On Colibri VF50 I got slightly different errors:
> [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
> [   12.455273] Unable to handle kernel NULL pointer dereference at
> virtual address 0004
> and only with some specific GCC (v6.3) or with other conditions which
> I did not bisect yet. Maybe Colibri's failure is unrelated to that
> commit.
> 
> Best regards,
> Krzysztof

The BUG is due to a lock being freed when the fl_blocked list wasn't
empty (implying that there were still blocked locks waiting on it).

There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
and I don't think the fl_blocked list is being handled properly with all
of them. It only transplants the blocked locks to a new lock when there
are surviving locks on the list, and that may not be the case when the
whole file is being unlocked.

-- 
Jeff Layton 



Re: [BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-15 Thread Jeff Layton
On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
> ("fs/locks: allow a lock request to block other requests.") to failure
> boot of NFSv4 with root on several boards.
> 
> Log is here:
> https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
> 
> With several errors:
> kernel BUG at ../fs/locks.c:336!
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> 
> Configuration:
> 1. exynos_defconfig
> 2. Arch ARM Linux
> 3. Boards:
> a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
> b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
> 4. Systemd: v236, 238
> 5. All boards boot from TFTP with NFS root (NFSv4)
> 
> On Colibri VF50 I got slightly different errors:
> [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
> [   12.455273] Unable to handle kernel NULL pointer dereference at
> virtual address 0004
> and only with some specific GCC (v6.3) or with other conditions which
> I did not bisect yet. Maybe Colibri's failure is unrelated to that
> commit.
> 
> Best regards,
> Krzysztof

Thanks Krzysztof,

I or Neil will see if this is reproducible and see about coming up with
a fix. For now, I'll take this out of -next.

Thanks,
-- 
Jeff Layton 



[BUG][BISECT] NFSv4 root failures after "fs/locks: allow a lock request to block other requests."

2018-08-15 Thread Krzysztof Kozlowski
Hi,

Bisect pointed commit ce3147990450a68b3f549088b30f087742a08b5d
("fs/locks: allow a lock request to block other requests.") to failure
boot of NFSv4 with root on several boards.

Log is here:
https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0

With several errors:
kernel BUG at ../fs/locks.c:336!
Unable to handle kernel NULL pointer dereference at virtual address 0004

Configuration:
1. exynos_defconfig
2. Arch ARM Linux
3. Boards:
a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
4. Systemd: v236, 238
5. All boards boot from TFTP with NFS root (NFSv4)

On Colibri VF50 I got slightly different errors:
[   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
[   12.455273] Unable to handle kernel NULL pointer dereference at
virtual address 0004
and only with some specific GCC (v6.3) or with other conditions which
I did not bisect yet. Maybe Colibri's failure is unrelated to that
commit.

Best regards,
Krzysztof

