RE: kernel panics with 4.14.X versions

2018-04-20 Thread Dexuan Cui
> From: Jan Kara 
> Sent: Friday, April 20, 2018 03:22
> On Thu 19-04-18 21:37:25, Dexuan Cui wrote:
> > > From: Jan Kara
> > > Sent: Thursday, April 19, 2018 13:23
> > > Good news guys, Robert has just spotted a bug which looks like what I'd
> > > expect can cause your lockups / crashes. I've merged his patch to my tree
> > > and will push it to Linus for -rc3 so eventually it should land in
> > > appropriate stable trees as well. If you are too eager to test it out, it
> > > is attached for you to try.
> > >
> > > Jan Kara
> >
> > The patch's changelog says "... this behavior results in a kernel panic."
> > This sounds like a reference to corrupted memory causing a page fault or
> > a general protection fault.
> >
> > But what I saw is only a lockup rather than a kernel panic:
> > watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]
> >
> > So I guess what I saw could be a different unresolved issue?
> 
> Actually I don't think so. The list iteration simply went through a stray
> pointer. That can crash, but it can also end in an infinite loop, or it can
> just randomly corrupt memory. I've seen all these situations with similar
> problems. So the fix is definitely worth trying.
> 
> Jan Kara

Thanks for the explanation! It sounds promising!

We haven't been able to reproduce the issue ourselves.
If our customer still has the setup that reproduces the issue, we'll try to
test the patch.

-- Dexuan
 


Re: kernel panics with 4.14.X versions

2018-04-20 Thread Jan Kara
On Thu 19-04-18 21:37:25, Dexuan Cui wrote:
> > From: Jan Kara
> > Sent: Thursday, April 19, 2018 13:23
> > Good news guys, Robert has just spotted a bug which looks like what I'd
> > expect can cause your lockups / crashes. I've merged his patch to my tree
> > and will push it to Linus for -rc3 so eventually it should land in
> > appropriate stable trees as well. If you are too eager to test it out, it
> > is attached for you to try.
> > 
> > Jan Kara
> 
> The patch's changelog says "... this behavior results in a kernel panic." 
> This sounds like a reference to corrupted memory causing a page fault or
> a general protection fault.
> 
> But what I saw is only a lockup rather than a kernel panic:
> watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]
> 
> So I guess what I saw could be a different unresolved issue?

Actually I don't think so. The list iteration simply went through a stray
pointer. That can crash, but it can also end in an infinite loop, or it can
just randomly corrupt memory. I've seen all these situations with similar
problems. So the fix is definitely worth trying.
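
To make "went through a stray pointer" concrete, here is a minimal user-space
sketch of the pre-fix union aliasing (a hedged illustration: the names, and a
plain pointer standing in for hlist_head, are made up here, not the kernel
structures verbatim):

#include <stdio.h>

struct mark { const char *name; };

/* Pre-fix layout: the mark-list head and the destroy-queue link share
 * storage, so queueing a connector for freeing rewrites the pointer a
 * concurrent list walk is about to follow. */
struct connector {
	union {
		struct mark *first;              /* stands in for hlist_head */
		struct connector *destroy_next;
	};
};

int main(void)
{
	struct mark m = { "a real mark" };
	struct connector reaped;
	struct connector conn = { .first = &m };

	/* fsnotify_put_mark() queues the connector for freeing... */
	conn.destroy_next = &reaped;

	/* ...and a concurrent fsnotify() now reads the "list head". */
	struct mark *head = conn.first;
	printf("head=%p &m=%p &reaped=%p\n",
	       (void *)head, (void *)&m, (void *)&reaped);
	/* head now points at a connector, not a mark; following it is the
	 * crash / infinite loop / memory corruption described above. */
	return 0;
}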

Honza
-- 
Jan Kara 
SUSE Labs, CR


RE: kernel panics with 4.14.X versions

2018-04-19 Thread Dexuan Cui
> From: Jan Kara
> Sent: Thursday, April 19, 2018 13:23
> Good news guys, Robert has just spotted a bug which looks like what I'd
> expect can cause your lockups / crashes. I've merged his patch to my tree
> and will push it to Linus for -rc3 so eventually it should land in
> appropriate stable trees as well. If you are too eager to test it out, it
> is attached for you to try.
> 
> Jan Kara

The patch's changelog says "... this behavior results in a kernel panic."
This sounds like a reference to corrupted memory causing a page fault or
a general protection fault.

But what I saw is only a lockup rather than a kernel panic:
watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]

So I guess what I saw could be a different unresolved issue?

-- Dexuan


Re: kernel panics with 4.14.X versions

2018-04-19 Thread Robert Kolchmeyer
On Thu, Apr 19, 2018 at 2:16 PM, Pavlos Parissis wrote:
> On 19/04/2018 10:23 μμ, Jan Kara wrote:
>> On Wed 18-04-18 10:32:21, Pavlos Parissis wrote:
>>> On 17/04/2018 02:12 μμ, Jan Kara wrote:
 On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
> On 16/04/2018 04:40 μμ, Jan Kara wrote:

 

>> How easily can you hit this?
>
> Very easily, I only need to wait 1-2 days for a crash to occur.

 I wouldn't call that very easily but opinions may differ :). Anyway it's
 good (at least for debugging) that it's reproducible.

>>>
>>> Unfortunately, I can't reproduce it, so waiting 1-2 days is the only
>>> option I have.
>>
>> Good news guys, Robert has just spotted a bug which looks like what I'd
>> expect can cause your lockups / crashes. I've merged his patch to my tree
>> and will push it to Linus for -rc3 so eventually it should land in
>> appropriate stable trees as well. If you are too eager to test it out, it
>> is attached for you to try.
>>
>>   Honza
>>
>
> Can I apply it on top of 4.14.32?
> I am on vacation for the coming 11 days, so I can only provide feedback when 
> I am back.
>
> Thanks a lot for your support on this issue.
>
> Cheers,
> Pavlos
>

I expect that the patch will apply cleanly to 4.14.32.

-Robert


Re: kernel panics with 4.14.X versions

2018-04-19 Thread Pavlos Parissis
On 19/04/2018 10:23 μμ, Jan Kara wrote:
> On Wed 18-04-18 10:32:21, Pavlos Parissis wrote:
>> On 17/04/2018 02:12 μμ, Jan Kara wrote:
>>> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
 On 16/04/2018 04:40 μμ, Jan Kara wrote:
>>>
>>> 
>>>
> How easily can you hit this?

 Very easily, I only need to wait 1-2 days for a crash to occur.
>>>
>>> I wouldn't call that very easily but opinions may differ :). Anyway it's
>>> good (at least for debugging) that it's reproducible.
>>>
>>
>> Unfortunately, I can't reproduce it, so waiting 1-2 days is the only
>> option I have.
> 
> Good news guys, Robert has just spotted a bug which looks like what I'd
> expect can cause your lockups / crashes. I've merged his patch to my tree
> and will push it to Linus for -rc3 so eventually it should land in
> appropriate stable trees as well. If you are too eager to test it out, it
> is attached for you to try.
> 
>   Honza
> 

Can I apply it on top of 4.14.32?
I am on vacation for the coming 11 days, so I can only provide feedback when I 
am back.

Thanks a lot for your support on this issue.

Cheers,
Pavlos





Re: kernel panics with 4.14.X versions

2018-04-19 Thread Jan Kara
On Wed 18-04-18 10:32:21, Pavlos Parissis wrote:
> On 17/04/2018 02:12 μμ, Jan Kara wrote:
> > On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
> >> On 16/04/2018 04:40 μμ, Jan Kara wrote:
> > 
> > 
> > 
> >>> How easily can you hit this?
> >>
> >> Very easily, I only need to wait 1-2 days for a crash to occur.
> > 
> > I wouldn't call that very easily but opinions may differ :). Anyway it's
> > good (at least for debugging) that it's reproducible.
> > 
> 
> Unfortunately, I can't reproduce it, so waiting 1-2 days is the only
> option I have.

Good news guys, Robert has just spotted a bug which looks like what I'd
expect can cause your lockups / crashes. I've merged his patch to my tree
and will push it to Linus for -rc3 so eventually it should land in
appropriate stable trees as well. If you are too eager to test it out, it
is attached for you to try.

Honza

-- 
Jan Kara 
SUSE Labs, CR
From d90a10e2444ba5a351fa695917258ff4c5709fa5 Mon Sep 17 00:00:00 2001
From: Robert Kolchmeyer 
Date: Thu, 19 Apr 2018 10:44:33 -0700
Subject: [PATCH] fsnotify: Fix fsnotify_mark_connector race

fsnotify() acquires a reference to a fsnotify_mark_connector through
the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it
appears that no precautions are taken in fsnotify_put_mark() to
ensure that fsnotify() drops its reference to this
fsnotify_mark_connector before assigning a value to its 'destroy_next'
field. This can result in fsnotify_put_mark() assigning a value
to a connector's 'destroy_next' field right before fsnotify() tries to
traverse the linked list referenced by the connector's 'list' field.
Since these two fields are members of the same union, this behavior
results in a kernel panic.

This issue is resolved by moving the connector's 'destroy_next' field
into the object pointer union. This should work since the object pointer
access is protected by both a spinlock and the value of the 'flags'
field, and the 'flags' field is cleared while holding the spinlock in
fsnotify_put_mark() before 'destroy_next' is updated. It shouldn't be
possible for another thread to accidentally read from the object pointer
after the 'destroy_next' field is updated.

The offending behavior here is extremely unlikely; since
fsnotify_put_mark() removes references to a connector (specifically,
it ensures that the connector is unreachable from the inode it was
formerly attached to) before updating its 'destroy_next' field, a
sizeable chunk of code in fsnotify_put_mark() has to execute in the
short window between when fsnotify() acquires the connector reference
and when it saves the value of its 'list' field. On the HEAD kernel, I've only
been able to reproduce this by inserting a udelay(1) in fsnotify().
However, I've been able to reproduce this issue without inserting a
udelay(1) anywhere on older unmodified release kernels, so I believe
it's worth fixing at HEAD.

References: https://bugzilla.kernel.org/show_bug.cgi?id=199437
Fixes: 08991e83b7286635167bab40927665a90fb00d81
CC: stable@vger.kernel.org
Signed-off-by: Robert Kolchmeyer 
Signed-off-by: Jan Kara 
---
 include/linux/fsnotify_backend.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 9f1edb92c97e..a3d13d874fd1 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -217,12 +217,10 @@ struct fsnotify_mark_connector {
 	union {	/* Object pointer [lock] */
 		struct inode *inode;
 		struct vfsmount *mnt;
-	};
-	union {
-		struct hlist_head list;
 		/* Used listing heads to free after srcu period expires */
 		struct fsnotify_mark_connector *destroy_next;
 	};
+	struct hlist_head list;
 };
 
 /*
-- 
2.13.6
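
For contrast with the diff above, a companion sketch of the repaired layout
under the same write (again an illustration with made-up names, not the
kernel code): 'destroy_next' now overlays the object pointer, which is only
dereferenced under the connector spinlock while 'flags' marks it valid, and
'list' keeps dedicated storage, so a concurrent list walk still sees a sane
head:

#include <stdio.h>

struct mark { const char *name; };

struct connector_fixed {
	union {
		void *obj;                        /* inode / vfsmount pointer */
		struct connector_fixed *destroy_next;
	};
	struct mark *first;                       /* stands in for hlist_head */
};

int main(void)
{
	struct mark m = { "a real mark" };
	struct connector_fixed reaped;
	struct connector_fixed conn = { .obj = NULL, .first = &m };

	conn.destroy_next = &reaped;      /* queue for freeing, as before */

	/* The list head is untouched by the destroy_next write. */
	printf("list head intact: %s\n", conn.first == &m ? "yes" : "no");
	return 0;
}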



Re: kernel panics with 4.14.X versions

2018-04-18 Thread Pavlos Parissis
On 17/04/2018 02:12 μμ, Jan Kara wrote:
> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
>> On 16/04/2018 04:40 μμ, Jan Kara wrote:
> 
> 
> 
>>> How easily can you hit this?
>>
>> Very easily, I only need to wait 1-2 days for a crash to occur.
> 
> I wouldn't call that very easily but opinions may differ :). Anyway it's
> good (at least for debugging) that it's reproducible.
> 

Unfortunately, I can't reproduce it, so waiting 1-2 days is the only option I 
have.

>>> Are you able to run debug kernels
>>
>> Well, I was under the impression I do as I have:
>>   grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>>   CONFIG_DEBUG_INFO=y
>>   # CONFIG_DEBUG_INFO_REDUCED is not set
>>   # CONFIG_DEBUG_INFO_SPLIT is not set
>>   # CONFIG_DEBUG_INFO_DWARF4 is not set
>>   CONFIG_DEBUG_KERNEL=y
>>
>> Do you think that my kernel doesn't produce a proper crash dump?
>> I have a production cluster where I can run any kernel we need, so if I need
>> to compile again with different settings I can certainly do that.
> 
> OK, good. So please try running 4.16 as you mention below to verify whether
> this is just a -stable regression or also a problem in the current upstream
> kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
> apply on top of 4.14.32 so that we can debug this further.
> 
>>> / inspect
>>> crash dumps when the issue occurs?
>>
>> I can't do that as the server isn't responsive and I can only power cycle it.
> 
> Well, kernel crash dumps work in that situation as well - when the kernel
> panics, it will kexec into a new kernel and dump memory of the old kernel
> to disk. It can then be investigated with the 'crash' utility. But
> obviously you don't have this set up and don't have experience with this so
> let's go via a standard 'debug patch' route.
> 
>>> Also testing with the latest mainline
>>> kernel (4.16) would be welcome whether this isn't just an issue with the
>>> backport of fsnotify fixes from Miklos.
>>
>> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
> 
> Yes, that would be good.
> 

I have a production server running 4.16.2 and no kernel crash dumps yet.
Let's wait another day before we say anything.

Cheers,
Pavlos





RE: kernel panics with 4.14.X versions

2018-04-17 Thread Dexuan Cui
> From: Greg KH 
> Sent: Tuesday, April 17, 2018 03:34
> On Mon, Apr 16, 2018 at 09:10:35PM +, Dexuan Cui wrote:
> > > From: Jan Kara 
> > > Sent: Monday, April 16, 2018 07:41
> > > ...
> > > How easily can you hit this? Are you able to run debug kernels / inspect
> > > crash dumps when the issue occurs? Also testing with the latest mainline
> > > kernel (4.16) would be welcome whether this isn't just an issue with the
> > > backport of fsnotify fixes from Miklos.
> >
> > It's not easy for us to reproduce the fsnotify() lockup issue, and actually
> > we still don't have an easy & reliable way to reproduce it.
> >
> > According to our tests, v4.16 doesn't have the issue.
> > And v4.15 doesn't have the issue either, if I recall correctly.
> > I only know the issue happens to v4.14.x and 4.13.x kernels
> 
> Any chance to run 'git bisect' between 4.14 and 4.15 to find the fix?
> greg k-h

Sorry, I'm afraid not. 
It was only reproducible in a customer VM with a specific workload and it took
days to hit the issue. The customer VM no longer exists, and we haven't found
a way to reproduce the issue with fio tests plus programs that exercise
fsnotify().
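
For reference, a minimal sketch of the kind of fsnotify() exerciser described
(illustrative only, not the actual test program; the path and event mask are
arbitrary): one process rewrites a watched file in a tight loop, so every
write() goes through fsnotify()'s mark-list walk:

#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int ifd = inotify_init1(IN_NONBLOCK);
	int wfd = open("/tmp/fsnotify-test", O_CREAT | O_WRONLY | O_TRUNC, 0644);

	if (ifd < 0 || wfd < 0 ||
	    inotify_add_watch(ifd, "/tmp/fsnotify-test", IN_MODIFY) < 0) {
		perror("setup");
		return 1;
	}
	for (;;) {
		if (write(wfd, "x", 1) != 1)     /* each write ends in fsnotify() */
			perror("write");
		while (read(ifd, buf, sizeof(buf)) > 0)
			;                        /* drain events (fd is non-blocking) */
	}
}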

-- Dexuan


Re: kernel panics with 4.14.X versions

2018-04-17 Thread Jan Kara
On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
> On 16/04/2018 04:40 μμ, Jan Kara wrote:



> > How easily can you hit this?
> 
> Very easily, I only need to wait 1-2 days for a crash to occur.

I wouldn't call that very easily but opinions may differ :). Anyway it's
good (at least for debugging) that it's reproducible.

> > Are you able to run debug kernels
> 
> Well, I was under the impression I do as I have:
>   grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>   CONFIG_DEBUG_INFO=y
>   # CONFIG_DEBUG_INFO_REDUCED is not set
>   # CONFIG_DEBUG_INFO_SPLIT is not set
>   # CONFIG_DEBUG_INFO_DWARF4 is not set
>   CONFIG_DEBUG_KERNEL=y
> 
> Do you think that my kernel doesn't produce a proper crash dump?
> I have a production cluster where I can run any kernel we need, so if I need
> to compile again with different settings I can certainly do that.

OK, good. So please try running 4.16 as you mention below to verify whether
this is just a -stable regression or also a problem in the current upstream
kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
apply on top of 4.14.32 so that we can debug this further.

> > / inspect
> > crash dumps when the issue occurs?
> 
> I can't do that as the server isn't responsive and I can only power cycle it.

Well, kernel crash dumps work in that situation as well - when the kernel
panics, it will kexec into a new kernel and dump memory of the old kernel
to disk. It can then be investigated with the 'crash' utility. But
obviously you don't have this set up and don't have experience with it, so
let's go via the standard 'debug patch' route.

> > Also testing with the latest mainline
> > kernel (4.16) would be welcome whether this isn't just an issue with the
> > backport of fsnotify fixes from Miklos.
> 
> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).

Yes, that would be good.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: kernel panics with 4.14.X versions

2018-04-17 Thread Jan Kara
On Tue 17-04-18 13:18:46, Pavlos Parissis wrote:
> On 17/04/2018 01:31 πμ, Pavlos Parissis wrote:
> > On 16/04/2018 04:40 μμ, Jan Kara wrote:
> >> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
> >>> Fwiw, there have been already reports of similar soft lockups in
> >>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
> >>>
> >>> We have also noticed similar softlockups with 4.14.22 here.
> >>
> >> Yeah.
> >>  
> >>> On 16 Apr 13:54, Pavlos Parissis wrote:
> 
>  Hi all,
> 
> > 
> > [..snip..]
> > 
>  [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
>  [kube-apiserver:24261]
>  [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag 
>  dccp tcp_diag udp_diag
>  inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs 
>  libcrc32c loop x86_pkg_temp_thermal
>  intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
>  crc32_pclmul ghash_clmulni_intel
>  pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate 
>  intel_rapl_perf iTCO_wdt ses
>  iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr 
>  ipmi_devintf lpc_ich sg mei
>  ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd 
>  auth_rpcgss nfs_acl lockd grace
>  sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper 
>  syscopyarea sysfillrect sysimgblt
>  fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw 
>  mpt3sas ptp libata raid_class
>  pps_core scsi_transport_sas
>  [373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
>  [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 
>  4.14.32-1.el7.x86_64 #1
>  [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 
>  2.4.3 01/17/2017
>  [373782.567486] task: 882f66d28000 task.stack: c9002120c000
>  [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>  [373782.597319] RSP: 0018:c9002120fdb8 EFLAGS: 0286 ORIG_RAX: 
>  ff10
>  [373782.615308] RAX:  RBX: 882f9ec65c20 RCX: 
>  0002
>  [373782.632950] RDX: 00028700 RSI: 0002 RDI: 
>  8269a4e0
>  [373782.650616] RBP: c9002120fe98 R08:  R09: 
>  
>  [373782.668287] R10:  R11:  R12: 
>  
>  [373782.685918] R13:  R14:  R15: 
>  
>  [373782.703302] FS:  00c42009f090() GS:882fbf90() 
>  knlGS:
>  [373782.721887] CS:  0010 DS:  ES:  CR0: 80050033
>  [373782.737741] CR2: 7f82b6539244 CR3: 002f3de2a005 CR4: 
>  003606e0
>  [373782.755247] DR0:  DR1:  DR2: 
>  
>  [373782.772722] DR3:  DR6: fffe0ff0 DR7: 
>  0400
>  [373782.790043] Call Trace:
>  [373782.802041]  vfs_write+0x151/0x1b0
>  [373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
>  [373782.829175]  SyS_write+0x55/0xc0
>  [373782.841870]  do_syscall_64+0x79/0x1b0
>  [373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >>
> >> Can you please run RIP through ./scripts/faddr2line to see where exactly
> >> are we looping? I expect the loop iterating over marks to notify but better
> >> be sure.
> >>
> > 
> > I am very new to this, and here is what I tried:
> >  ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64
> > 0010:fsnotify+0x197/0x510
> > readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
> > size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> > size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> > no match for 0010:fsnotify+0x197/0x510
> > 
> > Obviously, I am doing something very wrong.
> > 
> 
> I produced an uncompressed image (the error above was caused by giving a
> compressed image to faddr2line) by compiling 4.14.32 with the config we
> have in production, and now faddr2line reports:
> 
>  ../repo/Linux/linux/scripts/faddr2line ./vmlinux 0010:fsnotify+0x197/0x510
> no match for 0010:fsnotify+0x197/0x510
> 
> 
>  ../repo/Linux/linux/scripts/faddr2line ./vmlinux fsnotify+0x197/0x510
> 
> 
> 
> skipping fsnotify address at 0x8129baf7 due to size mismatch (0x510 != 0x520)
> no match for fsnotify+0x197/0x510
> 
> what am I doing wrong?

Apparently the 

Re: kernel panics with 4.14.X versions

2018-04-17 Thread Jan Kara
On Tue 17-04-18 14:48:35, Amir Goldstein wrote:
> On Tue, Apr 17, 2018 at 1:33 PM, Greg KH  wrote:
> > On Mon, Apr 16, 2018 at 09:10:35PM +, Dexuan Cui wrote:
> >> > From: Jan Kara 
> >> > Sent: Monday, April 16, 2018 07:41
> >> > ...
> >> > How easily can you hit this? Are you able to run debug kernels / inspect
> >> > crash dumps when the issue occurs? Also testing with the latest mainline
> >> > kernel (4.16) would be welcome whether this isn't just an issue with the
> >> > backport of fsnotify fixes from Miklos.
> >>
> >> It's not easy for us to reproduce the fsnotify() lockup issue, and actually
> >> we still don't have an easy & reliable way to reproduce it.
> >>
> >> According to our tests, v4.16 doesn't have the issue.
> >> And v4.15 doesn't have the issue either, if I recall correctly.
> >> I only know the issue happens to v4.14.x and 4.13.x kernels
> >
> > Any chance to run 'git bisect' between 4.14 and 4.15 to find the fix?
> >
> 
> Looking at the changes between 4.14 and 4.15 that are not in 4.14.32,
> the only viable suspects are:
> 9cf90cef362d fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly
> 3427ce715541 fsnotify: clean up fsnotify()
> 
> Neither claims to fix a known issue.

Yeah, and the second one is just a code refactoring, and I don't see how
the first fix could lead to anything like what's reported. So I don't think
picking these to 4.14 stable is really the right solution. We first need to
understand what's going wrong.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: kernel panics with 4.14.X versions

2018-04-17 Thread Amir Goldstein
On Tue, Apr 17, 2018 at 1:33 PM, Greg KH  wrote:
> On Mon, Apr 16, 2018 at 09:10:35PM +, Dexuan Cui wrote:
>> > From: Jan Kara 
>> > Sent: Monday, April 16, 2018 07:41
>> > ...
>> > How easily can you hit this? Are you able to run debug kernels / inspect
>> > crash dumps when the issue occurs? Also testing with the latest mainline
>> > kernel (4.16) would be welcome whether this isn't just an issue with the
>> > backport of fsnotify fixes from Miklos.
>>
>> It's not easy for us to reproduce the fsnotify() lockup issue, and actually
>> we still don't have an easy & reliable way to reproduce it.
>>
>> According to our tests, v4.16 doesn't have the issue.
>> And v4.15 doesn't have the issue either, if I recall correctly.
>> I only know the issue happens to v4.14.x and 4.13.x kernels
>
> Any chance to run 'git bisect' between 4.14 and 4.15 to find the fix?
>

Looking at the changes between 4.14 and 4.15 that are not in 4.14.32,
the only viable suspects are:
9cf90cef362d fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly
3427ce715541 fsnotify: clean up fsnotify()

Neither claims to fix a known issue.
Both look like they could be applied cleanly to 4.14.y.

Thanks,
Amir.


Re: kernel panics with 4.14.X versions

2018-04-17 Thread Pavlos Parissis
On 17/04/2018 01:31 πμ, Pavlos Parissis wrote:
> On 16/04/2018 04:40 μμ, Jan Kara wrote:
>> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>>> Fwiw, there have been already reports of similar soft lockups in
>>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>>
>>> We have also noticed similar softlockups with 4.14.22 here.
>>
>> Yeah.
>>  
>>> On 16 Apr 13:54, Pavlos Parissis wrote:

 Hi all,

> 
> [..snip..]
> 
 [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
 [kube-apiserver:24261]
 [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag 
 dccp tcp_diag udp_diag
 inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c 
 loop x86_pkg_temp_thermal
 intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
 crc32_pclmul ghash_clmulni_intel
 pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate 
 intel_rapl_perf iTCO_wdt ses
 iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr 
 ipmi_devintf lpc_ich sg mei
 ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd 
 auth_rpcgss nfs_acl lockd grace
 sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea 
 sysfillrect sysimgblt
 fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw 
 mpt3sas ptp libata raid_class
 pps_core scsi_transport_sas
 [373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
 [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 
 4.14.32-1.el7.x86_64 #1
 [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 
 01/17/2017
 [373782.567486] task: 882f66d28000 task.stack: c9002120c000
 [373782.583441] RIP: 0010:fsnotify+0x197/0x510
 [373782.597319] RSP: 0018:c9002120fdb8 EFLAGS: 0286 ORIG_RAX: 
 ff10
 [373782.615308] RAX:  RBX: 882f9ec65c20 RCX: 
 0002
 [373782.632950] RDX: 00028700 RSI: 0002 RDI: 
 8269a4e0
 [373782.650616] RBP: c9002120fe98 R08:  R09: 
 
 [373782.668287] R10:  R11:  R12: 
 
 [373782.685918] R13:  R14:  R15: 
 
 [373782.703302] FS:  00c42009f090() GS:882fbf90() 
 knlGS:
 [373782.721887] CS:  0010 DS:  ES:  CR0: 80050033
 [373782.737741] CR2: 7f82b6539244 CR3: 002f3de2a005 CR4: 
 003606e0
 [373782.755247] DR0:  DR1:  DR2: 
 
 [373782.772722] DR3:  DR6: fffe0ff0 DR7: 
 0400
 [373782.790043] Call Trace:
 [373782.802041]  vfs_write+0x151/0x1b0
 [373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
 [373782.829175]  SyS_write+0x55/0xc0
 [373782.841870]  do_syscall_64+0x79/0x1b0
 [373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> Can you please run RIP through ./scripts/faddr2line to see where exactly
>> are we looping? I expect the loop iterating over marks to notify but better
>> be sure.
>>
> 
> I am very new to this, and here is what I tried:
>  ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64
> 0010:fsnotify+0x197/0x510
> readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> no match for 0010:fsnotify+0x197/0x510
> 
> Obviously, I am doing something very wrong.
> 

I produced an uncompressed image (the error above was caused by giving a
compressed image to faddr2line) by compiling 4.14.32 with the config we have
in production, and now faddr2line reports:

 ../repo/Linux/linux/scripts/faddr2line ./vmlinux 0010:fsnotify+0x197/0x510
no match for 0010:fsnotify+0x197/0x510


 ../repo/Linux/linux/scripts/faddr2line ./vmlinux fsnotify+0x197/0x510



skipping fsnotify address at 0x8129baf7 due to size mismatch (0x510 != 0x520)
no match for fsnotify+0x197/0x510

what am I doing wrong?

Cheers,
Pavlos






Re: kernel panics with 4.14.X versions

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 09:10:35PM +, Dexuan Cui wrote:
> > From: Jan Kara 
> > Sent: Monday, April 16, 2018 07:41
> > ...
> > How easily can you hit this? Are you able to run debug kernels / inspect
> > crash dumps when the issue occurs? Also testing with the latest mainline
> > kernel (4.16) would be welcome whether this isn't just an issue with the
> > backport of fsnotify fixes from Miklos.
> 
> It's not easy for us to reproduce the fsnotify() lockup issue, and actually
> we still don't have an easy & reliable way to reproduce it.
> 
> According to our tests, v4.16 doesn't have the issue.
> And v4.15 doesn't have the issue either, if I recall correctly.
> I only know the issue happens to v4.14.x and 4.13.x kernels

Any chance to run 'git bisect' between 4.14 and 4.15 to find the fix?

thanks,

greg k-h


Re: kernel panics with 4.14.X versions

2018-04-16 Thread Pavlos Parissis
On 16/04/2018 04:40 μμ, Jan Kara wrote:
> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>> Fwiw, there have been already reports of similar soft lockups in
>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>
>> We have also noticed similar softlockups with 4.14.22 here.
> 
> Yeah.
>  
>> On 16 Apr 13:54, Pavlos Parissis wrote:
>>>
>>> Hi all,
>>>

[..snip..]

>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
>>> [kube-apiserver:24261]
>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag 
>>> dccp tcp_diag udp_diag
>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c 
>>> loop x86_pkg_temp_thermal
>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
>>> crc32_pclmul ghash_clmulni_intel
>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate 
>>> intel_rapl_perf iTCO_wdt ses
>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr 
>>> ipmi_devintf lpc_ich sg mei
>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd 
>>> auth_rpcgss nfs_acl lockd grace
>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea 
>>> sysfillrect sysimgblt
>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw 
>>> mpt3sas ptp libata raid_class
>>> pps_core scsi_transport_sas
>>> [373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 
>>> 4.14.32-1.el7.x86_64 #1
>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 
>>> 01/17/2017
>>> [373782.567486] task: 882f66d28000 task.stack: c9002120c000
>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>>> [373782.597319] RSP: 0018:c9002120fdb8 EFLAGS: 0286 ORIG_RAX: 
>>> ff10
>>> [373782.615308] RAX:  RBX: 882f9ec65c20 RCX: 
>>> 0002
>>> [373782.632950] RDX: 00028700 RSI: 0002 RDI: 
>>> 8269a4e0
>>> [373782.650616] RBP: c9002120fe98 R08:  R09: 
>>> 
>>> [373782.668287] R10:  R11:  R12: 
>>> 
>>> [373782.685918] R13:  R14:  R15: 
>>> 
>>> [373782.703302] FS:  00c42009f090() GS:882fbf90() 
>>> knlGS:
>>> [373782.721887] CS:  0010 DS:  ES:  CR0: 80050033
>>> [373782.737741] CR2: 7f82b6539244 CR3: 002f3de2a005 CR4: 
>>> 003606e0
>>> [373782.755247] DR0:  DR1:  DR2: 
>>> 
>>> [373782.772722] DR3:  DR6: fffe0ff0 DR7: 
>>> 0400
>>> [373782.790043] Call Trace:
>>> [373782.802041]  vfs_write+0x151/0x1b0
>>> [373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
>>> [373782.829175]  SyS_write+0x55/0xc0
>>> [373782.841870]  do_syscall_64+0x79/0x1b0
>>> [373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Can you please run RIP through ./scripts/faddr2line to see where exactly
> are we looping? I expect the loop iterating over marks to notify but better
> be sure.
> 

I am quite new to this, and I tried:
 ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64 0010:fsnotify+0x197/0x510
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
no match for 0010:fsnotify+0x197/0x510

Obviously, I am doing something very wrong.
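A hedged sketch of what should work, assuming the kernel was built in-tree with
CONFIG_DEBUG_INFO=y and that ./vmlinux is the uncompressed ELF image from that
build tree (faddr2line cannot read the compressed vmlinuz boot image, and the
"0010:" code-segment prefix from the oops is not part of the symbol name):

  ./scripts/faddr2line ./vmlinux fsnotify+0x197/0x510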

> How easily can you hit this?

Very easily; I only need to wait 1-2 days for a crash to occur.

> Are you able to run debug kernels

Well, I was under the impression that I do, since I have:
  grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
  CONFIG_DEBUG_INFO=y
  # CONFIG_DEBUG_INFO_REDUCED is not set
  # CONFIG_DEBUG_INFO_SPLIT is not set
  # CONFIG_DEBUG_INFO_DWARF4 is not set
  CONFIG_DEBUG_KERNEL=y

Do you think that my kernel doesn't produce a proper crash dump?
I have a production cluster where I can run any kernel we need, so if I need
to compile again with different settings I can certainly do that.
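Note that CONFIG_DEBUG_INFO only provides symbols; actually capturing a dump on
panic needs kdump. A hedged sketch for CentOS 7, assuming the kexec-tools
package and that CONFIG_KEXEC and CONFIG_CRASH_DUMP are enabled in the custom
kernel (the 256M reservation is a guess that may need tuning):

  yum install kexec-tools
  grubby --update-kernel=ALL --args='crashkernel=256M'   # reserve memory, then reboot
  systemctl enable kdump && systemctl start kdump
  # after the next panic, the vmcore lands under /var/crash by default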

> / inspect
> crash dumps when the issue occurs?

I can't do that as the server isn't responsive and I can only power cycle it.

> Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
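A sketch of that, assuming the standard elrepo setup for CentOS 7 (package and
URL names as of this writing):

  rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
  yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
  yum --enablerepo=elrepo-kernel install kernel-ml
  # then select the new kernel in grub and reboot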


RE: kernel panics with 4.14.X versions

2018-04-16 Thread Dexuan Cui
> From: Jan Kara 
> Sent: Monday, April 16, 2018 07:41
> ...
> How easily can you hit this? Are you able to run debug kernels / inspect
> crash dumps when the issue occurs? Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

It's not easy for us to reproduce the fsnotify() lockup issue, and actually
we still don't have an easy & reliable way to reproduce it.

According to our tests, v4.16 doesn't have the issue.
And v4.15 doesn't have the issue either, if I recall correctly.
I only know the issue happens on v4.14.x and v4.13.x kernels.

Thanks,
-- Dexuan


Re: kernel panics with 4.14.X versions

2018-04-16 Thread Guillaume Morin
On 16 Apr 16:40, Jan Kara wrote:
> Can you please run RIP through ./scripts/faddr2line to see where exactly
> are we looping? I expect the loop iterating over marks to notify but better
> be sure.
> 
> How easily can you hit this? Are you able to run debug kernels / inspect
> crash dumps when the issue occurs? Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

I do have one proper kernel crash dump for one of the lockups we saw:

PID: 30407  TASK: 9584913b2180  CPU: 8   COMMAND: "python"
 #0 [959cb7883d80] machine_kexec at 890561ff
 #1 [959cb7883dd8] __crash_kexec at 890f6dde
 #2 [959cb7883e90] panic at 89074f03
 #3 [959cb7883f10] watchdog_timer_fn at 89117388
 #4 [959cb7883f40] __hrtimer_run_queues at 890dc65c
 #5 [959cb7883f88] hrtimer_interrupt at 890dcb76
 #6 [959cb7883fd8] smp_apic_timer_interrupt at 89802f6a
 #7 [959cb7883ff0] apic_timer_interrupt at 8980227d
---  ---
 #8 [afa5c894f880] apic_timer_interrupt at 8980227d
[exception RIP: unknown or invalid address]
RIP:   RSP: 8a696820  RFLAGS: 0002
RAX: 95908f520c20  RBX:   RCX: 
RDX: 959c83c4d000  RSI:   RDI: afa5c894f9f8
RBP: 53411000   R8:    R9: 95908f520c48
R10:   R11:   R12: 1000
R13: 1000  R14: 1000  R15: 5341
ORIG_RAX:   CS:   SS: ff10
bt: WARNING: possibly bogus exception frame
 #9 [afa5c894f928] fsnotify at 892293e7
#10 [afa5c894f9e8] __fsnotify_parent at 89229686
#11 [afa5c894fa48] __kernel_write at 891e9962
#12 [afa5c894fa70] dump_emit at 892445af
#13 [afa5c894faa8] elf_core_dump at 8923f546
#14 [afa5c894fc60] do_coredump at 89244c3f
#15 [afa5c894fda0] get_signal at 89083ed0
#16 [afa5c894fe18] do_signal at 89028323
#17 [afa5c894ff10] exit_to_usermode_loop at 8900308c
#18 [afa5c894ff38] prepare_exit_to_usermode at 89003753
RIP: 7f69706935c3  RSP: 7ffeb8c1b4a8  RFLAGS: 00010206
RAX: 7f686d200034  RBX: 5591f24f0170  RCX: 7f68cb80
RDX: 7f696d20  RSI: 0061  RDI: 7f686d200034
RBP: 7f686d200010   R8:    R9: 00ff
R10: e0a9a400  R11: 0246  R12: 0001
R13: 0001  R14:   R15: 0083
ORIG_RAX:   CS: 0033  SS: 002b

faddr2line gives "fsnotify at fs/notify/fsnotify.c:368" (it's a 4.14.22).  So
it does seem that you were right about the location.

This happens with systemd handling coredumps.  It's using fsnotify to learn
about new dumps.

Note that on this machine, the dumps are on a loop mount:
/dev/loop0 /usr/cores ext4 rw,relatime,data=ordered 0 0
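For reference, the watch systemd-coredump sets up is ordinary inotify, so the
kernel path involved can be exercised from userspace. A minimal sketch with
inotify-tools, assuming /usr/cores is the dump directory:

  # print an event each time a core file is fully written
  inotifywait -m -e close_write /usr/cores

Every write to a watched location then has to walk the notification marks in
fsnotify(), which matches the __fsnotify_parent -> fsnotify frames in the
backtrace above.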

-- 
Guillaume Morin 


Re: kernel panics with 4.14.X versions

2018-04-16 Thread Jan Kara
On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
> Fwiw, there have been already reports of similar soft lockups in
> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>
> We have also noticed similar softlockups with 4.14.22 here.

Yeah.
 
> On 16 Apr 13:54, Pavlos Parissis wrote:
> >
> > Hi all,
> > 
> > We have observed kernel panics on several master kubernetes clusters, where 
> > we run
> > kubernetes API services and not application workloads.
> > 
> > Those clusters use kernel version 4.14.14 and 4.14.32, but we switched 
> > everything
> > to kernel version 4.14.32 as a way to address the issue.
> > 
> > We have HP and Dell hardware on those clusters, and network cards are also 
> > different,
> > we have bnx2x and mlx5_core in use.
> > 
> > We also run kernel version 4.14.32 on different type of workloads, software 
> > load
> > balancing using HAProxy, and we don't have any crashes there.
> > 
> > Since the crash happens on different hardware, we think it could be a
> > kernel issue,
> > but we aren't sure about it. Thus, I am contacting kernel people in order
> > to get some
> > hints that can help us figure out what causes this.
> > 
> > In our kubernetes clusters, we have instructed the kernel to panic upon 
> > soft lockup,
> > we use 'kernel.softlockup_panic=1', 'kernel.hung_task_panic=1' and 
> > 'kernel.watchdog_thresh=10'.
> > Thus, we see the stack traces. Today, we have disabled this; later I will
> > explain why.
> > 
> > I believe we have two distinct types of panics: one is triggered upon soft
> > lockup, and another one
> > where the call trace is about the scheduler ("sched: Unexpected reschedule
> > of offline CPU#8!").
> > 
> > 
> > Let me walk you through the kernel panics and some observations.
> > 
> > The following series of stack traces happens when one CPU (CPU 24) is
> > stuck for ~22 seconds.
> > watchdog_thresh is set to 10 and, as far as I remember, the softlockup
> > threshold is (2 * watchdog_thresh),
> > so it makes sense to see the kernel crashing after ~20 seconds.
> > 
> > After the stack trace, we have the output of sar for CPU#24, and we see
> > that just before the
> > crash, system-level CPU utilization went to 100%. Now let's move on to
> > another panic.
> > 
> > [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
> > [kube-apiserver:24261]
> > [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag 
> > dccp tcp_diag udp_diag
> > inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c 
> > loop x86_pkg_temp_thermal
> > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
> > crc32_pclmul ghash_clmulni_intel
> > pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate 
> > intel_rapl_perf iTCO_wdt ses
> > iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr 
> > ipmi_devintf lpc_ich sg mei
> > ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd 
> > auth_rpcgss nfs_acl lockd grace
> > sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea 
> > sysfillrect sysimgblt
> > fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw 
> > mpt3sas ptp libata raid_class
> > pps_core scsi_transport_sas
> > [373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
> > [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 
> > 4.14.32-1.el7.x86_64 #1
> > [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 
> > 01/17/2017
> > [373782.567486] task: 882f66d28000 task.stack: c9002120c000
> > [373782.583441] RIP: 0010:fsnotify+0x197/0x510
> > [373782.597319] RSP: 0018:c9002120fdb8 EFLAGS: 0286 ORIG_RAX: 
> > ff10
> > [373782.615308] RAX:  RBX: 882f9ec65c20 RCX: 
> > 0002
> > [373782.632950] RDX: 00028700 RSI: 0002 RDI: 
> > 8269a4e0
> > [373782.650616] RBP: c9002120fe98 R08:  R09: 
> > 
> > [373782.668287] R10:  R11:  R12: 
> > 
> > [373782.685918] R13:  R14:  R15: 
> > 
> > [373782.703302] FS:  00c42009f090() GS:882fbf90() 
> > knlGS:
> > [373782.721887] CS:  0010 DS:  ES:  CR0: 80050033
> > [373782.737741] CR2: 7f82b6539244 CR3: 002f3de2a005 CR4: 
> > 003606e0
> > [373782.755247] DR0:  DR1:  DR2: 
> > 
> > [373782.772722] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [373782.790043] Call Trace:
> > [373782.802041]  vfs_write+0x151/0x1b0
> > [373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
> > [373782.829175]  SyS_write+0x55/0xc0
> > [373782.841870]  do_syscall_64+0x79/0x1b0
> > [373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Can you please run RIP through ./scripts/faddr2line to see where exactly
are we looping? I expect the loop iterating over marks to notify but better
be sure.
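As an aside, the watchdog settings quoted above can be applied at runtime; a
sketch:

  sysctl -w kernel.softlockup_panic=1   # panic instead of just logging the lockup
  sysctl -w kernel.hung_task_panic=1
  sysctl -w kernel.watchdog_thresh=10   # soft-lockup report after ~2*10 = 20s

The 2 * watchdog_thresh factor is why the reports above say "stuck for 22s".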
