Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-08-21 Thread Василий Ангапов
Yeah, switched to 4.7 recently and no issues so far.

2016-08-21 6:09 GMT+03:00 Alex Gorbachev <a...@iss-integration.com>:
> On Tue, Jul 19, 2016 at 12:04 PM, Alex Gorbachev <a...@iss-integration.com> 
> wrote:
>> On Mon, Jul 18, 2016 at 4:41 AM, Василий Ангапов <anga...@gmail.com> wrote:
>>> Guys,
>>>
>>> This bug is hitting me constantly, may be once per several days. Does
>>> anyone know is there a solution already?
>>
>>
>> I see there is a fix available, and am waiting for a backport to a
>> longterm kernel:
>>
>> https://lkml.org/lkml/2016/7/12/919
>>
>> https://lkml.org/lkml/2016/7/12/297
>>
>> --
>> Alex Gorbachev
>> Storcium
>
>
> No more issues on the latest kernel builds.
>
> Alex
>
>>
>>
>>
>>
>>>
>>> 2016-07-05 11:47 GMT+03:00 Nick Fisk <n...@fisk.me.uk>:
>>>>> -Original Message-
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>>> Alex Gorbachev
>>>>> Sent: 04 July 2016 20:50
>>>>> To: Campbell Steven <caste...@gmail.com>
>>>>> Cc: ceph-users <ceph-users@lists.ceph.com>; Tim Bishop <tim-li...@bishnet.net>
>>>>> Subject: Re: [ceph-users] Is anyone seeing iissues with
>>>>> task_numa_find_cpu?
>>>>>
>>>>> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven <caste...@gmail.com>
>>>>> wrote:
>>>>> > Hi Alex/Stefan,
>>>>> >
>>>>> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>>>>> > once and for all this particular issue has been completely resolved by
>>>>> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>>>>> > us anyway the patches that Stefan applied did not solve the issue and
>>>>> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>>>>> > does the trick for you. We could get about 4 hours uptime before
>>>>> > things went haywire for us.
>>>>> >
>>>>> > It's interesting how it seems the CEPH workload triggers this bug so
>>>>> > well as it's quite a long standing issue that's only just been
>>>>> > resolved, another user chimed in on the lkml thread a couple of days
>>>>> > ago as well and again his trace had ceph-osd in it as well.
>>>>> >
>>>>> > https://lkml.org/lkml/headers/2016/6/21/491
>>>>> >
>>>>> > Campbell
>>>>>
>>>>> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
>>>>> having trouble booting it on an ubuntu box due to some other unrelated
>>>>> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
>>>>> this load related problem.
>>>>>
>>>>> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
>>>> different
>>>>> than Peter's patch (assuming 4.5.x source), so the patch does not apply
>>>>> cleanly.  Maybe another 4.4.x kernel will get the update.
>>>>
>>>> I put in a new 16.04 node yesterday and went straight to 4.7.rc6. It's been
>>>> backfilling for just under 24 hours now with no drama. Disks are set to use
>>>> CFQ.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>> >
>>>>> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>>>>> > <s.pri...@profihost.ag> wrote:
>>>>> >>
>>>>> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>>>> >>> Hi Stefan,
>>>>> >>>
>>>>> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>>>> >>> <s.pri...@profihost.ag> wrote:
>>>>> >>>> Please be aware that you may need even more patches. Overall this
>>>>> >>>> needs 3 patches. Where the first two try to fix a bug and the 3rd
>>>>> >>>> one fixes the fixes + even more bugs related to the scheduler. I've
>>>>> >>>> no idea on which patch level Ubuntu is.
>>>>> >>>
>>>>> >>> Stefan, would you be able to please point to the other two patches
>

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-08-20 Thread Alex Gorbachev
On Tue, Jul 19, 2016 at 12:04 PM, Alex Gorbachev <a...@iss-integration.com> 
wrote:
> On Mon, Jul 18, 2016 at 4:41 AM, Василий Ангапов <anga...@gmail.com> wrote:
>> Guys,
>>
>> This bug is hitting me constantly, may be once per several days. Does
>> anyone know is there a solution already?
>
>
> I see there is a fix available, and am waiting for a backport to a
> longterm kernel:
>
> https://lkml.org/lkml/2016/7/12/919
>
> https://lkml.org/lkml/2016/7/12/297
>
> --
> Alex Gorbachev
> Storcium


No more issues on the latest kernel builds.

Alex

>
>
>
>
>>
>> 2016-07-05 11:47 GMT+03:00 Nick Fisk <n...@fisk.me.uk>:
>>>> -Original Message-
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>> Alex Gorbachev
>>>> Sent: 04 July 2016 20:50
>>>> To: Campbell Steven <caste...@gmail.com>
>>>> Cc: ceph-users <ceph-users@lists.ceph.com>; Tim Bishop <tim-li...@bishnet.net>
>>>> Subject: Re: [ceph-users] Is anyone seeing iissues with
>>>> task_numa_find_cpu?
>>>>
>>>> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven <caste...@gmail.com>
>>>> wrote:
>>>> > Hi Alex/Stefan,
>>>> >
>>>> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>>>> > once and for all this particular issue has been completely resolved by
>>>> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>>>> > us anyway the patches that Stefan applied did not solve the issue and
>>>> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>>>> > does the trick for you. We could get about 4 hours uptime before
>>>> > things went haywire for us.
>>>> >
>>>> > It's interesting how it seems the CEPH workload triggers this bug so
>>>> > well as it's quite a long standing issue that's only just been
>>>> > resolved, another user chimed in on the lkml thread a couple of days
>>>> > ago as well and again his trace had ceph-osd in it as well.
>>>> >
>>>> > https://lkml.org/lkml/headers/2016/6/21/491
>>>> >
>>>> > Campbell
>>>>
>>>> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
>>>> having trouble booting it on an ubuntu box due to some other unrelated
>>>> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
>>>> this load related problem.
>>>>
>>>> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
>>> different
>>>> than Peter's patch (assuming 4.5.x source), so the patch does not apply
>>>> cleanly.  Maybe another 4.4.x kernel will get the update.
>>>
>>> I put in a new 16.04 node yesterday and went straight to 4.7.rc6. It's been
>>> backfilling for just under 24 hours now with no drama. Disks are set to use
>>> CFQ.
>>>
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>>
>>>>
>>>> >
>>>> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>>>> > <s.pri...@profihost.ag> wrote:
>>>> >>
>>>> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>>> >>> Hi Stefan,
>>>> >>>
>>>> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>>> >>> <s.pri...@profihost.ag> wrote:
>>>> >>>> Please be aware that you may need even more patches. Overall this
>>>> >>>> needs 3 patches. Where the first two try to fix a bug and the 3rd
>>>> >>>> one fixes the fixes + even more bugs related to the scheduler. I've
>>>> >>>> no idea on which patch level Ubuntu is.
>>>> >>>
>>>> >>> Stefan, would you be able to please point to the other two patches
>>>> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>>>> >>
>>>> >> Sorry sure yes:
>>>> >>
>>>> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>>>> >> bounded value")
>>>> >>
>>>> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>>>> >> post_init_entity_util_avg() serialization")
>>

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-19 Thread Alex Gorbachev
On Mon, Jul 18, 2016 at 4:41 AM, Василий Ангапов <anga...@gmail.com> wrote:
> Guys,
>
> This bug is hitting me constantly, may be once per several days. Does
> anyone know is there a solution already?


I see there is a fix available, and am waiting for a backport to a
longterm kernel:

https://lkml.org/lkml/2016/7/12/919

https://lkml.org/lkml/2016/7/12/297

--
Alex Gorbachev
Storcium




>
> 2016-07-05 11:47 GMT+03:00 Nick Fisk <n...@fisk.me.uk>:
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Alex Gorbachev
>>> Sent: 04 July 2016 20:50
>>> To: Campbell Steven <caste...@gmail.com>
>>> Cc: ceph-users <ceph-users@lists.ceph.com>; Tim Bishop <tim-li...@bishnet.net>
>>> Subject: Re: [ceph-users] Is anyone seeing iissues with
>>> task_numa_find_cpu?
>>>
>>> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven <caste...@gmail.com>
>>> wrote:
>>> > Hi Alex/Stefan,
>>> >
>>> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>>> > once and for all this particular issue has been completely resolved by
>>> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>>> > us anyway the patches that Stefan applied did not solve the issue and
>>> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>>> > does the trick for you. We could get about 4 hours uptime before
>>> > things went haywire for us.
>>> >
>>> > It's interesting how it seems the CEPH workload triggers this bug so
>>> > well as it's quite a long standing issue that's only just been
>>> > resolved, another user chimed in on the lkml thread a couple of days
>>> > ago as well and again his trace had ceph-osd in it as well.
>>> >
>>> > https://lkml.org/lkml/headers/2016/6/21/491
>>> >
>>> > Campbell
>>>
>>> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
>>> having trouble booting it on an ubuntu box due to some other unrelated
>>> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
>>> this load related problem.
>>>
>>> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
>> different
>>> than Peter's patch (assuming 4.5.x source), so the patch does not apply
>>> cleanly.  Maybe another 4.4.x kernel will get the update.
>>
>> I put in a new 16.04 node yesterday and went straight to 4.7.rc6. It's been
>> backfilling for just under 24 hours now with no drama. Disks are set to use
>> CFQ.
>>
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>>
>>> >
>>> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>>> > <s.pri...@profihost.ag> wrote:
>>> >>
>>> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>> >>> Hi Stefan,
>>> >>>
>>> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>> >>> <s.pri...@profihost.ag> wrote:
>>> >>>> Please be aware that you may need even more patches. Overall this
>>> >>>> needs 3 patches. Where the first two try to fix a bug and the 3rd
>>> >>>> one fixes the fixes + even more bugs related to the scheduler. I've
>>> >>>> no idea on which patch level Ubuntu is.
>>> >>>
>>> >>> Stefan, would you be able to please point to the other two patches
>>> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>>> >>
>>> >> Sorry sure yes:
>>> >>
>>> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>>> >> bounded value")
>>> >>
>>> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>>> >> post_init_entity_util_avg() serialization")
>>> >>
>>> >> 3.) the one listed at lkml.
>>> >>
>>> >> Stefan
>>> >>
>>> >>>
>>> >>> Thank you,
>>> >>> Alex
>>> >>>
>>> >>>>
>>> >>>> Stefan
>>> >>>>
>>> >>>> Excuse my typo sent from my mobile phone.
>>> >>>>
>>> >>>> Am 28.06.2016 um 17:59 schrieb Tim Bishop <ti

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-18 Thread Василий Ангапов
Guys,

This bug is hitting me constantly, maybe once every few days. Does
anyone know if there is a solution already?

2016-07-05 11:47 GMT+03:00 Nick Fisk <n...@fisk.me.uk>:
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Alex Gorbachev
>> Sent: 04 July 2016 20:50
>> To: Campbell Steven <caste...@gmail.com>
>> Cc: ceph-users <ceph-users@lists.ceph.com>; Tim Bishop <tim-li...@bishnet.net>
>> Subject: Re: [ceph-users] Is anyone seeing iissues with
>> task_numa_find_cpu?
>>
>> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven <caste...@gmail.com>
>> wrote:
>> > Hi Alex/Stefan,
>> >
>> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>> > once and for all this particular issue has been completely resolved by
>> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>> > us anyway the patches that Stefan applied did not solve the issue and
>> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>> > does the trick for you. We could get about 4 hours uptime before
>> > things went haywire for us.
>> >
>> > It's interesting how it seems the CEPH workload triggers this bug so
>> > well as it's quite a long standing issue that's only just been
>> > resolved, another user chimed in on the lkml thread a couple of days
>> > ago as well and again his trace had ceph-osd in it as well.
>> >
>> > https://lkml.org/lkml/headers/2016/6/21/491
>> >
>> > Campbell
>>
>> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
>> having trouble booting it on an ubuntu box due to some other unrelated
>> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
>> this load related problem.
>>
>> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
> different
>> than Peter's patch (assuming 4.5.x source), so the patch does not apply
>> cleanly.  Maybe another 4.4.x kernel will get the update.
>
> I put in a new 16.04 node yesterday and went straight to 4.7.rc6. It's been
> backfilling for just under 24 hours now with no drama. Disks are set to use
> CFQ.
>
>>
>> Thanks,
>> Alex
>>
>>
>>
>> >
>> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>> > <s.pri...@profihost.ag> wrote:
>> >>
>> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>> >>> Hi Stefan,
>> >>>
>> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>> >>> <s.pri...@profihost.ag> wrote:
>> >>>> Please be aware that you may need even more patches. Overall this
>> >>>> needs 3 patches. Where the first two try to fix a bug and the 3rd
>> >>>> one fixes the fixes + even more bugs related to the scheduler. I've
>> >>>> no idea on which patch level Ubuntu is.
>> >>>
>> >>> Stefan, would you be able to please point to the other two patches
>> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>> >>
>> >> Sorry sure yes:
>> >>
>> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>> >> bounded value")
>> >>
>> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>> >> post_init_entity_util_avg() serialization")
>> >>
>> >> 3.) the one listed at lkml.
>> >>
>> >> Stefan
>> >>
>> >>>
>> >>> Thank you,
>> >>> Alex
>> >>>
>> >>>>
>> >>>> Stefan
>> >>>>
>> >>>> Excuse my typo sent from my mobile phone.
>> >>>>
>> >>>> Am 28.06.2016 um 17:59 schrieb Tim Bishop <tim-li...@bishnet.net>:
>> >>>>
>> >>>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel.
>> >>>> No useful information to add other than it's not just you.
>> >>>>
>> >>>> Tim.
>> >>>>
>> >>>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>> >>>>
>> >>>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>> >>>>
>> >>>> these issues where an OSD would fail with the stack below.  I
>> &

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-05 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Alex Gorbachev
> Sent: 04 July 2016 20:50
> To: Campbell Steven <caste...@gmail.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>; Tim Bishop <tim-li...@bishnet.net>
> Subject: Re: [ceph-users] Is anyone seeing iissues with
> task_numa_find_cpu?
> 
> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven <caste...@gmail.com>
> wrote:
> > Hi Alex/Stefan,
> >
> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
> > once and for all this particular issue has been completely resolved by
> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
> > us anyway the patches that Stefan applied did not solve the issue and
> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
> > does the trick for you. We could get about 4 hours uptime before
> > things went haywire for us.
> >
> > It's interesting how it seems the CEPH workload triggers this bug so
> > well as it's quite a long standing issue that's only just been
> > resolved, another user chimed in on the lkml thread a couple of days
> > ago as well and again his trace had ceph-osd in it as well.
> >
> > https://lkml.org/lkml/headers/2016/6/21/491
> >
> > Campbell
> 
> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
> having trouble booting it on an ubuntu box due to some other unrelated
> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
> this load related problem.
> 
> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
different
> than Peter's patch (assuming 4.5.x source), so the patch does not apply
> cleanly.  Maybe another 4.4.x kernel will get the update.

I put in a new 16.04 node yesterday and went straight to 4.7-rc6. It's been
backfilling for just under 24 hours now with no drama. Disks are set to use
CFQ.
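
(Side note for anyone comparing setups: the active elevator can be checked and
changed per device through sysfs -- a quick sketch, with sdX as a placeholder
device name:

  # show the available schedulers; the one in [brackets] is active
  cat /sys/block/sdX/queue/scheduler
  # switch this device to noop at runtime (not persistent across reboots)
  echo noop > /sys/block/sdX/queue/scheduler

On kernels where blk-mq drives the device, the file may simply show "none".)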

> 
> Thanks,
> Alex
> 
> 
> 
> >
> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
> > <s.pri...@profihost.ag> wrote:
> >>
> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
> >>> Hi Stefan,
> >>>
> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
> >>> <s.pri...@profihost.ag> wrote:
> >>>> Please be aware that you may need even more patches. Overall this
> >>>> needs 3 patches. Where the first two try to fix a bug and the 3rd
> >>>> one fixes the fixes + even more bugs related to the scheduler. I've
> >>>> no idea on which patch level Ubuntu is.
> >>>
> >>> Stefan, would you be able to please point to the other two patches
> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
> >>
> >> Sorry sure yes:
> >>
> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
> >> bounded value")
> >>
> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
> >> post_init_entity_util_avg() serialization")
> >>
> >> 3.) the one listed at lkml.
> >>
> >> Stefan
> >>
> >>>
> >>> Thank you,
> >>> Alex
> >>>
> >>>>
> >>>> Stefan
> >>>>
> >>>> Excuse my typo sent from my mobile phone.
> >>>>
> >>>> Am 28.06.2016 um 17:59 schrieb Tim Bishop <tim-li...@bishnet.net>:
> >>>>
> >>>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel.
> >>>> No useful information to add other than it's not just you.
> >>>>
> >>>> Tim.
> >>>>
> >>>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
> >>>>
> >>>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> >>>>
> >>>> these issues where an OSD would fail with the stack below.  I
> >>>> logged a
> >>>>
> >>>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there
> >>>> is
> >>>>
> >>>> a similar description at https://lkml.org/lkml/2016/6/22/102, but
> >>>> the
> >>>>
> >>>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> >>>>
> >>>> just the noop scheduler.
> >>>>
> >>>>
> >>>> Does the ceph kernel code somehow use the fair sch

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-05 Thread Brad Hubbard
On Sun, Jul 3, 2016 at 7:51 AM, Alex Gorbachev  wrote:
>> Thank you Stefan and Campbell for the info - hope 4.7rc5 resolves this
>> for us - please note that my workload is purely RBD, no QEMU/KVM.
>> Also, we do not have CFQ turned on, neither scsi-mq and blk-mq, so I
>> am surmising ceph-osd must be using something from the fair scheduler.
>> I read that its IO has been switched to blk-mq internally, so maybe
>> there is a relationship there.
>
> If the OSD code is compiled against the source from a buggy fair
> scheduler code, then that would be an OSD code issue, correct?

OSD code is not compiled against any kernel code. ceph-osd runs in userspace,
not kernelspace. A userspace process should not be able to crash the kernel; if
it can, that's a kernel bug.

HTH,
Brad
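
(For what it's worth, task_numa_find_cpu() lives entirely in the kernel's
scheduler, not in Ceph -- easy to confirm against a kernel source tree, e.g.:

  # path is an example for an unpacked mainline tree; adjust to your sources
  grep -n "task_numa_find_cpu" linux-4.4.14/kernel/sched/fair.c

ceph-osd merely generates the page faults and NUMA balancing activity that
exercise that code.)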
>
>>
>> We had no such problems with kernel 4.2.x, but had other issues with
>> XFS, which do not seem to happen now.
>>
>> Regards,
>> Alex
>>
>>>
>>> Stefan
>>>
>>> Am 29.06.2016 um 11:41 schrieb Campbell Steven:
 Hi Alex/Stefan,

 I'm in the middle of testing 4.7rc5 on our test cluster to confirm
 once and for all this particular issue has been completely resolved by
 Peter's recent patch to sched/fair.c refereed to by Stefan above. For
 us anyway the patches that Stefan applied did not solve the issue and
 neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
 does the trick for you. We could get about 4 hours uptime before
 things went haywire for us.

 It's interesting how it seems the CEPH workload triggers this bug so
 well as it's quite a long standing issue that's only just been
 resolved, another user chimed in on the lkml thread a couple of days
 ago as well and again his trace had ceph-osd in it as well.

 https://lkml.org/lkml/headers/2016/6/21/491

 Campbell

 On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
  wrote:
>
> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>> Hi Stefan,
>>
>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>  wrote:
>>> Please be aware that you may need even more patches. Overall this needs 
>>> 3
>>> patches. Where the first two try to fix a bug and the 3rd one fixes the
>>> fixes + even more bugs related to the scheduler. I've no idea on which 
>>> patch
>>> level Ubuntu is.
>>
>> Stefan, would you be able to please point to the other two patches
>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>
> Sorry sure yes:
>
> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
> bounded value")
>
> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
> post_init_entity_util_avg() serialization")
>
> 3.) the one listed at lkml.
>
> Stefan
>
>>
>> Thank you,
>> Alex
>>
>>>
>>> Stefan
>>>
>>> Excuse my typo sent from my mobile phone.
>>>
>>> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>>>
>>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>>> useful information to add other than it's not just you.
>>>
>>> Tim.
>>>
>>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>>
>>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>>
>>> these issues where an OSD would fail with the stack below.  I logged a
>>>
>>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>>>
>>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>>>
>>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>>
>>> just the noop scheduler.
>>>
>>>
>>> Does the ceph kernel code somehow use the fair scheduler code block?
>>>
>>>
>>> Thanks
>>>
>>> --
>>>
>>> Alex Gorbachev
>>>
>>> Storcium
>>>
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>>
>>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>>
>>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>>>
>>> 03/04/2015
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>>>
>>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>>>
>>> 0010:[]  []
>>>
>>> task_numa_find_cpu+0x22e/0x6f0
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>>>
>>> 0018:880f79fbb818  EFLAGS: 00010206
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>>
>>>  RBX: 880f79fbb8b8 RCX: 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-04 Thread Alex Gorbachev
On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven  wrote:
> Hi Alex/Stefan,
>
> I'm in the middle of testing 4.7rc5 on our test cluster to confirm
> once and for all this particular issue has been completely resolved by
> Peter's recent patch to sched/fair.c refereed to by Stefan above. For
> us anyway the patches that Stefan applied did not solve the issue and
> neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
> does the trick for you. We could get about 4 hours uptime before
> things went haywire for us.
>
> It's interesting how it seems the CEPH workload triggers this bug so
> well as it's quite a long standing issue that's only just been
> resolved, another user chimed in on the lkml thread a couple of days
> ago as well and again his trace had ceph-osd in it as well.
>
> https://lkml.org/lkml/headers/2016/6/21/491
>
> Campbell

Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I
am having trouble booting it on an Ubuntu box due to some other
unrelated problem.  So I am dropping to kernel 4.2.0 for now, which does
not seem to have this load-related problem.

I looked at the fair.c code in kernel source tree 4.4.14 and it is
quite different from Peter's patch (which assumes 4.5.x source), so the
patch does not apply cleanly.  Maybe another 4.4.x kernel will get the
update.
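
A quick way to see how badly it misses, without touching the tree, is a dry
run -- a rough sketch, assuming the patch was saved as peter-fair.patch (a
placeholder name) next to an unpacked 4.4.14 tree:

  cd linux-4.4.14
  # report which hunks would fail to apply, but do not modify any files
  patch -p1 --dry-run < ../peter-fair.patch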

Thanks,
Alex



>
> On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>  wrote:
>>
>> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>> Hi Stefan,
>>>
>>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>>  wrote:
 Please be aware that you may need even more patches. Overall this needs 3
 patches. Where the first two try to fix a bug and the 3rd one fixes the
 fixes + even more bugs related to the scheduler. I've no idea on which 
 patch
 level Ubuntu is.
>>>
>>> Stefan, would you be able to please point to the other two patches
>>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>>
>> Sorry sure yes:
>>
>> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>> bounded value")
>>
>> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>> post_init_entity_util_avg() serialization")
>>
>> 3.) the one listed at lkml.
>>
>> Stefan
>>
>>>
>>> Thank you,
>>> Alex
>>>

 Stefan

 Excuse my typo sent from my mobile phone.

 Am 28.06.2016 um 17:59 schrieb Tim Bishop :

 Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
 useful information to add other than it's not just you.

 Tim.

 On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:

 After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of

 these issues where an OSD would fail with the stack below.  I logged a

 bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is

 a similar description at https://lkml.org/lkml/2016/6/22/102, but the

 odd part is we have turned off CFQ and blk-mq/scsi-mq and are using

 just the noop scheduler.


 Does the ceph kernel code somehow use the fair scheduler code block?


 Thanks

 --

 Alex Gorbachev

 Storcium


 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:

 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:

 Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2

 03/04/2015

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:

 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:

 0010:[]  []

 task_numa_find_cpu+0x22e/0x6f0

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:

 0018:880f79fbb818  EFLAGS: 00010206

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:

  RBX: 880f79fbb8b8 RCX: 

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:

  RSI:  RDI: 8810352d4800

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:

 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:

 0009 R11: 0006 R12: 8807c3adc4c0

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:

 0006 R14: 033e R15: fec7

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:

 7f30e46b8700() GS:88105f58()

 knlGS:

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:

  ES: 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-02 Thread Alex Gorbachev
> Thank you Stefan and Campbell for the info - hope 4.7rc5 resolves this
> for us - please note that my workload is purely RBD, no QEMU/KVM.
> Also, we do not have CFQ turned on, neither scsi-mq and blk-mq, so I
> am surmising ceph-osd must be using something from the fair scheduler.
> I read that its IO has been switched to blk-mq internally, so maybe
> there is a relationship there.

If the OSD code is compiled against the source of a buggy fair
scheduler, then that would be an OSD code issue, correct?

>
> We had no such problems with kernel 4.2.x, but had other issues with
> XFS, which do not seem to happen now.
>
> Regards,
> Alex
>
>>
>> Stefan
>>
>> Am 29.06.2016 um 11:41 schrieb Campbell Steven:
>>> Hi Alex/Stefan,
>>>
>>> I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>>> once and for all this particular issue has been completely resolved by
>>> Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>>> us anyway the patches that Stefan applied did not solve the issue and
>>> neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>>> does the trick for you. We could get about 4 hours uptime before
>>> things went haywire for us.
>>>
>>> It's interesting how it seems the CEPH workload triggers this bug so
>>> well as it's quite a long standing issue that's only just been
>>> resolved, another user chimed in on the lkml thread a couple of days
>>> ago as well and again his trace had ceph-osd in it as well.
>>>
>>> https://lkml.org/lkml/headers/2016/6/21/491
>>>
>>> Campbell
>>>
>>> On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>>>  wrote:

 Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
> Hi Stefan,
>
> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Please be aware that you may need even more patches. Overall this needs 3
>> patches. Where the first two try to fix a bug and the 3rd one fixes the
>> fixes + even more bugs related to the scheduler. I've no idea on which 
>> patch
>> level Ubuntu is.
>
> Stefan, would you be able to please point to the other two patches
> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?

 Sorry sure yes:

 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
 bounded value")

 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
 post_init_entity_util_avg() serialization")

 3.) the one listed at lkml.

 Stefan

>
> Thank you,
> Alex
>
>>
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>>
>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>> useful information to add other than it's not just you.
>>
>> Tim.
>>
>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>
>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>
>> these issues where an OSD would fail with the stack below.  I logged a
>>
>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>>
>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>>
>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>
>> just the noop scheduler.
>>
>>
>> Does the ceph kernel code somehow use the fair scheduler code block?
>>
>>
>> Thanks
>>
>> --
>>
>> Alex Gorbachev
>>
>> Storcium
>>
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>
>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>>
>> 03/04/2015
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>>
>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>>
>> 0010:[]  []
>>
>> task_numa_find_cpu+0x22e/0x6f0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>>
>> 0018:880f79fbb818  EFLAGS: 00010206
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>
>>  RBX: 880f79fbb8b8 RCX: 
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>
>>  RSI:  RDI: 8810352d4800
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>>
>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>>
>> 0009 R11: 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-01 Thread Christoph Adomeit
Hi,

Is there a proven solution to this issue yet?

What can be done to fix the scheduler bug? One patch, three patches, 20 patches?

Thanks
  Christoph

On Wed, Jun 29, 2016 at 12:02:11PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> to be precise i've far more patches attached to the sched part (around
> 20) of the kernel. So maybe that's the reason why it helps to me.
> 
> Could you please post a complete stack trace? Also Qemu / KVM triggers this.
> 
> Stefan
> 
> Am 29.06.2016 um 11:41 schrieb Campbell Steven:
> > Hi Alex/Stefan,
> > 
> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
> > once and for all this particular issue has been completely resolved by
> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
> > us anyway the patches that Stefan applied did not solve the issue and
> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
> > does the trick for you. We could get about 4 hours uptime before
> > things went haywire for us.
> > 
> > It's interesting how it seems the CEPH workload triggers this bug so
> > well as it's quite a long standing issue that's only just been
> > resolved, another user chimed in on the lkml thread a couple of days
> > ago as well and again his trace had ceph-osd in it as well.
> > 
> > https://lkml.org/lkml/headers/2016/6/21/491
> > 
> > Campbell
> > 
> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
> >  wrote:
> >>
> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
> >>> Hi Stefan,
> >>>
> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
> >>>  wrote:
>  Please be aware that you may need even more patches. Overall this needs 3
>  patches. Where the first two try to fix a bug and the 3rd one fixes the
>  fixes + even more bugs related to the scheduler. I've no idea on which 
>  patch
>  level Ubuntu is.
> >>>
> >>> Stefan, would you be able to please point to the other two patches
> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
> >>
> >> Sorry sure yes:
> >>
> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
> >> bounded value")
> >>
> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
> >> post_init_entity_util_avg() serialization")
> >>
> >> 3.) the one listed at lkml.
> >>
> >> Stefan
> >>
> >>>
> >>> Thank you,
> >>> Alex
> >>>
> 
>  Stefan
> 
>  Excuse my typo sent from my mobile phone.
> 
>  Am 28.06.2016 um 17:59 schrieb Tim Bishop :
> 
>  Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>  useful information to add other than it's not just you.
> 
>  Tim.
> 
>  On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
> 
>  After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> 
>  these issues where an OSD would fail with the stack below.  I logged a
> 
>  bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
> 
>  a similar description at https://lkml.org/lkml/2016/6/22/102, but the
> 
>  odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> 
>  just the noop scheduler.
> 
> 
>  Does the ceph kernel code somehow use the fair scheduler code block?
> 
> 
>  Thanks
> 
>  --
> 
>  Alex Gorbachev
> 
>  Storcium
> 
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
> 
>  10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
> 
>  Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> 
>  03/04/2015
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
> 
>  880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
> 
>  0010:[]  []
> 
>  task_numa_find_cpu+0x22e/0x6f0
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
> 
>  0018:880f79fbb818  EFLAGS: 00010206
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
> 
>   RBX: 880f79fbb8b8 RCX: 
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
> 
>   RSI:  RDI: 8810352d4800
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
> 
>  880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
> 
>  0009 R11: 0006 R12: 8807c3adc4c0
> 
>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
> 
>  0006 R14: 033e R15: fec7
> 
> 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-29 Thread Stefan Priebe - Profihost AG
Hi,

To be precise, I have far more patches (around 20) applied to the sched part
of the kernel, so maybe that's why it works for me.

Could you please post a complete stack trace? Qemu / KVM also triggers this.

Stefan
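
(If it helps, the full trace usually ends up in the kernel ring buffer and in
syslog; something like this pulls the whole block rather than a truncated
snippet -- log paths vary by distro, so treat these as examples:

  dmesg | grep -B5 -A60 task_numa_find_cpu
  grep -B5 -A60 task_numa_find_cpu /var/log/kern.log
)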

Am 29.06.2016 um 11:41 schrieb Campbell Steven:
> Hi Alex/Stefan,
> 
> I'm in the middle of testing 4.7rc5 on our test cluster to confirm
> once and for all this particular issue has been completely resolved by
> Peter's recent patch to sched/fair.c refereed to by Stefan above. For
> us anyway the patches that Stefan applied did not solve the issue and
> neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
> does the trick for you. We could get about 4 hours uptime before
> things went haywire for us.
> 
> It's interesting how it seems the CEPH workload triggers this bug so
> well as it's quite a long standing issue that's only just been
> resolved, another user chimed in on the lkml thread a couple of days
> ago as well and again his trace had ceph-osd in it as well.
> 
> https://lkml.org/lkml/headers/2016/6/21/491
> 
> Campbell
> 
> On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>  wrote:
>>
>> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>> Hi Stefan,
>>>
>>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>>  wrote:
 Please be aware that you may need even more patches. Overall this needs 3
 patches. Where the first two try to fix a bug and the 3rd one fixes the
 fixes + even more bugs related to the scheduler. I've no idea on which 
 patch
 level Ubuntu is.
>>>
>>> Stefan, would you be able to please point to the other two patches
>>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>>
>> Sorry sure yes:
>>
>> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>> bounded value")
>>
>> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>> post_init_entity_util_avg() serialization")
>>
>> 3.) the one listed at lkml.
>>
>> Stefan
>>
>>>
>>> Thank you,
>>> Alex
>>>

 Stefan

 Excuse my typo sent from my mobile phone.

 Am 28.06.2016 um 17:59 schrieb Tim Bishop :

 Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
 useful information to add other than it's not just you.

 Tim.

 On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:

 After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of

 these issues where an OSD would fail with the stack below.  I logged a

 bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is

 a similar description at https://lkml.org/lkml/2016/6/22/102, but the

 odd part is we have turned off CFQ and blk-mq/scsi-mq and are using

 just the noop scheduler.


 Does the ceph kernel code somehow use the fair scheduler code block?


 Thanks

 --

 Alex Gorbachev

 Storcium


 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:

 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:

 Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2

 03/04/2015

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:

 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:

 0010:[]  []

 task_numa_find_cpu+0x22e/0x6f0

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:

 0018:880f79fbb818  EFLAGS: 00010206

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:

  RBX: 880f79fbb8b8 RCX: 

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:

  RSI:  RDI: 8810352d4800

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:

 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:

 0009 R11: 0006 R12: 8807c3adc4c0

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:

 0006 R14: 033e R15: fec7

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:

 7f30e46b8700() GS:88105f58()

 knlGS:

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:

  ES:  CR0: 80050033

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:

 1321a000 CR3: 000853598000 CR4: 000406e0

 Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:

 Jun 28 09:46:41 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-29 Thread Campbell Steven
Hi Alex/Stefan,

I'm in the middle of testing 4.7rc5 on our test cluster to confirm
once and for all that this particular issue has been completely resolved
by Peter's recent patch to sched/fair.c referred to by Stefan above. For
us, anyway, the patches that Stefan applied did not solve the issue, and
neither did any released 4.5.x or 4.6.x kernel so far; hopefully it
does the trick for you. We could get about 4 hours of uptime before
things went haywire for us.

It's interesting how readily the Ceph workload triggers this bug, given
it's quite a long-standing issue that has only just been resolved;
another user chimed in on the lkml thread a couple of days ago, and
again his trace had ceph-osd in it.

https://lkml.org/lkml/headers/2016/6/21/491

Campbell

On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
 wrote:
>
> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>> Hi Stefan,
>>
>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>  wrote:
>>> Please be aware that you may need even more patches. Overall this needs 3
>>> patches. Where the first two try to fix a bug and the 3rd one fixes the
>>> fixes + even more bugs related to the scheduler. I've no idea on which patch
>>> level Ubuntu is.
>>
>> Stefan, would you be able to please point to the other two patches
>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>
> Sorry sure yes:
>
> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
> bounded value")
>
> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
> post_init_entity_util_avg() serialization")
>
> 3.) the one listed at lkml.
>
> Stefan
>
>>
>> Thank you,
>> Alex
>>
>>>
>>> Stefan
>>>
>>> Excuse my typo sent from my mobile phone.
>>>
>>> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>>>
>>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>>> useful information to add other than it's not just you.
>>>
>>> Tim.
>>>
>>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>>
>>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>>
>>> these issues where an OSD would fail with the stack below.  I logged a
>>>
>>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>>>
>>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>>>
>>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>>
>>> just the noop scheduler.
>>>
>>>
>>> Does the ceph kernel code somehow use the fair scheduler code block?
>>>
>>>
>>> Thanks
>>>
>>> --
>>>
>>> Alex Gorbachev
>>>
>>> Storcium
>>>
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>>
>>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>>
>>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>>>
>>> 03/04/2015
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>>>
>>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>>>
>>> 0010:[]  []
>>>
>>> task_numa_find_cpu+0x22e/0x6f0
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>>>
>>> 0018:880f79fbb818  EFLAGS: 00010206
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>>
>>>  RBX: 880f79fbb8b8 RCX: 
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>>
>>>  RSI:  RDI: 8810352d4800
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>>>
>>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>>>
>>> 0009 R11: 0006 R12: 8807c3adc4c0
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>>>
>>> 0006 R14: 033e R15: fec7
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>>>
>>> 7f30e46b8700() GS:88105f58()
>>>
>>> knlGS:
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>>>
>>>  ES:  CR0: 80050033
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>>>
>>> 1321a000 CR3: 000853598000 CR4: 000406e0
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>>>
>>> 813d050f 000d 0045 880f79df8000
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>>>
>>> 033f  00016b00 033f
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>>>
>>> 880f79df8000 880f79fbb8b8 01f4 0054
>>>
>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>>>

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-29 Thread Stefan Priebe - Profihost AG

Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
> Hi Stefan,
> 
> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Please be aware that you may need even more patches. Overall this needs 3
>> patches. Where the first two try to fix a bug and the 3rd one fixes the
>> fixes + even more bugs related to the scheduler. I've no idea on which patch
>> level Ubuntu is.
> 
> Stefan, would you be able to please point to the other two patches
> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?

Sorry sure yes:

1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
   bounded value")

2. 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
   post_init_entity_util_avg() serialization")

3. the one listed at lkml.

Stefan
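
(If you build your own kernels, you can check whether the first two commits are
already in a given tree straight from git -- a sketch, assuming a local clone of
the mainline kernel with those commits fetched:

  cd linux
  # exit status 0 means the commit is an ancestor of the checked-out HEAD
  git merge-base --is-ancestor 2b8c41daba32 HEAD && echo present || echo missing
  git merge-base --is-ancestor 40ed9cba24bb HEAD && echo present || echo missing
)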

> 
> Thank you,
> Alex
> 
>>
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>>
>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>> useful information to add other than it's not just you.
>>
>> Tim.
>>
>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>
>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>
>> these issues where an OSD would fail with the stack below.  I logged a
>>
>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>>
>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>>
>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>
>> just the noop scheduler.
>>
>>
>> Does the ceph kernel code somehow use the fair scheduler code block?
>>
>>
>> Thanks
>>
>> --
>>
>> Alex Gorbachev
>>
>> Storcium
>>
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>
>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>>
>> 03/04/2015
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>>
>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>>
>> 0010:[]  []
>>
>> task_numa_find_cpu+0x22e/0x6f0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>>
>> 0018:880f79fbb818  EFLAGS: 00010206
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>
>>  RBX: 880f79fbb8b8 RCX: 
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>
>>  RSI:  RDI: 8810352d4800
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>>
>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>>
>> 0009 R11: 0006 R12: 8807c3adc4c0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>>
>> 0006 R14: 033e R15: fec7
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>>
>> 7f30e46b8700() GS:88105f58()
>>
>> knlGS:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>>
>>  ES:  CR0: 80050033
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>>
>> 1321a000 CR3: 000853598000 CR4: 000406e0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>>
>> 813d050f 000d 0045 880f79df8000
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>>
>> 033f  00016b00 033f
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>>
>> 880f79df8000 880f79fbb8b8 01f4 0054
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>>
>> [] ? cpumask_next_and+0x2f/0x40
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>>
>> [] task_numa_migrate+0x43e/0x9b0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>>
>> [] ? update_cfs_shares+0xbc/0x100
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>>
>> [] numa_migrate_preferred+0x79/0x80
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>>
>> [] task_numa_fault+0x7f4/0xd40
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>>
>> [] ? timerqueue_del+0x24/0x70
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>>
>> [] ? should_numa_migrate_memory+0x55/0x130
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>>
>> [] handle_mm_fault+0xbc0/0x1820
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>>
>> [] ? __hrtimer_init+0x90/0x90
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>>
>> [] ? remove_wait_queue+0x4d/0x60
>>
>> Jun 28 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Alex Gorbachev
Hi Stefan,

On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
 wrote:
> Please be aware that you may need even more patches. Overall this needs 3
> patches. Where the first two try to fix a bug and the 3rd one fixes the
> fixes + even more bugs related to the scheduler. I've no idea on which patch
> level Ubuntu is.

Stefan, would you be able to please point to the other two patches
besides https://lkml.org/lkml/diff/2016/6/22/102/1 ?

Thank you,
Alex

>
> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>
> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
> useful information to add other than it's not just you.
>
> Tim.
>
> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>
> these issues where an OSD would fail with the stack below.  I logged a
>
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>
> just the noop scheduler.
>
>
> Does the ceph kernel code somehow use the fair scheduler code block?
>
>
> Thanks
>
> --
>
> Alex Gorbachev
>
> Storcium
>
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>
> 03/04/2015
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>
> 0010:[]  []
>
> task_numa_find_cpu+0x22e/0x6f0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>
> 0018:880f79fbb818  EFLAGS: 00010206
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>
>  RBX: 880f79fbb8b8 RCX: 
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>
>  RSI:  RDI: 8810352d4800
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>
> 0009 R11: 0006 R12: 8807c3adc4c0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>
> 0006 R14: 033e R15: fec7
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>
> 7f30e46b8700() GS:88105f58()
>
> knlGS:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>
>  ES:  CR0: 80050033
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>
> 1321a000 CR3: 000853598000 CR4: 000406e0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>
> 813d050f 000d 0045 880f79df8000
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>
> 033f  00016b00 033f
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>
> 880f79df8000 880f79fbb8b8 01f4 0054
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>
> [] ? cpumask_next_and+0x2f/0x40
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>
> [] task_numa_migrate+0x43e/0x9b0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>
> [] ? update_cfs_shares+0xbc/0x100
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>
> [] numa_migrate_preferred+0x79/0x80
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>
> [] task_numa_fault+0x7f4/0xd40
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>
> [] ? timerqueue_del+0x24/0x70
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>
> [] ? should_numa_migrate_memory+0x55/0x130
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>
> [] handle_mm_fault+0xbc0/0x1820
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>
> [] ? __hrtimer_init+0x90/0x90
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>
> [] ? remove_wait_queue+0x4d/0x60
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
>
> [] ? poll_freewait+0x4a/0xa0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
>
> [] __do_page_fault+0x197/0x400
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
>
> [] do_page_fault+0x22/0x30
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
>
> [] page_fault+0x28/0x30
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
>
> [] ? copy_page_to_iter_iovec+0x5f/0x300
>
> Jun 28 09:46:41 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Brendan Moloney
The Ubuntu bug report is here: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729

> Please be aware that you may need even more patches. Overall this needs 3 
> patches. Where the first two try to fix a bug and the 3rd one fixes the fixes 
> + even more bugs related to the scheduler. I've no idea on which patch level 
> Ubuntu is.
> 
> Stefan
> 
> Excuse my typo sent from my mobile phone.



Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Stefan Priebe - Profihost AG
Please be aware that you may need even more patches. Overall this needs 3
patches: the first two try to fix a bug, and the 3rd one fixes the fixes plus
even more bugs related to the scheduler. I've no idea which patch level
Ubuntu is on.

Stefan

Excuse my typo sent from my mobile phone.

> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
> 
> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
> useful information to add other than it's not just you.
> 
> Tim.
> 
>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>> these issues where an OSD would fail with the stack below.  I logged a
>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>> just the noop scheduler.
>> 
>> Does the ceph kernel code somehow use the fair scheduler code block?
>> 
>> Thanks
>> --
>> Alex Gorbachev
>> Storcium
>> 
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>> 03/04/2015
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>> 0010:[]  []
>> task_numa_find_cpu+0x22e/0x6f0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>> 0018:880f79fbb818  EFLAGS: 00010206
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>  RBX: 880f79fbb8b8 RCX: 
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>  RSI:  RDI: 8810352d4800
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>> 0009 R11: 0006 R12: 8807c3adc4c0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>> 0006 R14: 033e R15: fec7
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>> 7f30e46b8700() GS:88105f58()
>> knlGS:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>>  ES:  CR0: 80050033
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>> 1321a000 CR3: 000853598000 CR4: 000406e0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>> 813d050f 000d 0045 880f79df8000
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>> 033f  00016b00 033f
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>> 880f79df8000 880f79fbb8b8 01f4 0054
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>> [] ? cpumask_next_and+0x2f/0x40
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>> [] task_numa_migrate+0x43e/0x9b0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>> [] ? update_cfs_shares+0xbc/0x100
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>> [] numa_migrate_preferred+0x79/0x80
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>> [] task_numa_fault+0x7f4/0xd40
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>> [] ? timerqueue_del+0x24/0x70
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>> [] ? should_numa_migrate_memory+0x55/0x130
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>> [] handle_mm_fault+0xbc0/0x1820
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>> [] ? __hrtimer_init+0x90/0x90
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>> [] ? remove_wait_queue+0x4d/0x60
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
>> [] ? poll_freewait+0x4a/0xa0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
>> [] __do_page_fault+0x197/0x400
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
>> [] do_page_fault+0x22/0x30
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
>> [] page_fault+0x28/0x30
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
>> [] ? copy_page_to_iter_iovec+0x5f/0x300
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
>> [] ? select_task_rq_fair+0x625/0x700
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
>> [] copy_page_to_iter+0x16/0xa0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
>> [] skb_copy_datagram_iter+0x14d/0x280
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
>> [] 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Tim Bishop
Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
useful information to add other than it's not just you.

Tim.

On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> these issues where an OSD would fail with the stack below.  I logged a
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> just the noop scheduler.
> 
> Does the ceph kernel code somehow use the fair scheduler code block?
> 
> Thanks
> --
> Alex Gorbachev
> Storcium
> 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> 03/04/2015
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
> 0010:[]  []
> task_numa_find_cpu+0x22e/0x6f0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
> 0018:880f79fbb818  EFLAGS: 00010206
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>  RBX: 880f79fbb8b8 RCX: 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>  RSI:  RDI: 8810352d4800
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
> 0009 R11: 0006 R12: 8807c3adc4c0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
> 0006 R14: 033e R15: fec7
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
> 7f30e46b8700() GS:88105f58()
> knlGS:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>  ES:  CR0: 80050033
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
> 1321a000 CR3: 000853598000 CR4: 000406e0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
> 813d050f 000d 0045 880f79df8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
> 033f  00016b00 033f
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
> 880f79df8000 880f79fbb8b8 01f4 0054
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
> [] ? cpumask_next_and+0x2f/0x40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
> [] task_numa_migrate+0x43e/0x9b0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
> [] ? update_cfs_shares+0xbc/0x100
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
> [] numa_migrate_preferred+0x79/0x80
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
> [] task_numa_fault+0x7f4/0xd40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
> [] ? timerqueue_del+0x24/0x70
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
> [] ? should_numa_migrate_memory+0x55/0x130
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
> [] handle_mm_fault+0xbc0/0x1820
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
> [] ? __hrtimer_init+0x90/0x90
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
> [] ? remove_wait_queue+0x4d/0x60
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
> [] ? poll_freewait+0x4a/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
> [] __do_page_fault+0x197/0x400
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
> [] do_page_fault+0x22/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
> [] page_fault+0x28/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
> [] ? copy_page_to_iter_iovec+0x5f/0x300
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
> [] ? select_task_rq_fair+0x625/0x700
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
> [] copy_page_to_iter+0x16/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
> [] skb_copy_datagram_iter+0x14d/0x280
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
> [] tcp_recvmsg+0x613/0xbe0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
> [] inet_recvmsg+0x7e/0xb0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
> [] sock_recvmsg+0x3b/0x50
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
> [] SYSC_recvfrom+0xe1/0x160
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
> [] ? ktime_get_ts64+0x45/0xf0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
> [] SyS_recvfrom+0xe/0x10
> Jun 28 09:46:41 

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Stefan Priebe - Profihost AG
Yes you need those lkml patches. I added them to our custom 4.4 Kernel too to 
prevent this.

Stefan

Excuse any typos; sent from my mobile phone.

> On 28.06.2016 at 17:05, Alex Gorbachev wrote:
> 
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> these issues where an OSD would fail with the stack below.  I logged a
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> just the noop scheduler.
> 
> Does the ceph kernel code somehow use the fair scheduler code block?
> 
> Thanks
> --
> Alex Gorbachev
> Storcium
> 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> 03/04/2015
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
> 0010:[]  []
> task_numa_find_cpu+0x22e/0x6f0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
> 0018:880f79fbb818  EFLAGS: 00010206
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>  RBX: 880f79fbb8b8 RCX: 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>  RSI:  RDI: 8810352d4800
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
> 0009 R11: 0006 R12: 8807c3adc4c0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
> 0006 R14: 033e R15: fec7
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
> 7f30e46b8700() GS:88105f58()
> knlGS:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>  ES:  CR0: 80050033
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
> 1321a000 CR3: 000853598000 CR4: 000406e0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
> 813d050f 000d 0045 880f79df8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
> 033f  00016b00 033f
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
> 880f79df8000 880f79fbb8b8 01f4 0054
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
> [] ? cpumask_next_and+0x2f/0x40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
> [] task_numa_migrate+0x43e/0x9b0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
> [] ? update_cfs_shares+0xbc/0x100
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
> [] numa_migrate_preferred+0x79/0x80
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
> [] task_numa_fault+0x7f4/0xd40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
> [] ? timerqueue_del+0x24/0x70
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
> [] ? should_numa_migrate_memory+0x55/0x130
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
> [] handle_mm_fault+0xbc0/0x1820
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
> [] ? __hrtimer_init+0x90/0x90
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
> [] ? remove_wait_queue+0x4d/0x60
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
> [] ? poll_freewait+0x4a/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
> [] __do_page_fault+0x197/0x400
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
> [] do_page_fault+0x22/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
> [] page_fault+0x28/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
> [] ? copy_page_to_iter_iovec+0x5f/0x300
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
> [] ? select_task_rq_fair+0x625/0x700
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
> [] copy_page_to_iter+0x16/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
> [] skb_copy_datagram_iter+0x14d/0x280
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
> [] tcp_recvmsg+0x613/0xbe0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
> [] inet_recvmsg+0x7e/0xb0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
> [] sock_recvmsg+0x3b/0x50
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
> [] SYSC_recvfrom+0xe1/0x160
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
> [] ? ktime_get_ts64+0x45/0xf0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
> [] 

[ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-06-28 Thread Alex Gorbachev
After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
these issues where an OSD would fail with the stack below.  I logged a
bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
a similar description at https://lkml.org/lkml/2016/6/22/102, but the
odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
just the noop scheduler.

Does the ceph kernel code somehow use the fair scheduler code block?

Thanks
--
Alex Gorbachev
Storcium
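
A note on the fair-scheduler question above: the call trace below runs
page_fault -> do_page_fault -> handle_mm_fault -> task_numa_fault ->
task_numa_migrate -> task_numa_find_cpu, which is the automatic NUMA
balancing path in sched/fair.c. It executes in the context of whatever task
takes the NUMA hinting fault, so it is unrelated to the block I/O scheduler
(noop vs CFQ); any multi-threaded process that touches a lot of memory,
ceph-osd included, will go through it. A rough illustration of such a
workload, with arbitrary thread counts and buffer sizes, is sketched below:

/*
 * Illustrative only: a memory-churn workload of the kind that exercises
 * automatic NUMA balancing (NUMA hinting faults) when it is enabled.
 * Thread count and buffer size are arbitrary. Build with: cc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 4
#define BUF_SIZE (512UL * 1024 * 1024)   /* 512 MiB of anonymous memory */

static char *buf;

static void *churn(void *arg)
{
    (void)arg;
    for (int pass = 0; pass < 100; pass++)
        memset(buf, pass, BUF_SIZE);     /* repeatedly touch every page */
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];

    buf = malloc(BUF_SIZE);
    if (!buf) {
        perror("malloc");
        return 1;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, churn, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    free(buf);
    return 0;
}

On a multi-socket box with kernel.numa_balancing enabled, a workload like this
takes the same kind of hinting faults the traces show ceph-osd taking; it is
not expected to reproduce the crash, only to show why the I/O scheduler
setting is irrelevant here.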

Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
03/04/2015
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
0010:[]  []
task_numa_find_cpu+0x22e/0x6f0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
0018:880f79fbb818  EFLAGS: 00010206
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
 RBX: 880f79fbb8b8 RCX: 
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
 RSI:  RDI: 8810352d4800
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
0009 R11: 0006 R12: 8807c3adc4c0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
0006 R14: 033e R15: fec7
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
7f30e46b8700() GS:88105f58()
knlGS:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
 ES:  CR0: 80050033
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
1321a000 CR3: 000853598000 CR4: 000406e0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
813d050f 000d 0045 880f79df8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
033f  00016b00 033f
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
880f79df8000 880f79fbb8b8 01f4 0054
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
[] ? cpumask_next_and+0x2f/0x40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
[] task_numa_migrate+0x43e/0x9b0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
[] ? update_cfs_shares+0xbc/0x100
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
[] numa_migrate_preferred+0x79/0x80
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
[] task_numa_fault+0x7f4/0xd40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
[] ? timerqueue_del+0x24/0x70
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
[] ? should_numa_migrate_memory+0x55/0x130
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
[] handle_mm_fault+0xbc0/0x1820
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
[] ? __hrtimer_init+0x90/0x90
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
[] ? remove_wait_queue+0x4d/0x60
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
[] ? poll_freewait+0x4a/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
[] __do_page_fault+0x197/0x400
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
[] do_page_fault+0x22/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
[] page_fault+0x28/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
[] ? copy_page_to_iter_iovec+0x5f/0x300
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
[] ? select_task_rq_fair+0x625/0x700
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
[] copy_page_to_iter+0x16/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
[] skb_copy_datagram_iter+0x14d/0x280
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
[] tcp_recvmsg+0x613/0xbe0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
[] inet_recvmsg+0x7e/0xb0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
[] sock_recvmsg+0x3b/0x50
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
[] SYSC_recvfrom+0xe1/0x160
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
[] ? ktime_get_ts64+0x45/0xf0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
[] SyS_recvfrom+0xe/0x10
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686259]
[] entry_SYSCALL_64_fastpath+0x16/0x71
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686287] Code: 55 b0 4c
89 f7 e8 53 cd ff ff 48 8b 55 b0 49 8b 4e 78 48 8b 82 d8 01 00 00 48
83 c1 01 31 d2 49 0f af 86 b0 00 00 00 4c 8b 73 78 <48> f7 f1 48 8b 4b
20 49 89 c0 48 29 c1 48 8b 45 d0 4c 03 43 48
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686512] RIP
[]
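
A final note on the crash signature: the Code: bytes above include 48 f7 f1,
which decodes to a div %rcx instruction, and RCX appears to be zero in the
register dump, so this looks like a divide-by-zero taken inside
task_numa_find_cpu(). The sketch below is only an illustration of that failure
mode and of the kind of guard that avoids it; it is not the actual sched/fair.c
code, and the real fixes referenced earlier in the thread rework the scheduler
arithmetic rather than adding a clamp like this:

/*
 * Illustration only -- not kernel code. An unguarded integer division by a
 * derived "capacity" value that can transiently be zero dies with SIGFPE,
 * just as a zero RCX under a div instruction triggers a divide error;
 * clamping the divisor is one way to make the same computation safe.
 */
#include <stdio.h>

struct numa_stats_sketch {
    long load;
    long compute_capacity;   /* may be observed as 0 mid-update */
};

/* unguarded: raises SIGFPE (divide error) when compute_capacity is 0 */
static long imbalance_unguarded(long load, const struct numa_stats_sketch *ns)
{
    return load / ns->compute_capacity;
}

/* guarded: clamp the divisor before dividing */
static long imbalance_guarded(long load, const struct numa_stats_sketch *ns)
{
    long cap = ns->compute_capacity > 0 ? ns->compute_capacity : 1;
    return load / cap;
}

int main(void)
{
    struct numa_stats_sketch ns = { .load = 1000, .compute_capacity = 0 };

    printf("guarded result: %ld\n", imbalance_guarded(ns.load, &ns));
    /* imbalance_unguarded(ns.load, &ns) would crash here with capacity 0 */
    (void)imbalance_unguarded;
    return 0;
}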