Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2018-05-30 Thread Dzmitryj Jakavuk
Indeed! I have already tested 2.10.4, and the issue is fixed there.

> On May 31, 2018, at 09:15, Hans Henrik Happe wrote:
> 
> Happy to report that 2.10.4 fixed this issue.
> 
> Cheers,
> Hans Henrik
> 
> On 06-04-2018 14:48, Hans Henrik Happe wrote:
>> Just for the record. 2.11.0 has fixed this. Not sure which LU though.
>> 
>> Cheers,
>> Hans Henrik
>> 
>> On 30-09-2017 23:43, Hans Henrik Happe wrote:
>>> On 27-09-2017 18:50, Dilger, Andreas wrote:
>>>> On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
>>>>> Hi,
>>>>>
>>>>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>>>>> lustre'? On an otherwise idle node I see:
>>>>>
>>>>>   PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>>>>     9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>>>>>    57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
>>>>>
>>>>> The sum of those two is about 50% CPU.
>>>>>
>>>>> I have narrowed it down to the ptlrpc module. When I remove that, it
>>>>> stops.
>>>>>
>>>>> I also tested the 2.10.1-RC1, which is the same.
>>>> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
>>>> that are currently running on the CPUs of your system to the console (and
>>>> also /var/log/messages, if it can write everything in time).
>>>>
>>>> You might need to do this several times to get a representative sample of
>>>> the ksoftirqd process stacks to see what they are doing that is consuming
>>>> so much CPU.
>>>>
>>>> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
>>>> processes to the console (and /v/l/m), but there will be a lot of them,
>>>> and no better chance that it catches what ksoftirqd is doing 25% of
>>>> the time.
>>> I've attached the stacks. Some wakeups, which I guess are initiated by
>>> something in the ptlrpc code.
>>> 
>>> Cheers,
>>> Hans Henrik


Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2018-05-30 Thread Hans Henrik Happe
Happy to report that 2.10.4 fixed this issue.

Cheers,
Hans Henrik

On 06-04-2018 14:48, Hans Henrik Happe wrote:
> Just for the record. 2.11.0 has fixed this. Not sure which LU though.
> 
> Cheers,
> Hans Henrik
> 
> On 30-09-2017 23:43, Hans Henrik Happe wrote:
>> On 27-09-2017 18:50, Dilger, Andreas wrote:
>>> On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
>>>> Hi,
>>>>
>>>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>>>> lustre'? On an otherwise idle node I see:
>>>>
>>>>   PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>>>     9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>>>>    57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
>>>>
>>>> The sum of those two is about 50% CPU.
>>>>
>>>> I have narrowed it down to the ptlrpc module. When I remove that, it
>>>> stops.
>>>>
>>>> I also tested the 2.10.1-RC1, which is the same.
>>> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
>>> that are currently running on the CPUs of your system to the console (and
>>> also /var/log/messages, if it can write everything in time).
>>>
>>> You might need to do this several times to get a representative sample of
>>> the ksoftirqd process stacks to see what they are doing that is consuming
>>> so much CPU.
>>>
>>> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
>>> processes to the console (and /v/l/m), but there will be a lot of them,
>>> and no better chance that it catches what ksoftirqd is doing 25% of
>>> the time.
>> I've attached the stacks. Some wakeups, which I guess are initiated by
>> something in the ptlrpc code.
>>
>> Cheers,
>> Hans Henrik
>>


Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2018-04-06 Thread Hans Henrik Happe

Just for the record. 2.11.0 has fixed this. Not sure which LU though.

Cheers,
Hans Henrik

On 30-09-2017 23:43, Hans Henrik Happe wrote:
> On 27-09-2017 18:50, Dilger, Andreas wrote:
>> On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
>>> Hi,
>>>
>>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>>> lustre'? On an otherwise idle node I see:
>>>
>>>   PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>>     9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>>>    57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
>>>
>>> The sum of those two is about 50% CPU.
>>>
>>> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>>>
>>> I also tested the 2.10.1-RC1, which is the same.
>> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
>> that are currently running on the CPUs of your system to the console (and
>> also /var/log/messages, if it can write everything in time).
>>
>> You might need to do this several times to get a representative sample of
>> the ksoftirqd process stacks to see what they are doing that is consuming
>> so much CPU.
>>
>> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
>> processes to the console (and /v/l/m), but there will be a lot of them,
>> and no better chance that it catches what ksoftirqd is doing 25% of the time.
> I've attached the stacks. Some wakeups, which I guess are initiated by
> something in the ptlrpc code.
>
> Cheers,
> Hans Henrik






Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2017-09-30 Thread Hans Henrik Happe
On 27-09-2017 18:50, Dilger, Andreas wrote:
> On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
>> Hi,
>>
>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>> lustre'? On an otherwise idle node I see:
>>
>>  PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>    9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>>   57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
>>
>> The sum of those two is about 50% CPU.
>>
>> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>>
>> I also tested the 2.10.1-RC1, which is the same.
> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
> that are currently running on the CPUs of your system to the console (and
> also /var/log/messages, if it can write everything in time).
>
> You might need to do this several times to get a representative sample of
> the ksoftirqd process stacks to see what they are doing that is consuming
> so much CPU.
>
> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
> processes to the console (and /v/l/m), but there will be a lot of them,
> and no better chance that it catches what ksoftirqd is doing 25% of the time.
I've attached the stacks. Some wakeups, which I guess are initiated by
something in the ptlrpc code.

Cheers,
Hans Henrik


Sep 30 23:26:01 node033 kernel: NMI backtrace for cpu 11
Sep 30 23:26:01 node033 kernel: CPU 11 Modules linked in: lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mmfs(U) mmfslinux(U) tracedev(U) autofs4 sha512_generic crc32c_intel sunrpc acpi_cpufreq freq_table mperf bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 xfs exportfs ext2 ipmi_devintf ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support raid0 igb dca i2c_algo_bit i2c_core raid1 ptp pps_core sg serio_raw lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sd_mod crc_t10dif ahci hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Sep 30 23:26:01 node033 kernel: 
Sep 30 23:26:01 node033 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6  
Sep 30 23:26:01 node033 kernel: RIP: 0010:[]  [] intel_idle+0xd1/0x1b0
Sep 30 23:26:01 node033 kernel: RSP: 0018:880c2d067e48  EFLAGS: 0046
Sep 30 23:26:01 node033 kernel: RAX: 0020 RBX: 0008 RCX: 0001
Sep 30 23:26:01 node033 kernel: RDX:  RSI: 880c2d067fd8 RDI: 81a98580
Sep 30 23:26:01 node033 kernel: RBP: 880c2d067ed8 R08: 0005 R09: 00c8
Sep 30 23:26:01 node033 kernel: R10: 0011f630c009c747 R11: 880c2d067e68 R12: 0004
Sep 30 23:26:01 node033 kernel: R13: 880c6a6beb40 R14: 0020 R15: 14e940c25ce4ad4a
Sep 30 23:26:01 node033 kernel: FS:  () GS:880c6a6a() knlGS:
Sep 30 23:26:01 node033 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Sep 30 23:26:01 node033 kernel: CR2: 7fe396313000 CR3: 01a8d000 CR4: 07e0
Sep 30 23:26:01 node033 kernel: DR0:  DR1:  DR2: 
Sep 30 23:26:01 node033 kernel: DR3:  DR6: 0ff0 DR7: 0400
Sep 30 23:26:01 node033 kernel: Process swapper (pid: 0, threadinfo 880c2d064000, task 880c2d052040)
Sep 30 23:26:01 node033 kernel: Stack:
Sep 30 23:26:01 node033 kernel: 88182d471020 880c6a6b1b80 000b 0011f630b5b0c60b
Sep 30 23:26:01 node033 kernel: 880c0002 0282  804f
Sep 30 23:26:01 node033 kernel: 880c2d067ed8 81444bb4  000b0a59013c
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: [] ? menu_select+0x174/0x390
Sep 30 23:26:01 node033 kernel: [] cpuidle_idle_call+0x7a/0xe0
Sep 30 23:26:01 node033 kernel: [] cpu_idle+0xb6/0x110
Sep 30 23:26:01 node033 kernel: [] start_secondary+0x2c0/0x316
Sep 30 23:26:01 node033 kernel: Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9  5a 3f db ff 4c 29 f8 48 89 c7 e8 7f 61 d8 ff 49 89 c6 49 89 
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <#DB[1]>  <> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel:   [] ? show_regs+0x27/0x30
Sep 30 23:26:01 node033 kernel: [] ? arch_trigger_all_cpu_backtrace_handler+0x99/0xc0
Sep 30 23:26:01 node033 kernel: [] ? notifier_call_chain+0x55/0x80
Sep 30 23:26:01 node033 kernel: [] ? atomic_notifier_call_chain+0x1a/0x20
Sep 30 23:26:01 node033 
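
For anyone digging through the full dump: a rough sketch (assuming the sysrq
output landed in /var/log/messages, as above) to pull out only the per-CPU
backtrace blocks that mention ksoftirqd:

    # Split the log on "NMI backtrace for cpu" lines and print only the
    # blocks whose stack mentions ksoftirqd.
    awk '/NMI backtrace for cpu/ { if (buf ~ /ksoftirqd/) print buf; buf = "" }
         { buf = buf $0 "\n" }
         END { if (buf ~ /ksoftirqd/) print buf }' /var/log/messages

The block quoted above is an idle CPU (swapper in intel_idle); the interesting
samples are the ones where a ksoftirqd thread itself is on the CPU.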

Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2017-09-27 Thread Patrick Farrell
A guess for you to consider:


A very common cause of ksoftirqd load is a hypervisor putting memory pressure
on a VM.  At least VMware, and I think KVM and others, use IRQs to implement
some of their memory management, and it can show up like this.


That would of course mean it's not really the ptlrpc module; I'm not sure how
carefully you verified that it is causing this.  (Obviously your 'remove it,
check, add it, check' method is sound, but if you just checked once or twice,
you may have been wrong through bad luck, or you could have been right at your
limit of available memory.)
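
If it helps, something along these lines (just a sketch; the module name,
repeat count and sleep times are arbitrary, and it needs root) would make bad
luck or a one-off memory spike much less likely as an explanation:

    # Load only ptlrpc (and its dependencies), sample the ksoftirqd threads,
    # then unload it and sample again; repeat a few times and compare.
    for i in 1 2 3; do
        modprobe ptlrpc
        sleep 30
        top -b -n 1 | grep ksoftirqd
        rmmod ptlrpc
        sleep 30
        top -b -n 1 | grep ksoftirqd
    done

If the load consistently shows up only while the module is loaded, memory
pressure from a hypervisor becomes a much less plausible cause.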


From: lustre-discuss  on behalf of 
Dilger, Andreas 
Sent: Wednesday, September 27, 2017 11:50:03 AM
To: Hans Henrik Happe
Cc: Shehata, Amir; lustre-discuss; Olaf Weber
Subject: Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
>
> Hi,
>
> Did anyone else experience CPU load from ksoftirqd after 'modprobe
> lustre'? On an otherwise idle node I see:
>
>  PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>    9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>   57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
>
> The sum of those two is about 50% CPU.
>
> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>
> I also tested the 2.10.1-RC1, which is the same.

If you can run "echo l > /proc/sysrq-trigger" it will report the processes
that are currently running on the CPUs of your system to the console (and
also /var/log/messages, if it can write everything in time).

You might need to do this several times to get a representative sample of
the ksoftirqd process stacks to see what they are doing that is consuming
so much CPU.

Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
processes to the console (and /v/l/m), but there will be a lot of them,
and no better chance that it catches what ksoftirqd is doing 25% of the time.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2017-09-27 Thread Dilger, Andreas
On Sep 26, 2017, at 01:10, Hans Henrik Happe wrote:
> 
> Hi,
> 
> Did anyone else experience CPU load from ksoftirqd after 'modprobe
> lustre'? On an otherwise idle node I see:
> 
>  PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>    9 root  20   0    0    0    0 S 28.5   0.0 2:05.58 ksoftirqd/1
>   57 root  20   0    0    0    0 R 23.9   0.0 2:22.91 ksoftirqd/13
> 
> The sum of those two is about 50% CPU.
> 
> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
> 
> I also tested the 2.10.1-RC1, which is the same.

If you can run "echo l > /proc/sysrq-trigger" it will report the processes
that are currently running on the CPUs of your system to the console (and
also /var/log/messages, if it can write everything in time).

You might need to do this several times to get a representative sample of
the ksoftirqd process stacks to see what they are doing that is consuming
so much CPU.

Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
processes to the console (and /v/l/m), but there will be a lot of them,
and no better chance that it catches what ksoftirqd is doing 25% of the time.
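
A small loop is usually enough to take a handful of samples, for example (run
as root; the count and interval here are arbitrary):

    # Dump the task currently running on each CPU ("echo l") ten times,
    # a few seconds apart; the output goes to the console and, normally,
    # to /var/log/messages as well.
    for i in $(seq 1 10); do
        echo l > /proc/sysrq-trigger
        sleep 5
    done

Afterwards the ksoftirqd entries can be fished out of /var/log/messages with
grep or similar.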

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org