Yes, I love to learn as well.

This is the output to lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Stepping:              2
CPU MHz:               2597.046
BogoMIPS:              5194.09
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-3


The load average is:

load average: 464.68, 415.14, 416.96


which does not make sense at all.
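
For context, the Linux load average counts tasks that are runnable
(state R) plus tasks in uninterruptible sleep (state D). With a load in
the 400s and CPUs still partly idle, it may be worth checking how many
tasks sit in each state. The following is only a rough sketch that needs
no root; it assumes /proc is mounted normally (no hidepid restriction),
and the function names are made up for illustration.

```python
import glob
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' in the line.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Tally task states (R = runnable, D = uninterruptible sleep, ...)."""
    return Counter(task_state(line) for line in stat_lines)


def read_all_stat_lines():
    """Read the stat file of every thread this user is allowed to see."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # the thread exited while we were scanning
        if ")" in text:  # skip empty reads from threads that just exited
            lines.append(text)
    return lines


if __name__ == "__main__":
    states = count_states(read_all_stat_lines())
    print("runnable (R):       ", states["R"])
    print("uninterruptible (D):", states["D"])
```

Running this a few times on the affected box and on a healthy one would
show whether the extra load comes from R tasks or from D tasks.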


The rest of TOP:


Cpu(s): 51.3%us, 16.0%sy,  0.0%ni, 32.0%id,  0.0%wa,  0.0%hi,  0.4%si,  0.2%st


If I hit 1, it affects all 4 CPUs.
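
To put the load numbers next to the CPU count, /proc/loadavg can be
read directly. Besides the three averages, its fourth field gives the
number of currently runnable scheduling entities and the total number
that exist. The sketch below is only an illustration (the function and
key names are invented here); on the figures above, 464.68 / 4 works out
to roughly 116 per CPU.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
        "runnable_now": int(runnable),  # entities runnable at this instant
        "total_tasks": int(total),      # entities that currently exist
    }


def load_per_cpu(load, n_cpus):
    """Load average divided by the number of CPUs."""
    return load / n_cpus


if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            info = parse_loadavg(f.read())
        n_cpus = os.cpu_count() or 1
        print(info)
        print("1m load per CPU:", load_per_cpu(info["load_1m"], n_cpus))
```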




Can you elaborate on why


>    apicid : 25
>   initial apicid : 25

25 is a weird number?  From an earlier thread, is this simply a logical
ID?
All the other systems report this number as 4, and all of them have a
reasonable load.
This machine, with apicid at 25, reports a load average in the 300s to
400s.


I have neither root access nor sudo.  I want to find out why the load
is so high before I escalate and argue for more privileges.
When I brought this up with the responsible team, I was given a probable
cause: other activity on the server hosting this VM is causing the
issue.  I am not sure I understand the configuration, nor do I believe
that activity outside this image is causing the high load.
When I dug around (e.g. IO wait, network collisions), the apicid was
the only difference I could find compared with the other machines we
have in our data center.
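
To compare the apicid values machine by machine (and to check whether
the value is the same for all 4 processors on one box), something like
the following could list them per logical processor. It is a sketch
only: apic_ids is a name invented here, and it assumes the usual x86
/proc/cpuinfo layout with "processor" and "apicid" lines.

```python
import os


def apic_ids(cpuinfo_text):
    """Map each logical processor number to its 'apicid' value."""
    ids = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between processor stanzas
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "apicid" and current is not None:
            # "initial apicid" is a different key and is ignored here
            ids[current] = int(value)
    return ids


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            for cpu, apicid in sorted(apic_ids(f.read()).items()):
                print(f"processor {cpu}: apicid {apicid}")
```

Running the same script on each machine gives one short table per host
that is easy to diff.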


--v









On Thu, Apr 25, 2019 at 10:37 AM Aaron Burt <[email protected]> wrote:

> On 2019-04-24 17:40, VY wrote:
> > We have several machines and they are all supposedly identical Intel
> > Xeon machines. 4 CPUs each and identical Linux version.
> > One of the machines is reporting VERY high load consistently.
> > They are all running identical applications and I don't see any
> > difference in load.
>
> Classic problem.  Fun and very educational to solve.
>
> Some clarifying questions:
>
> 1. By "4 CPUs" do you mean 4 separate CPU dies/packages, 4 cores in a
> single package on the mobo or 2 cores with hyperthreading?  Or are these
> VMs from AWS or something?
>
> 2. By "VERY high load", what do you mean? loadavg?  What values on
> normal vs affected system?  Is it spiky (the three 1m/5m/15m loadavgs
> are very different) or consistent (the three loadavg values are about
> the same)?
>
> 3. How does total CPU usage break out in "top" - user, system,
> interrupt, iowait etc.?
>
> 4. If you hit "1" in "top", is it affecting all cores equally or just
> one core?
>
> 5. Is this affecting application performance, and if so what effects are
> you seeing?
>
> 6. Have you rebooted the affected system?
>
> 7. Have you done a "chkrootkit" or other security/intrusion check on the
> affected system?
>
>
> > However, when I look at /proc/cpuinfo, this "very high load" box
> > is saying:
> >    apicid : 25
> >   initial apicid : 25
> > All the other machines are reporting 4 for this number.
>
> Interesting.  Is it the same number for all 4 cores on each machine?  I
> don't know if the APIC is on the die on your setup, but there's usually
> only a few of those so 25 sounds like a weird number.  Maybe the funny
> one is running your kernel inside some virtualization layer?
>
> Learning experiences abound,
>    Aaron
>
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
