Yes, I love to learn as well. Here is the output of lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Stepping:              2
CPU MHz:               2597.046
BogoMIPS:              5194.09
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-3

The load average is:

load average: 464.68, 415.14, 416.96

which does not make sense at all. The rest of top:

Cpu(s): 51.3%us, 16.0%sy, 0.0%ni, 32.0%id, 0.0%wa, 0.0%hi, 0.4%si, 0.2%st

If I hit 1, it affects all 4 CPUs.

Can you elaborate on why

> apicid : 25
> initial apicid : 25

25 is a weird number? From an earlier thread, is this simply a logical ID?
All the other systems report 4 for this number, and all of them have
reasonable load. This machine, with apicid at 25, reports a load average
in the 300s to 400s range.

I do not have root access or sudo. I want to find out why the load is so
high before I escalate and argue for more privilege.

When I brought this up with the responsible team, I was given a probable
cause: other activities on the server hosting this VM are causing the
issue. I am not sure I understand the configuration, nor do I believe that
activity outside of this image is causing the high load.

When I dug around (e.g. IO wait, network collisions, etc.), this apicid was
the only difference I could find compared with the other machines in our
data center.

--v

On Thu, Apr 25, 2019 at 10:37 AM Aaron Burt <[email protected]> wrote:

> On 2019-04-24 17:40, VY wrote:
> > We have several machines and they are all supposedly identical Intel
> > Xeon machines. 4 CPUs each and identical Linux version.
> > One of the machines is reporting VERY high load consistently.
> > They are all running identical applications and I don't see any
> > difference in load.
>
> Classic problem. Fun and very educational to solve.
>
> Some clarifying questions:
>
> 1. By "4 CPUs" do you mean 4 separate CPU dies/packages, 4 cores in a
> single package on the mobo or 2 cores with hyperthreading? Or are these
> VMs from AWS or something?
>
> 2. By "VERY high load", what do you mean? loadavg? What values on
> normal vs affected system? Is it spiky (the three 1m/5m/15m loadavgs
> are very different) or consistent (the three loadavg values are about
> the same)?
>
> 3. How does total CPU usage break out in "top" - user, system,
> interrupt, iowait etc.?
>
> 4. If you hit "1" in "top", is it affecting all cores equally or just
> one core?
>
> 5. Is this affecting application performance, and if so what effects are
> you seeing?
>
> 6. Have you rebooted the affected system?
>
> 7. Have you done a "chkrootkit" or other security/intrusion check on the
> affected system?
>
>
> > However, when I look at /proc/cpuinfo, this "very high load" box is
> > saying:
> > apicid : 25
> > initial apicid : 25
> > All the other machines are reporting 4 for this number.
>
> Interesting. Is it the same number for all 4 cores on each machine? I
> don't know if the APIC is on the die on your setup, but there's usually
> only a few of those so 25 sounds like a weird number. Maybe the funny
> one is running your kernel inside some virtualization layer?
>
> Learning experiences abound,
> Aaron
>
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
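
For anyone following the thread who wants to dig into a load average like the one above without root: on Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D), so a load in the hundreds alongside 32% idle CPU may mean many tasks are stuck in D rather than competing for the 4 CPUs. The following is only a rough Python sketch of how one might check that. It assumes /proc/loadavg and the per-thread /proc/<pid>/task/<tid>/stat files are world-readable on the affected box (they usually are, unless /proc is mounted with hidepid). The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Compare the load average with the tasks that actually feed it."""
import glob
import os
from collections import Counter


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    # The fourth field looks like "5/812": runnable tasks / total tasks.
    runnable, total = fields[3].split("/")
    return {
        "load": tuple(float(x) for x in fields[:3]),
        "runnable": int(runnable),
        "total": int(total),
    }


def task_state(stat_line):
    """Return (comm, state) from one stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' instead of on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    return comm, tail.split()[0]


def count_states(stat_lines):
    """Count tasks per (state, comm), keeping only states R and D."""
    counts = Counter()
    for line in stat_lines:
        comm, state = task_state(line)
        if state in ("R", "D"):
            counts[(state, comm)] += 1
    return counts


def read_all_stats():
    """Yield every thread's stat line; tasks may exit while we scan."""
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as fh:
                yield fh.read()
        except OSError:
            continue


def main():
    if not os.path.exists("/proc/loadavg"):
        print("This sketch needs a Linux /proc filesystem.")
        return
    with open("/proc/loadavg") as fh:
        info = parse_loadavg(fh.read())
    print("load average:", info["load"])
    print("runnable/total tasks:", info["runnable"], "/", info["total"])
    # Show which commands contribute the most R and D tasks right now.
    for (state, comm), n in count_states(read_all_stats()).most_common(15):
        print(f"{n:6d}  {state}  {comm}")


if __name__ == "__main__":
    main()
```

Running this on the affected machine and on one of the healthy ones, then comparing the two listings, is probably more telling than the apicid difference. A large number of D-state tasks sharing one command name would point at whatever those tasks are blocked on, and the listing is something concrete to hand to the responsible team.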
