** Description changed:
Below you can find the output of the collectl command when requesting
information about the ens3 interface:
- root@ubuntu-1604:~# /usr/bin/collectl -sN
+ root@ubuntu-1604:~# /usr/bin/collectl -sN
waiting for 1 second sample...
Bogus data record skipped for NET:ens3: data on 20170809 at 08:45:13
# NETWORK STATISTICS (/sec)
#Num Name KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO
- 0 lo 0 2 84 0 0 0 0 2 84 0 0
- 1 ens3 0 0 0 0 0 0 0 0 0 0 0
+ 0 lo 0 2 84 0 0 0 0 2 84 0 0
+ 1 ens3 0 0 0 0 0 0 0 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:45:14
- 0 lo 0 2 84 0 0 0 0 2 84 0 0
- 1 ens3 0 0 0 0 0 0 0 0 0 0 0
+ 0 lo 0 2 84 0 0 0 0 2 84 0 0
+ 1 ens3 0 0 0 0 0 0 0 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:45:15
- 0 lo 0 2 84 0 0 0 0 2 84 0 0
- 1 ens3 0 0 0 0 0 0 0 0 0 0 0
-
+ 0 lo 0 2 84 0 0 0 0 2 84 0 0
+ 1 ens3 0 0 0 0 0 0 0 0 0 0 0
Another example:
-
root@ubuntu-1604:~# /usr/bin/collectl -sn
waiting for 1 second sample...
Bogus data record skipped for NET:ens3: data on 20170809 at 08:46:42
#<----------Network---------->
- # KBIn PktIn KBOut PktOut
- 0 0 0 0
+ # KBIn PktIn KBOut PktOut
+ 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:46:43
- 0 0 0 0
+ 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:46:44
- 0 0 0 0
+ 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:46:45
- 0 0 0 0
+ 0 0 0 0
Bogus data record skipped for NET:ens3: data on 20170809 at 08:46:46
- 0 0 0 0
-
+ 0 0 0 0
I debugged the problem a bit and this is what I found.
The problem seems to be somewhere here:
/usr/share/collectl/formatit.ph
at approximately line 4100, in the condition
- if ($DefNetSpeed>0 && $intFirstSeen &&
+ if ($DefNetSpeed>0 && $intFirstSeen &&
($netRxKB[$netIndex]>$NetMaxTraffic[$netIndex] ||
$netTxKB[$netIndex]>$NetMaxTraffic[$netIndex]))
The $NetMaxTraffic[$netIndex] seems to be -250 in my case because the
value of $speed is -1.
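
To illustrate why a threshold of -250 rejects every sample, here is a minimal bash sketch. It assumes the threshold is derived as 2 * interval * speed * 125 (125 KB/s per Mb/s), which is only an assumption that happens to reproduce the -250 seen here with a 1 second interval and a speed of -1; the function names max_traffic_kb and is_bogus are hypothetical and do not come from collectl.

```shell
#!/bin/bash
# Hypothetical helper: per-interval traffic threshold in KB.
# ASSUMPTION: threshold = 2 * interval (s) * speed (Mb/s) * 125.
max_traffic_kb() {
  local interval=$1 speed=$2
  echo $(( 2 * interval * speed * 125 ))
}

# Hypothetical helper mirroring the quoted condition:
# succeeds (exit 0) when RX or TX exceeds the threshold.
is_bogus() {
  local rx_kb=$1 tx_kb=$2 threshold=$3
  (( rx_kb > threshold || tx_kb > threshold ))
}

threshold=$(max_traffic_kb 1 -1)
echo "threshold: $threshold"

# Values from the report: RX 3KB, TX 10KB.
if is_bogus 3 10 "$threshold"; then
  echo "3KB/10KB sample flagged as bogus"
fi

# Even a completely idle sample exceeds a negative threshold.
if is_bogus 0 0 "$threshold"; then
  echo "idle sample flagged as bogus"
fi
```

With a negative threshold the comparison is true for any non-negative traffic value, which would explain why every ens3 record is skipped.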
- This is how the value of speed seems to be generated:
- my $strace=`strace -c cat /proc/stat 2>&1`;
- $strace=~/^\s*\S+\s+(\S+).*read$/m;
- my $speed=$1;
- print "ProcReadSpeed: $speed\n" if $debug * 1;
-
-
-
- It is a bit weird that the strace -c option doesn't return anything in the
seconds column:
- Output from bash CLI:
- strace -c cat /proc/stat 2>&1
-
- root@ubuntu-1604:~# strace -c cat /proc/stat 2>&1
- cpu 2271314 847 242198 63491784 215548 0 29819 1194 0 0
- cpu0 609497 336 60150 15864230 20678 0 8004 192 0 0
- cpu1 640531 293 53714 15831059 23728 0 6560 635 0 0
- cpu2 524413 214 47444 15971787 26770 0 1822 158 0 0
- cpu3 496872 3 80889 15824707 144370 0 13432 208 0 0
- intr 83484424 44 10 0 0 1078 0 3 0 0 0 0 16627 15 0 0 0 0 0 0 0 0 0 0 0 0 25
0 5125053 190 0 0 0 34564947 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
- ctxt 120422690
- btime 1502102910
- processes 199370
- procs_running 1
- procs_blocked 0
- softirq 80405112 0 16005968 366 6236511 34361453 0 42 8102266 0 15698506
- % time seconds usecs/call calls errors syscall
- ------ ----------- ----------- --------- --------- ----------------
- 0.00 0.000000 0 3 read
- 0.00 0.000000 0 1 write
- 0.00 0.000000 0 4 open
- 0.00 0.000000 0 6 close
- 0.00 0.000000 0 5 fstat
- 0.00 0.000000 0 10 mmap
- 0.00 0.000000 0 4 mprotect
- 0.00 0.000000 0 2 munmap
- 0.00 0.000000 0 3 brk
- 0.00 0.000000 0 3 3 access
- 0.00 0.000000 0 1 execve
- 0.00 0.000000 0 1 arch_prctl
- 0.00 0.000000 0 1 fadvise64
- ------ ----------- ----------- --------- --------- ----------------
- 100.00 0.000000 44 3 total
-
-
- This may be the problem why a negative value is generated for the speed.
-
- The speed value seems to be taken from here:
+
+ The speed value seems to be taken from here in my case:
cat /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/ens3/speed
-1
For the lo interface I receive the following:
- cat /sys/devices/virtual/net/lo/speed
+ cat /sys/devices/virtual/net/lo/speed
cat: lo/speed: Invalid argument
The same 'Invalid argument' error seems to be generated on Ubuntu 14.04
instances.
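
The two cases above (a speed file containing -1, and a speed file that cannot be read at all) could both be treated as "speed unknown". The following bash sketch shows one way to do that; read_speed is a hypothetical helper name, not part of collectl.

```shell
#!/bin/bash
# Hypothetical helper: print the link speed in Mb/s, or "unknown"
# when the file cannot be read or does not hold a positive integer.
read_speed() {
  local file=$1 speed
  # cat fails with "Invalid argument" on interfaces such as lo.
  if ! speed=$(cat "$file" 2>/dev/null); then
    echo unknown
    return
  fi
  # Accept only plain positive integers; -1 falls through to "unknown".
  if [[ $speed =~ ^[0-9]+$ ]] && (( 10#$speed > 0 )); then
    echo "$speed"
  else
    echo unknown
  fi
}

# Report every interface that exposes a speed attribute.
for f in /sys/class/net/*/speed; do
  [ -e "$f" ] || continue
  printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(read_speed "$f")"
done
```

On the instance described here this would be expected to report "unknown" for both ens3 and lo, since neither exposes a usable value.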
-
Here is more info when collectl was started with the debug flag:
root@ubuntu-1604:~# /usr/bin/collectl -sn --debug 16989
Config File Search Path: /usr/bin/collectl.conf;/etc/collectl.conf
Reading Config File: /etc/collectl.conf
09:02:24 Couldn't find 'resize' so assuming terminal height of 24
BinDir: /usr/bin ReqDir: /usr/share/collectl
Set Output -- Subsys: n Verbose: 0 SameCols: 1
- DskFilt - Ignore: Keep:
- NetFilt - Ignore: Keep:
- RawDsk - Ignore: Keep:
- RawNet - Ignore: Keep:
- RawFlag: 0 PlotFlag: 0 Repeat: 22 Log2Flag: 0 Export:
+ DskFilt - Ignore: Keep:
+ NetFilt - Ignore: Keep:
+ RawDsk - Ignore: Keep:
+ RawNet - Ignore: Keep:
+ RawFlag: 0 PlotFlag: 0 Repeat: 22 Log2Flag: 0 Export:
From: 0 000000 Thru: 0 235959
09:02:25 V4.0.4-1 Beginning execution on prod-idx-solr-01...
initRecord() - Subsys: n
09:02:25 initDisk initialized 2 disks
set netSpeeds{ens3}=>-1<
set netSpeeds{lo}=>??<
SetFlags: n
RecFlags: 1 0
initFormat()
initLustre() -- Type: o From: 0 Number: 0
- initLustre() -- Type: m From: 0 Number:
+ initLustre() -- Type: m From: 0 Number:
initLustre() -- Type: c From: 0 Number: 0
initLustre() -- Type: c2 From: 0 Number: 0
waiting for 1 second sample...
Lustre Check Intervals: 1
>>> 1502269346.002 <<<
Net lo: 175833908 2093286 0 0 0 0 0 0
175833908 2093286 0 0 0 0 0 0
Net ens3: 71737046351 7211733 0 5 0 0 0 0
2086213559 6001251 0 0 0 0 0 0
>>> 1502269347.001 <<<
Net lo: 175834076 2093288 0 0 0 0 0 0
175834076 2093288 0 0 0 0 0 0
Net ens3: 71737049651 7211763 0 5 0 0 0 0
2086223834 6001270 0 0 0 0 0 0
09:02:27 Bogus data record skipped for NET:ens3: data on 20170809 at 09:02:26
09:02:27 Network speed threshhold: -250 Bogus Value(s) - TX: 10KB RX: 3KB
#<----------Network---------->
- # KBIn PktIn KBOut PktOut
- 0 0 0 0
+ # KBIn PktIn KBOut PktOut
+ 0 0 0 0
>>> 1502269348.001 <<<
Net lo: 175834244 2093290 0 0 0 0 0 0
175834244 2093290 0 0 0 0 0 0
Net ens3: 71737052969 7211801 0 5 0 0 0 0
2086234953 6001300 0 0 0 0 0 0
09:02:28 Bogus data record skipped for NET:ens3: data on 20170809 at 09:02:27
09:02:28 Network speed threshhold: -250 Bogus Value(s) - TX: 10KB RX: 3KB
- 0 0 0 0
+ 0 0 0 0
>>> 1502269349.001 <<<
Net lo: 175834412 2093292 0 0 0 0 0 0
175834412 2093292 0 0 0 0 0 0
Net ens3: 71737055047 7211823 0 5 0 0 0 0
2086243613 6001319 0 0 0 0 0 0
09:02:29 Bogus data record skipped for NET:ens3: data on 20170809 at 09:02:28
09:02:29 Network speed threshhold: -250 Bogus Value(s) - TX: 8KB RX: 2KB
- 0 0 0 0
+ 0 0 0 0
>>> 1502269350.001 <<<
Net lo: 175834580 2093294 0 0 0 0 0 0
175834580 2093294 0 0 0 0 0 0
Net ens3: 71737057941 7211855 0 5 0 0 0 0
2086253980 6001344 0 0 0 0 0 0
09:02:30 Bogus data record skipped for NET:ens3: data on 20170809 at 09:02:29
09:02:30 Network speed threshhold: -250 Bogus Value(s) - TX: 10KB RX: 2KB
- 0 0 0 0
+ 0 0 0 0
Ouch!
>>> 1502269350.559 <<<
09:02:30 Terminating...
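
As a cross-check, the KB figures in the first "Bogus Value(s)" line can be reproduced from the raw ens3 counters in the debug output. The sketch below assumes the first and ninth numeric fields of each "Net ens3:" record are the RX and TX byte counters (the /proc/net/dev layout) and that the difference is converted with integer division by 1024; delta_kb is a hypothetical helper name.

```shell
#!/bin/bash
# ens3 byte counters copied from the samples at 1502269346.002
# and 1502269347.001 in the debug output.
rx_prev=71737046351; rx_now=71737049651
tx_prev=2086213559;  tx_now=2086223834

# Hypothetical helper: counter difference in whole KB.
# ASSUMPTION: integer division by 1024.
delta_kb() {
  echo $(( ($2 - $1) / 1024 ))
}

echo "TX: $(delta_kb "$tx_prev" "$tx_now")KB RX: $(delta_kb "$rx_prev" "$rx_now")KB"
# prints: TX: 10KB RX: 3KB
```

The result matches the logged "TX: 10KB RX: 3KB", so the traffic values themselves look plausible and only the negative threshold causes the records to be rejected.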
-
-
-
-
Info about the system:
-
== ApportVersion =================================
2.20.1-0ubuntu2.10
== Architecture =================================
amd64
== Date =================================
Wed Aug 9 09:06:59 2017
== Dependencies =================================
gcc-6-base 6.0.1-0ubuntu1
libc6 2.23-0ubuntu9
libgcc1 1:6.0.1-0ubuntu1
libkmod2 22-1ubuntu5
libpci3 1:3.3.1-1.1ubuntu1.1
libudev1 229-4ubuntu19
pciutils 1:3.3.1-1.1ubuntu1.1
zlib1g 1:1.2.8.dfsg-2ubuntu4.1
== DistroRelease =================================
Ubuntu 16.04
== Ec2AMI =================================
ami-0000000a
== Ec2AMIManifest =================================
FIXME
== Ec2AvailabilityZone =================================
node-1
== Ec2InstanceType =================================
cloud-4c.24gb.10gb
== Ec2Kernel =================================
unavailable
== Ec2Ramdisk =================================
unavailable
== JournalErrors =================================
-- Logs begin at Mon 2017-08-07 10:48:34 UTC, end at Wed 2017-08-09 09:06:56
UTC. --
Aug 07 10:48:34 hostname kernel: ACPI: RSDP 0x00000000000F6590 000014 (v00
BOCHS )
Aug 07 10:48:34 hostname kernel: ACPI: RSDT 0x00000000BFFE1499 000030 (v01
BOCHS BXPCRSDT 00000001 BXPC 00000001)
Aug 07 10:48:34 hostname kernel: ACPI: FACP 0x00000000BFFE0914 000074 (v01
BOCHS BXPCFACP 00000001 BXPC 00000001)
Aug 07 10:48:34 hostname kernel: ACPI: DSDT 0x00000000BFFDFD40 000BD4 (v01
BOCHS BXPCDSDT 00000001 BXPC 00000001)
Aug 07 10:48:34 hostname kernel: ACPI: FACS 0x00000000BFFDFD00 000040
Aug 07 10:48:34 hostname kernel: ACPI: SSDT 0x00000000BFFE0988 000A81 (v01
BOCHS BXPCSSDT 00000001 BXPC 00000001)
Aug 07 10:48:34 hostname kernel: ACPI: APIC 0x00000000BFFE1409 000090 (v01
BOCHS BXPCAPIC 00000001 BXPC 00000001)
Aug 07 10:48:34 hostname kernel: ACPI: 2 ACPI AML tables successfully
acquired and loaded
Aug 07 10:48:34 hostname kernel: #2
Aug 07 10:48:34 hostname kernel: #3
Aug 07 10:48:34 hostname kernel: PCCT header not found.
Aug 07 10:48:34 hostname kernel: acpi PNP0A03:00: fail to add MMCONFIG
information, can't access extended PCI configuration space under this bridge.
Aug 07 10:48:34 hostname kernel: ACPI: Enabled 16 GPEs in block 00 to 0F
Aug 07 10:48:34 hostname kernel: ACPI: PCI Interrupt Link [LNKD] enabled at
IRQ 11
Aug 07 10:48:34 hostname kernel: ACPI: PCI Interrupt Link [LNKC] enabled at
IRQ 10
Aug 07 10:48:34 hostname kernel: ACPI: PCI Interrupt Link [LNKA] enabled at
IRQ 10
Aug 07 10:48:34 hostname kernel: ACPI: PCI Interrupt Link [LNKB] enabled at
IRQ 11
Aug 07 10:48:34 hostname systemd-sysv-generator[601]: stat() failed on
/etc/init.d/solr, ignoring: No such file or directory
Aug 07 10:48:35 hostname systemd-tmpfiles[909]:
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Aug 07 10:48:37 hostname kernel: cgroup: new mount options do not match the
existing superblock, will be ignored
Aug 07 10:48:37 hostname iscsid[1323]: iSCSI logger with pid=1325 started!
Aug 07 10:48:37 hostname iscsid[1325]: iSCSI daemon with pid=1326 started!
Aug 07 11:03:41 hostname systemd-tmpfiles[2838]:
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Aug 08 11:04:23 hostname systemd-tmpfiles[32491]:
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Aug 09 07:38:54 hostname iscsid[1325]: iscsid shutting down.
Aug 09 07:38:54 hostname iscsid[24414]: iSCSI logger with pid=24416 started!
Aug 09 07:38:55 hostname iscsid[24416]: iSCSI daemon with pid=24417 started!
Aug 09 07:38:55 hostname kernel: audit_printk_skb: 12 callbacks suppressed
== Package =================================
collectl 4.0.4-1 [modified: usr/bin/collectl usr/share/collectl/formatit.ph]
== PackageArchitecture =================================
all
== ProblemType =================================
Bug
== ProcCpuinfoMinimal =================================
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel Xeon E312xx (Sandy Bridge)
stepping : 1
microcode : 0x1
cpu MHz : 2599.998
cache size : 4096 KB
physical id : 3
siblings : 1
core id : 0
cpu cores : 1
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc
rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx hypervisor lahf_lm xsaveopt arat
bugs :
bogomips : 5199.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
== ProcEnviron =================================
LC_TIME=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
TERM=xterm-256color
PATH=(custom, no user)
LC_ADDRESS=en_US.UTF-8
XDG_RUNTIME_DIR=<set>
LC_TELEPHONE=en_US.UTF-8
LANG=en_US.UTF-8
SHELL=/bin/bash
LC_NAME=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_PAPER=en_US.UTF-8
== ProcVersionSignature =================================
Ubuntu 4.4.0-89.112-generic 4.4.76
== SourcePackage =================================
collectl
== Tags =================================
- xenial ec2-images
+ xenial ec2-images
== Uname =================================
Linux 4.4.0-89-generic x86_64
== UpgradeStatus =================================
No upgrade log present (probably fresh install)
-
-
- Would it be possible to make the collectl script work even when
negative values like this are found in the
/sys/class/net/*/speed path?
+ Would it be possible to make the collectl script work even when
+ negative values like this are found in the
+ /sys/class/net/*/speed path?
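
One possible shape for such a fallback, sketched in bash rather than in collectl's own Perl: run the traffic sanity check only when the reported speed is a positive number, and skip it otherwise. should_check_traffic is a hypothetical name, and the sample values -1 and ?? are the ones shown for ens3 and lo in the debug output.

```shell
#!/bin/bash
# Hypothetical guard: succeed only when the speed is a positive integer,
# so that -1 or unparseable values disable the bogus-data check.
should_check_traffic() {
  local speed=$1
  [[ $speed =~ ^[0-9]+$ ]] && (( 10#$speed > 0 ))
}

for speed in 1000 -1 '??'; do
  if should_check_traffic "$speed"; then
    echo "speed=$speed: check enabled"
  else
    echo "speed=$speed: check skipped"
  fi
done
```

This is only an illustration of the requested behaviour, not a patch against formatit.ph.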
--
https://bugs.launchpad.net/bugs/1709589
Title:
collectl doesn't return correct values for network on ens3 interface