+dev lists

Peter Mikus
Engineer - Software
Cisco Systems Limited

> -----Original Message-----
> From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> Sent: Friday, November 29, 2019 11:06 AM
> To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> <mkons...@cisco.com>
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> lijian.zh...@arm.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Subject: CSIT - performance tests failing on Taishan
> 
> Hello all,
> 
> In CSIT we are seeing performance tests fail on the Taishan boxes.
> There has been a long and at times misleading discussion about the
> likely issue, its root cause, and which workaround to apply.
> 
> Issue
> =====
> VPP restarts when "show pci" is read repeatedly, in a loop, over the CLI
> socket at '/run/vpp/cli.sock'. CSIT runs this loop against VPP with the
> default startup configuration, using the command below, to check that VPP
> is really up and responding.
> 
> How to reproduce
> ================
> for i in $(seq 1 120); do
>     echo "show pci" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
>     sudo netstat -ap | grep vpp
> done
> 
> The same can be reproduced using vppctl:
> 
> for i in $(seq 1 120); do
>     echo "show pci" | sudo vppctl
>     sudo netstat -ap | grep vpp
> done
> 
> To rule out a problem with the test itself, I ran the same loop with
> "show version":
>
> for i in $(seq 1 120); do
>     echo "show version" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
>     sudo netstat -ap | grep vpp
> done
> 
> This test passes with "show version", and VPP is not restarted.
> 
> 
> Root cause
> ==========
> The root cause seems to be a segmentation fault in format_vlib_pci_vpd:
> 
> Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> 0x0000ffffbeb4f3d0 in format_vlib_pci_vpd (
>     s=0xffff7fabe830 "0002:f9:00.0   0  15b3:1015   8.0 GT/s x8 mlx5_core       CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
>     at /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:230
> 230     /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c: No such file or directory.
> (gdb)
> Continuing.
> 
> Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb)
> 
> 
> The issue started after the Mellanox (MLX) NIC was installed in the
> Taishan boxes.
> 
> 
> @Benoit Ganne (bganne) can you please help fix the root cause?
> 
> Thank you.
> 
> Peter Mikus
> Engineer - Software
> Cisco Systems Limited
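
The reproduction loops in the quoted message rely on reading the netstat output by eye to spot a restart. As a rough sketch only, the same check could compare the VPP PID before and after each CLI call. The helper names below (get_vpp_pid, send_cli, pid_changed, watch_for_restart) are hypothetical and not part of CSIT or VPP, and the sketch assumes the VPP binary is named "vpp" so that pidof can find it.

```shell
# Sketch only: all function names here are hypothetical helpers.

# Assumption: the VPP binary is named "vpp", so pidof can find it.
get_vpp_pid() {
    pidof vpp || true
}

# Send one CLI command over the socket used in the quoted loops.
send_cli() {
    echo "$1" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
}

# Succeeds (returns 0) when an initial PID was known and the current one
# differs, which includes the case where VPP is not running right now.
pid_changed() {
    [ -n "$1" ] && [ "$1" != "$2" ]
}

# watch_for_restart CLI_COMMAND [ITERATIONS]
# Returns 1 at the first iteration where the PID no longer matches the
# PID seen before the loop started. If VPP was not running at the start,
# no restart can be detected.
watch_for_restart() {
    local cmd="$1" iterations="${2:-120}" initial current i
    initial=$(get_vpp_pid)
    for i in $(seq 1 "$iterations"); do
        send_cli "$cmd" >/dev/null 2>&1 || true
        current=$(get_vpp_pid)
        if pid_changed "$initial" "$current"; then
            echo "iteration $i: VPP PID changed from '$initial' to '$current'"
            return 1
        fi
    done
    echo "no restart seen in $iterations iterations of '$cmd'"
}
```

With these definitions loaded, `watch_for_restart "show pci"` would be expected to stop at the first restart, while `watch_for_restart "show version"` should run all 120 iterations, if the behaviour matches what is described in the quoted message.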
