Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-08-21 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via Lists.Fd.Io
> then this issue was fixed 

It was not fixed; we have just listed VPP-1361 as a known issue [2]
and performed more job runs to compensate for the failures.

Vratko.

[2] 
https://docs.fd.io/csit/rls1807/report/vpp_performance_tests/csit_release_notes.html


Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-08-21 Thread Pei, Yulong
Hi Peter & Vratko,

I noticed "FD.io CSIT-18.07 v1.0 report published", so then this issue was fixed 
in csit-18.07, right?
Could you tell me how to fix this issue?

Best Regards
Yulong Pei 


Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-08-03 Thread Peter Mikus via Lists.Fd.Io
Hello vpp-dev,

Can you please take a look and advise? Currently 50-70% of CSIT tests on SKX 
(Ubuntu 18.04) are failing. About 10% of tests are affected on Haswell testbeds 
(Ubuntu 16.04).

Thank you in advance.

Peter Mikus
Engineer – Software
Cisco Systems Limited



Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-08-02 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via Lists.Fd.Io
Added a Jira comment [1] with some details
and attached the same dump (just compressed better)
to the Jira bug report.

Vratko.

[1] 
https://jira.fd.io/browse/VPP-1361?focusedCommentId=13104&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13104


Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-08-01 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via Lists.Fd.Io
> VPP is not crashing so no core dump are available.

I tried using the "gcore" command to create a core dump from the running VPP.
So far I got this archive [0], compressed to around 25 MB,
but the core file inside is around 160 GB.

I am not sure how to make it smaller; even with small numbers
in startup.conf, the core file is still around 140 GB.

Vratko.

[0] 
https://jenkins.fd.io/sandbox/job/csit-vpp-perf-verify-master-2n-skx/13/artifact/archive/DUT1_cores.tar.xz
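
One knob that may be worth trying for the core size is the per-process /proc/<pid>/coredump_filter mask described in core(5). As far as I know, recent GDB versions consult it when gcore writes a dump (the "use-coredump-filter" setting). The sketch below only shows how such a mask is composed; the helper names are hypothetical, and whether dropping any mapping type actually shrinks a VPP core (or removes data needed for debugging) is an assumption that would have to be checked on a testbed.

```python
# Bit positions of /proc/<pid>/coredump_filter as documented in core(5).
COREDUMP_FILTER_BITS = {
    "anon_private": 0,
    "anon_shared": 1,
    "file_private": 2,
    "file_shared": 3,
    "elf_headers": 4,
    "hugetlb_private": 5,
    "hugetlb_shared": 6,
}


def coredump_filter_mask(kinds):
    """Return the integer mask that selects the given mapping kinds."""
    mask = 0
    for kind in kinds:
        mask |= 1 << COREDUMP_FILTER_BITS[kind]
    return mask


def write_coredump_filter(pid, mask):
    """Write the mask for a running process (needs sufficient privileges)."""
    with open(f"/proc/{pid}/coredump_filter", "w") as handle:
        handle.write(f"{mask:#x}\n")


if __name__ == "__main__":
    # Kernel default (0x33): anonymous memory, ELF headers, private huge pages.
    default = coredump_filter_mask(
        ["anon_private", "anon_shared", "elf_headers", "hugetlb_private"])
    # Hypothetical reduced set: drop shared and huge-page mappings.
    reduced = coredump_filter_mask(["anon_private", "elf_headers"])
    print(f"default={default:#x} reduced={reduced:#x}")
```

write_coredump_filter() would be called with the VPP process id before running gcore; it is not called here, so the example has no side effects.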


Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-07-31 Thread Peter Mikus via Lists.Fd.Io
Hello,

Thanks to Vratko (cc), the latest master was tested with DPDK 18.02.2 [0]. The issue 
is there as well.

I cannot confirm whether "no JSON data.VAT" is related. The bad thing is that there 
is no meaningful return message with more verbose output.

(We see this on pretty much all NICs in LF and on all TBs.)

[0] 
https://jenkins.fd.io/sandbox/job/vpp-csit-verify-hw-perf-master-2n-skx/6/consoleFull

Peter Mikus
Engineer - Software
Cisco Systems Limited



Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-07-31 Thread Ray Kinsella

Hi Peter,

It may be unrelated, but I think we see this issue also pretty regularly 
with FD.io VPP 18.04 and the x520, on our local test rig.


The error we typically see is "VAT command sw_interface_set_flags 
sw_if_index 1 admin-up: no JSON data.VAT".


Do you think it is the same or a separate issue?

Ray K


On 30/07/2018 08:02, Peter Mikus via Lists.Fd.Io wrote:

Hello vpp-dev,

I am looking for consultation. We started testing VPP for the report on all 
LF CSIT testbeds (Skylake and Haswell).


We are observing weird behavior. In each test we first use a sequence to 
bring both interfaces up (physical up) via VAT:


   sw_interface_set_flags sw_if_index  admin-up (I also 
tried sw_interface_set_flags sw_if_index idx admin-up link-up)


After setting all interfaces UP we test whether the interfaces are really 
UP via VAT (loop of 30 checks, 1 s between API calls): “sw_interface_dump”.


It wasn’t an issue in the past, but recently we started seeing 
sw_interface_dump report interfaces as link_down (admin-up).


Notes/symptoms:

- Our sw_interface_dump check runs 30x (1 s interval) in a loop.

- Link-down is random: sometimes both interfaces are link-up, sometimes 
just one, and sometimes both links are down.


- _It is not TB related_, nor cabling related: we see it on 
Haswells-3node in about 1 out of 70 tests, on Skylakes-2node in 1 out of 70, 
but on Skylake-3node in more than half of the tests.


- Checking the state during a test reveals that the interfaces are link-down 
(show int), so “sw_interface_dump” is reporting the state correctly.


- Running the CLI command “set interface state … up” during a test does bring 
the interfaces UP (but it is hard to check the timing here).


- Mostly x520 and x710 are affected, but that is most probably a sampling 
effect (low coverage of other NICs like xxv710 and xl710).


- We have seen this in master vpp as well as rc2 vpp.

- It is not clear when this started to happen, so bisecting would take a lot 
of time.


- This was also spotted on VIRL and on Memif interfaces, which leads us 
to suspect that this is related to the API, not the HW.


Do you have an idea what we could check further? VPP is not crashing, so 
no core dumps are available. The issue is not 100% reproducible, which 
makes it hard to debug.


Is there a way to get a more verbose error from the API call mentioned above 
to reveal more information?
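
The check described in this message (up to 30 polls of sw_interface_dump, 1 s apart) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual CSIT code: wait_for_link_up and the dump_interfaces callable are hypothetical names, and how the sw_interface_dump reply is parsed into link states is left to the caller.

```python
import time


def wait_for_link_up(dump_interfaces, sw_if_indices, retries=30, interval=1.0,
                     sleep=time.sleep):
    """Poll until every listed interface reports link up.

    dump_interfaces is a caller-supplied callable (hypothetical here) that
    returns a dict mapping sw_if_index to a bool link state, for example
    parsed from the output of sw_interface_dump.
    Returns (True, attempts) on success or (False, retries) on timeout.
    """
    wanted = set(sw_if_indices)
    for attempt in range(1, retries + 1):
        states = dump_interfaces()
        # An interface missing from the reply is treated as link down.
        down = sorted(i for i in wanted if not states.get(i, False))
        if not down:
            return True, attempt
        print(f"attempt {attempt}/{retries}: link down on {down}")
        if attempt < retries:
            sleep(interval)
    return False, retries


if __name__ == "__main__":
    # Fake data source: interface 2 comes up on the third poll.
    polls = iter([{1: True, 2: False}, {1: True, 2: False}, {1: True, 2: True}])
    print(wait_for_link_up(lambda: next(polls), [1, 2], sleep=lambda _: None))
```

Logging which interfaces are still down on each attempt, and on which attempt the link finally came up, would also give some of the timing information that is otherwise hard to collect.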



Thank you.

*Peter Mikus*
Engineer – Software

*Cisco Systems Limited*




View/Reply Online (#9967): https://lists.fd.io/g/vpp-dev/message/9967

View/Reply Online (#9981): https://lists.fd.io/g/vpp-dev/message/9981


Re: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-07-30 Thread Dave Barach via Lists.Fd.Io
Before doing anything else: please revert to the previous DPDK version and see 
if the issue vanishes.

From: vpp-dev@lists.fd.io  On Behalf Of Peter Mikus via 
Lists.Fd.Io
Sent: Monday, July 30, 2018 3:02 AM
To: vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

Hello vpp-dev,

I am looking for advice. We started testing VPP for the report on all LF CSIT 
testbeds (Skylake and Haswell).
We are observing weird behavior. In each test we use a sequence that first 
brings both (physical) interfaces up by VAT:

  sw_interface_set_flags sw_if_index  admin-up (I also tried 
sw_interface_set_flags sw_if_index idx admin-up link-up)

After setting all interfaces UP, we test whether the interfaces are really UP 
by VAT (a loop of 30 checks, 1 s between API calls): "sw_interface_dump".
This was not an issue in the past, but recently we started seeing 
sw_interface_dump report interfaces as link-down (admin-up).

Notes/symptoms:
-   Our sw_interface_dump check runs 30 times (1 s interval) in a loop.
-   Link-down is random: sometimes both interfaces are link-up, sometimes just 
one, and sometimes both links are down.
-   It is not testbed (TB) related, nor cabling related: we see it on 
Haswell-3node in about 1 out of 70 tests, on Skylake-2node in 1 out of 70, but 
on Skylake-3node in more than half of the tests.
-   Checking the state during a test (show int) confirms that the interfaces 
are link-down, so "sw_interface_dump" is reporting the state correctly.
-   Running the CLI command "set interface state ... up" during a test does 
bring the interfaces UP (but it is hard to check the timing here).
-   Mostly x520 and x710 are affected, but that is most probably a statistical 
artifact (low coverage of other NICs such as xxv710 and xl710).
-   We have seen this in master vpp as well as rc2 vpp.
-   It is not clear when this started to happen, so bisecting would take a lot 
of time.
-   This was also spotted on VIRL and on Memif interfaces, which leads us to 
suspect that this is related to the API, not the HW.

Do you have an idea what we could check further? VPP is not crashing, so no 
core dumps are available. The issue is not 100% reproducible, which makes it 
hard to debug.

Is there a way to get a more verbose error from the API call mentioned, to 
reveal more information?

Thank you.

Peter Mikus
Engineer - Software
Cisco Systems Limited

View/Reply Online (#9972): https://lists.fd.io/g/vpp-dev/message/9972


[vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

2018-07-30 Thread Peter Mikus via Lists.Fd.Io
Hello vpp-dev,

I am looking for advice. We started testing VPP for the report on all LF CSIT 
testbeds (Skylake and Haswell).
We are observing weird behavior. In each test we use a sequence that first 
brings both (physical) interfaces up by VAT:

  sw_interface_set_flags sw_if_index  admin-up (I also tried 
sw_interface_set_flags sw_if_index idx admin-up link-up)

After setting all interfaces UP, we test whether the interfaces are really UP 
by VAT (a loop of 30 checks, 1 s between API calls): "sw_interface_dump".
This was not an issue in the past, but recently we started seeing 
sw_interface_dump report interfaces as link-down (admin-up).
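
The check described above can be sketched as a bounded polling loop. This is 
only an illustration in Python, not the CSIT implementation: get_states is a 
hypothetical callable standing in for whatever wraps sw_interface_dump, and it 
is assumed to return a mapping from interface name to an (admin_up, link_up) 
pair.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical state type: interface name -> (admin_up, link_up).
States = Dict[str, Tuple[bool, bool]]


def wait_for_link_up(
    get_states: Callable[[], States],
    attempts: int = 30,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until every admin-up interface also reports link-up.

    Returns the sorted names of interfaces that are still admin-up but
    link-down after the last attempt (an empty list means success).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    down: List[str] = []
    for attempt in range(attempts):
        states = get_states()
        # Only interfaces that were set admin-up are expected to have link.
        down = sorted(
            name for name, (admin_up, link_up) in states.items()
            if admin_up and not link_up
        )
        if not down:
            return []
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(interval)
    return down
```

Returning the names of the interfaces that stayed link-down, rather than a 
plain boolean, makes it easier to log which interface failed in a given run.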

Notes/symptoms:
-   Our sw_interface_dump check runs 30 times (1 s interval) in a loop.
-   Link-down is random: sometimes both interfaces are link-up, sometimes just 
one, and sometimes both links are down.
-   It is not testbed (TB) related, nor cabling related: we see it on 
Haswell-3node in about 1 out of 70 tests, on Skylake-2node in 1 out of 70, but 
on Skylake-3node in more than half of the tests.
-   Checking the state during a test (show int) confirms that the interfaces 
are link-down, so "sw_interface_dump" is reporting the state correctly.
-   Running the CLI command "set interface state ... up" during a test does 
bring the interfaces UP (but it is hard to check the timing here).
-   Mostly x520 and x710 are affected, but that is most probably a statistical 
artifact (low coverage of other NICs such as xxv710 and xl710).
-   We have seen this in master vpp as well as rc2 vpp.
-   It is not clear when this started to happen, so bisecting would take a lot 
of time.
-   This was also spotted on VIRL and on Memif interfaces, which leads us to 
suspect that this is related to the API, not the HW.
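
The bisecting concern in the notes above can be made concrete with a little 
arithmetic. The sketch below assumes (this is an assumption, not something 
measured in this thread) that test failures are independent with a fixed 
per-test failure rate, and computes how many consecutive passing tests are 
needed before a build can be called good with a given confidence. The name 
runs_needed is made up for illustration.

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    Assumes independent tests with a constant failure probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

With the reported rate of 1 in 70, runs_needed(1 / 70) gives 209 passing tests 
per bisect step at 95% confidence, while a rate of one half needs only 5, so 
under these assumptions bisecting would be far cheaper on the Skylake-3node 
testbed than on the others.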

Do you have an idea what we could check further? VPP is not crashing, so no 
core dumps are available. The issue is not 100% reproducible, which makes it 
hard to debug.

Is there a way to get a more verbose error from the API call mentioned, to 
reveal more information?

Thank you.

Peter Mikus
Engineer - Software
Cisco Systems Limited

View/Reply Online (#9967): https://lists.fd.io/g/vpp-dev/message/9967