Re: [ovs-discuss] [openvswitch 2.10.0+2018.08.28+git.e0cea85314+ds2] testsuite: 975 2347 2482 2483 2633 failed

2018-09-05 Thread Ben Pfaff
On Wed, Sep 05, 2018 at 01:50:06PM +0200, Thomas Goirand wrote:
> On 09/04/2018 11:06 PM, Ben Pfaff wrote:
> > On Tue, Sep 04, 2018 at 09:20:45AM +0200, Thomas Goirand wrote:
> >> On 09/02/2018 03:12 AM, Justin Pettit wrote:
> >>>
>  On Sep 1, 2018, at 3:52 PM, Ben Pfaff  wrote:
> 
>  On Sat, Sep 01, 2018 at 01:23:32PM -0700, Justin Pettit wrote:
> >
> >> On Sep 1, 2018, at 12:21 PM, Thomas Goirand  wrote:
> >>
> >>
> >> The only failure:
> >>
> >> 2633: ovn -- ACL rate-limited logging FAILED (ovn.at:6516)
> >
> > My guess is that this is meter-related. Can you send the
> > ovs-vswitchd.log and testsuite.log so I can take a look?
> 
>  It probably hasn't changed from what he sent the first time around.
> >>>
> >>> Yes, "testsuite.log" was in the original message, so I don't need that.  
> >>> Thomas, can you send me "ovs-vswitchd.log" and "ovn-controller.log"?  
> >>> Does it consistently fail for you?
> >>>
> >>> --Justin
> >>
> >> Hi,
> >>
> >> As I blacklisted the above test, I uploaded to Sid, and now there are a
> >> number of failures on non-Intel arches:
> >>
> >> https://buildd.debian.org/status/package.php?p=openvswitch
> >> https://buildd.debian.org/status/logs.php?pkg=openvswitch
> >>
> >> Ben, Justin, can you help me fix all of this?
> > 
> > Thanks for passing that along.
> > 
> > A lot of these failures seem to involve unexpected timeouts.  I wonder
> > whether the buildds are so overloaded that some of the 10-second
> > timeouts in the testsuite are just too short.  Usually, this is a
> > generous timeout interval.
> > 
> > I sent a patch that should help to debug the problem by doing more logging:
> > https://patchwork.ozlabs.org/patch/966087/
> > 
> > It won't help with tests that fully succeed, because the logs by default
> > are discarded, but for tests that have a sequence of waits, in which one
> > eventually fails, it will allow us to see how long the successful waits
> > took.
> > 
> > Any chance you could apply that patch and try another build?  Feel free
> > to wait for review, if you prefer.
> > 
> 
> Hi,
> 
> I've just uploaded OVS with that patch. Thanks, I think it's a very good
> idea. And indeed, it looks like the failing arches are the slower ones.

I'm pretty pleased with the theory myself, but the results tend to show
that it wasn't the problem.  In most of the tests that eventually
failed, the wait failure was preceded by other waits that succeeded
immediately, and the longest wait I see is 3 seconds.  I'll look for
other possible causes.
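
For illustration, a minimal C sketch of the kind of wait instrumentation
being discussed: a poll loop that records how long a successful wait took,
so that slow-but-passing waits on an overloaded builder become visible in
the log.  The names are hypothetical and this is not the actual testsuite
or patch code.

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Poll 'cond' once per second for up to 'timeout_s' seconds.
     * On success, log the elapsed time so a builder that takes, say,
     * 8 of the allowed 10 seconds still shows up in the log. */
    static bool
    wait_until(bool (*cond)(void), int timeout_s)
    {
        struct timespec start, now;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (;;) {
            clock_gettime(CLOCK_MONOTONIC, &now);
            long elapsed = (long) (now.tv_sec - start.tv_sec);
            if (cond()) {
                fprintf(stderr, "wait succeeded after %ld s\n", elapsed);
                return true;
            }
            if (elapsed >= timeout_s) {
                fprintf(stderr, "wait failed after %ld s\n", elapsed);
                return false;
            }
            sleep(1);
        }
    }

    /* Example condition: a file the test expects to appear. */
    static bool
    file_exists(void)
    {
        return access("ready.flag", F_OK) == 0;
    }

    int
    main(void)
    {
        return wait_until(file_exists, 10) ? 0 : 1;
    }
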
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-05 Thread Han Zhou
On Wed, Sep 5, 2018 at 10:44 AM aginwala  wrote:
>
> Thanks Numan:
>
> I will give it a shot and update the findings.
>
>
> On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique  wrote:
>>
>>
>>
>> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:
>>>
>>>
>>>
>>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique wrote:
>>> >
>>> >
>>> >
>>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
>>> >>
>>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
>>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala  wrote:
>>> >> > >
>>> >> > >
>>> >> > > To add on, we are using an LB VIP IP and no constraint with 3 nodes, as Han mentioned earlier, where the active node syncs from an invalid IP and the other two nodes sync from the LB VIP IP.
>>> >> > > Also, I was able to get some logs from one node that triggered:
>>> >> > > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>>> >> > >
>>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
>>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
>>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
>>> >> > > 20
>>> >> > >
>>> >> > > I am not sure whether sync_from on the active node, also via some invalid IP, is causing some flaw when all nodes are down during the race condition in this corner case.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique <nusid...@redhat.com> wrote:
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff  wrote:
>>> >> > >>>
>>> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>>> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff wrote:
>>> >> > >>> > >
>>> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>>> >> > >>> > > > Hi,
>>> >> > >>> > > >
>>> >> > >>> > > > We found an issue in our testing (thanks aginwala) with active-backup mode in an OVN setup.
>>> >> > >>> > > > In a 3-node setup with pacemaker, after stopping pacemaker on all three nodes (simulating a complete shutdown) and then starting all of them simultaneously, there is a good chance that the whole DB content gets lost.
>>> >> > >>> > > >
>>> >> > >>> > > > After studying the replication code, it seems there is a phase in which the backup node deletes all its data and waits for data to be synced from the active node:
>>> >> > >>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>>> >> > >>> > > >
>>> >> > >>> > > > At this state, if the node is set to active, then all data is gone for the whole cluster.
>>> >> > >>> > > > This can happen in different situations.
>>> >> > >>> > > > In the test scenario mentioned above it is very likely to happen, since pacemaker just randomly selects one node as master, not knowing the internal sync state of each node.
>>> >> > >>> > > > It could also happen when failover happens right after a new backup is started, although that is less likely in a real environment, so starting the nodes up one by one may largely reduce the probability.
>>> >> > >>> > > >
>>> >> > >>> > > > Does this analysis make sense?
>>> >> > >>> > > > We will do more tests to verify the conclusion, but we would like to share it with the community for discussion and suggestions.
>>> >> > >>> > > > Once this happens it is very critical - even more serious than just having no HA.
>>> >> > >>> > > > Without HA it is just a control plane outage, but this would be a data plane outage, because OVS flows will be removed accordingly since the data is considered deleted from ovn-controller's point of view.
>>> >> > >>> > > >
>>> >> > >>> > > > We understand that active-standby is not the ideal HA mechanism and clustering is the future, and we are also testing the clustering with the latest patch.
>>> >> > >>> > > > But it would be good if this problem could be addressed with some quick fix, such as keeping a copy of the old data somewhere until the first sync finishes?
>>> >> > >>> > >
>>> >> > >>> > > This does seem like a plausible bug, and at first glance I believe that you're correct about the race here.  I 
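
To make the window described above concrete, here is a much-simplified C
model of the state involved; the types and names are hypothetical and this
is not the actual replication.c code.  The point is that there is a window
(BACKUP_CLEARED below) in which a node has already wiped its tables but has
not yet received the first copy from the active server, and nothing visible
to an external manager such as pacemaker distinguishes that node from a
fully synced one.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Simplified replication states, for illustration only. */
    enum repl_state {
        BACKUP_CLEARED,   /* local tables wiped, initial sync not yet done */
        BACKUP_SYNCED,    /* first full copy received from the active node */
        ACTIVE,
    };

    struct db {
        enum repl_state state;
        size_t n_rows;    /* stand-in for the database contents */
    };

    /* The risky transition: a cluster manager that promotes whichever node
     * it likes can pick one that is still in BACKUP_CLEARED, at which point
     * the newly "active" data set is empty and the other backups will
     * faithfully replicate that emptiness. */
    static bool
    promote(struct db *db)
    {
        if (db->state != BACKUP_SYNCED) {
            fprintf(stderr, "refusing to promote: initial sync not done "
                            "(%zu rows would be published)\n", db->n_rows);
            return false;
        }
        db->state = ACTIVE;
        return true;
    }

    int
    main(void)
    {
        struct db just_restarted = { BACKUP_CLEARED, 0 };

        return promote(&just_restarted) ? 0 : 1;
    }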

Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-05 Thread aginwala
Thanks Numan:

I will give it a shot and update the findings.


On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique  wrote:

>
>
> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:
>
>>
>>
>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique wrote:
>> >
>> >
>> >
>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
>> >>
>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala  wrote:
>> >> > >
>> >> > >
>> >> > > To add on, we are using an LB VIP IP and no constraint with 3 nodes, as Han mentioned earlier, where the active node syncs from an invalid IP and the other two nodes sync from the LB VIP IP.
>> >> > > Also, I was able to get some logs from one node that triggered:
>> >> > > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>> >> > >
>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
>> >> > > 20
>> >> > >
>> >> > > I am not sure whether sync_from on the active node, also via some invalid IP, is causing some flaw when all nodes are down during the race condition in this corner case.
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique wrote:
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff  wrote:
>> >> > >>>
>> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff wrote:
>> >> > >>> > >
>> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>> >> > >>> > > > Hi,
>> >> > >>> > > >
>> >> > >>> > > > We found an issue in our testing (thanks aginwala) with active-backup mode in an OVN setup.
>> >> > >>> > > > In a 3-node setup with pacemaker, after stopping pacemaker on all three nodes (simulating a complete shutdown) and then starting all of them simultaneously, there is a good chance that the whole DB content gets lost.
>> >> > >>> > > >
>> >> > >>> > > > After studying the replication code, it seems there is a phase in which the backup node deletes all its data and waits for data to be synced from the active node:
>> >> > >>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>> >> > >>> > > >
>> >> > >>> > > > At this state, if the node is set to active, then all data is gone for the whole cluster.
>> >> > >>> > > > This can happen in different situations.
>> >> > >>> > > > In the test scenario mentioned above it is very likely to happen, since pacemaker just randomly selects one node as master, not knowing the internal sync state of each node.
>> >> > >>> > > > It could also happen when failover happens right after a new backup is started, although that is less likely in a real environment, so starting the nodes up one by one may largely reduce the probability.
>> >> > >>> > > >
>> >> > >>> > > > Does this analysis make sense?
>> >> > >>> > > > We will do more tests to verify the conclusion, but we would like to share it with the community for discussion and suggestions.
>> >> > >>> > > > Once this happens it is very critical - even more serious than just having no HA.
>> >> > >>> > > > Without HA it is just a control plane outage, but this would be a data plane outage, because OVS flows will be removed accordingly since the data is considered deleted from ovn-controller's point of view.
>> >> > >>> > > >
>> >> > >>> > > > We understand that active-standby is not the ideal HA mechanism and clustering is the future, and we are also testing the clustering with the latest patch.
>> >> > >>> > > > But it would be good if this problem could be addressed with some quick fix, such as keeping a copy of the old data somewhere until the first sync finishes?
>> >> > >>> > >
>> >> > >>> > > This does seem like a plausible bug, and at first glance I believe that you're correct about the race here.  I guess that the correct behavior must be to keep the original data until a new copy of 

Re: [ovs-discuss] Geneve remote_ip as flow for OVN hosts

2018-09-05 Thread Girish Moodalbail
Hello all,

I would like to add more context here. In the diagram below:

+--------------------------------------------+
| ovn-host                                   |
|                                            |
|            +-------------+                 |
|            |   br-int    |                 |
|            +---+-----+---+                 |
|                |     |                     |
|       +--------v+   +v--------+            |
|       | geneve  |   | geneve  |            |
|       +----+----+   +----+----+            |
|            |             |                 |
|       +----v----+   +----v----+            |
|       |   IP0   |   |   IP1   |            |
|       +---------+   +---------+            |
+-------+  eth0   +---+  eth1   +------------+
        +---------+   +---------+

eth0 and eth1 are, say, each in its own physical segment. The VMs
instantiated on the above ovn-host will have multiple interfaces, and each
of those interfaces needs to be behind a different Geneve VTEP.

I think the following entry in OVN TODOs (https://github.com/openvswitch/ovs/blob/master/ovn/TODO.rst)

---8<--8<---
Support multiple tunnel encapsulations in Chassis.

So far, both ovn-controller and ovn-controller-vtep only allow chassis to
have one tunnel encapsulation entry. We should extend the implementation to
support multiple tunnel encapsulations
---8<--8<---

captures the above requirement. Is that the case?
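
Purely as a hypothetical illustration of what multiple encapsulation
entries per chassis could look like (this is not OVN's actual data model or
code), here is a C sketch of a chassis carrying two local VTEP addresses
and a helper that picks the one on the same physical segment as the remote
endpoint:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical encap entry: a local VTEP address plus the netmask of
     * the physical segment it sits on (eth0's or eth1's in the diagram). */
    struct encap {
        uint32_t ip;
        uint32_t mask;
    };

    struct chassis {
        struct encap encaps[2];   /* ovn-controller currently allows one */
        int n_encaps;
    };

    /* Pick the local encap whose segment contains the remote VTEP, so
     * traffic to a peer on eth1's segment leaves via IP1 rather than IP0.
     * Fall back to the first entry otherwise. */
    static const struct encap *
    pick_encap(const struct chassis *c, uint32_t remote_ip)
    {
        for (int i = 0; i < c->n_encaps; i++) {
            const struct encap *e = &c->encaps[i];

            if ((remote_ip & e->mask) == (e->ip & e->mask)) {
                return e;
            }
        }
        return c->n_encaps ? &c->encaps[0] : NULL;
    }

    int
    main(void)
    {
        struct chassis c = {
            .encaps = { { 0x0a000001, 0xffffff00 },    /* 10.0.0.1/24, IP0 */
                        { 0x0a000101, 0xffffff00 } },  /* 10.0.1.1/24, IP1 */
            .n_encaps = 2,
        };
        const struct encap *e = pick_encap(&c, 0x0a000105);  /* 10.0.1.5 */

        printf("chosen local VTEP: 0x%08" PRIx32 "\n", e->ip);
        return 0;
    }
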

Thanks again.

Regards,
~Girish




On Tue, Sep 4, 2018 at 3:00 PM Girish Moodalbail wrote:

> Hello all,
>
> Is it possible to configure remote_ip as a 'flow' instead of a fixed IP
> address (i.e., instead of setting ovn-encap-ip to a single IP address)?
>
> Today, we have one VTEP endpoint per OVN host, and all the VMs that
> connect to br-int on that OVN host are reachable behind this VTEP
> endpoint. Is it possible to have multiple VTEP endpoints for a br-int
> bridge and use OpenFlow flows to select one of the VTEP endpoints?
>
>
> +--------------------------------------------+
> | ovn-host                                   |
> |                                            |
> |            +-------------+                 |
> |            |   br-int    |                 |
> |            +---+-----+---+                 |
> |                |     |                     |
> |       +--------v+   +v--------+            |
> |       | geneve  |   | geneve  |            |
> |       +----+----+   +----+----+            |
> |            |             |                 |
> |       +----v----+   +----v----+            |
> |       |   IP0   |   |   IP1   |            |
> |       +---------+   +---------+            |
> +-------+  eth0   +---+  eth1   +------------+
>         +---------+   +---------+
>
> Also, we don't want to bond eth0 and eth1 into a bond interface and then
> use the bond's IP as the VTEP endpoint.
>
> Thanks in advance,
> ~Girish
>
>
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-05 Thread Numan Siddique
On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:

>
>
> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique wrote:
> >
> >
> >
> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
> >>
> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala  wrote:
> >> > >
> >> > >
> >> > > To add on, we are using an LB VIP IP and no constraint with 3 nodes, as Han mentioned earlier, where the active node syncs from an invalid IP and the other two nodes sync from the LB VIP IP.
> >> > > Also, I was able to get some logs from one node that triggered:
> >> > > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
> >> > >
> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
> >> > > 20
> >> > >
> >> > > I am not sure whether sync_from on the active node, also via some invalid IP, is causing some flaw when all nodes are down during the race condition in this corner case.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique wrote:
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff  wrote:
> >> > >>>
> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff  wrote:
> >> > >>> > >
> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
> >> > >>> > > > Hi,
> >> > >>> > > >
> >> > >>> > > > We found an issue in our testing (thanks aginwala) with active-backup mode in an OVN setup.
> >> > >>> > > > In a 3-node setup with pacemaker, after stopping pacemaker on all three nodes (simulating a complete shutdown) and then starting all of them simultaneously, there is a good chance that the whole DB content gets lost.
> >> > >>> > > >
> >> > >>> > > > After studying the replication code, it seems there is a phase in which the backup node deletes all its data and waits for data to be synced from the active node:
> >> > >>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
> >> > >>> > > >
> >> > >>> > > > At this state, if the node is set to active, then all data is gone for the whole cluster.
> >> > >>> > > > This can happen in different situations.
> >> > >>> > > > In the test scenario mentioned above it is very likely to happen, since pacemaker just randomly selects one node as master, not knowing the internal sync state of each node.
> >> > >>> > > > It could also happen when failover happens right after a new backup is started, although that is less likely in a real environment, so starting the nodes up one by one may largely reduce the probability.
> >> > >>> > > >
> >> > >>> > > > Does this analysis make sense?
> >> > >>> > > > We will do more tests to verify the conclusion, but we would like to share it with the community for discussion and suggestions.
> >> > >>> > > > Once this happens it is very critical - even more serious than just having no HA.
> >> > >>> > > > Without HA it is just a control plane outage, but this would be a data plane outage, because OVS flows will be removed accordingly since the data is considered deleted from ovn-controller's point of view.
> >> > >>> > > >
> >> > >>> > > > We understand that active-standby is not the ideal HA mechanism and clustering is the future, and we are also testing the clustering with the latest patch.
> >> > >>> > > > But it would be good if this problem could be addressed with some quick fix, such as keeping a copy of the old data somewhere until the first sync finishes?
> >> > >>> > >
> >> > >>> > > This does seem like a plausible bug, and at first glance I believe that you're correct about the race here.  I guess that the correct behavior must be to keep the original data until a new copy of the data has been received, and only then atomically replace the original by the new.
> >> > >>> > >
> >> > >>> > > Is this something you have time and ability to fix?
> >> > >>> >
> >> > >>> > Thanks Ben for the quick response. I guess I will not have time
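
A minimal C sketch of the fix suggested above, i.e. keep the old data until
the new copy has fully arrived and only then swap; the types here are
hypothetical stand-ins, not the actual ovsdb-server implementation.

    #include <stdio.h>
    #include <stdlib.h>

    struct db_contents {
        size_t n_rows;                   /* stand-in for the real tables */
    };

    struct replica {
        struct db_contents *live;        /* what clients currently read */
        struct db_contents *incoming;    /* copy being filled by the sync */
    };

    /* Instead of clearing 'live' before the initial sync starts, build the
     * new copy on the side and only replace 'live' once the sync has fully
     * arrived.  A failover before that point still serves the old data
     * rather than an empty database. */
    static void
    finish_initial_sync(struct replica *r)
    {
        struct db_contents *old = r->live;

        r->live = r->incoming;           /* the swap is the only visible step */
        r->incoming = NULL;
        free(old);
    }

    int
    main(void)
    {
        struct replica r;

        r.live = malloc(sizeof(struct db_contents));
        r.incoming = malloc(sizeof(struct db_contents));
        r.live->n_rows = 1000;           /* old, still-valid contents */
        r.incoming->n_rows = 1042;       /* freshly synced contents */
        finish_initial_sync(&r);
        printf("live copy now has %zu rows\n", r.live->n_rows);
        free(r.live);
        return 0;
    }
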

Re: [ovs-discuss] [openvswitch 2.10.0+2018.08.28+git.e0cea85314+ds2] testsuite: 975 2347 2482 2483 2633 failed

2018-09-05 Thread Thomas Goirand
On 09/04/2018 11:06 PM, Ben Pfaff wrote:
> On Tue, Sep 04, 2018 at 09:20:45AM +0200, Thomas Goirand wrote:
>> On 09/02/2018 03:12 AM, Justin Pettit wrote:
>>>
 On Sep 1, 2018, at 3:52 PM, Ben Pfaff  wrote:

 On Sat, Sep 01, 2018 at 01:23:32PM -0700, Justin Pettit wrote:
>
>> On Sep 1, 2018, at 12:21 PM, Thomas Goirand  wrote:
>>
>>
>> The only failure:
>>
>> 2633: ovn -- ACL rate-limited logging FAILED (ovn.at:6516)
>
> My guess is that this is meter-related. Can you send the ovs-vswitchd.log
> and testsuite.log so I can take a look?

 It probably hasn't changed from what he sent the first time around.
>>>
>>> Yes, "testsuite.log" was in the original message, so I don't need that.  
>>> Thomas, can you send me "ovs-vswitchd.log" and "ovn-controller.log"?  Does 
>>> it consistently fail for you?
>>>
>>> --Justin
>>
>> Hi,
>>
>> As I blacklisted the above test, I uploaded to Sid, and now there are a
>> number of failures on non-Intel arches:
>>
>> https://buildd.debian.org/status/package.php?p=openvswitch
>> https://buildd.debian.org/status/logs.php?pkg=openvswitch
>>
>> Ben, Justin, can you help me fix all of this?
> 
> Thanks for passing that along.
> 
> A lot of these failures seem to involve unexpected timeouts.  I wonder
> whether the buildds are so overloaded that some of the 10-second
> timeouts in the testsuite are just too short.  Usually, this is a
> generous timeout interval.
> 
> I sent a patch that should help to debug the problem by doing more logging:
> https://patchwork.ozlabs.org/patch/966087/
> 
> It won't help with tests that fully succeed, because the logs by default
> are discarded, but for tests that have a sequence of waits, in which one
> eventually fails, it will allow us to see how long the successful waits
> took.
> 
> Any chance you could apply that patch and try another build?  Feel free
> to wait for review, if you prefer.
> 

Hi,

I've just uploaded OVS with that patch. Thanks, I think it's a very good
idea. And indeed, it looks like the failing arches are the slower ones.

Cheers,

Thomas Goirand (zigo)
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Regarding kernel module debugging

2018-09-05 Thread Vikas Kumar
Hello everyone,
I am new to kernel module programming. Can anyone please tell me how to
debug the datapath kernel modules, e.g. openvswitch.ko?
I want to get a real feel for how things move inside OVS.
It would be great if you could list the tools and the steps needed to debug
the kernel module.

Thanks
vikash
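
For illustration only, and not specific to openvswitch.ko: a minimal kernel
module that uses printk/pr_debug, which is the most basic way to trace what
a module is doing; the pr_debug lines can be enabled at runtime when the
kernel has dynamic debug support.

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init debug_demo_init(void)
    {
        pr_info("debug_demo: loaded\n");
        pr_debug("debug_demo: pr_debug output appears once dynamic debug "
                 "is enabled for this module\n");
        return 0;
    }

    static void __exit debug_demo_exit(void)
    {
        pr_info("debug_demo: unloaded\n");
    }

    module_init(debug_demo_init);
    module_exit(debug_demo_exit);

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Tiny printk/pr_debug example");
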
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss