Re: [openstack-dev] Top Gate Bugs

2013-12-06 Thread Matt Riedemann



On Wednesday, December 04, 2013 7:22:23 AM, Joe Gordon wrote:

[quoted message snipped]

Let's add bug 1257644 [1] to the list.  I'm pretty sure this is due to 
some recent code [2][3] in the nova libvirt driver that is 
automatically disabling the host when the libvirt connection drops.


Joe said there was a known issue with libvirt connection failures so 
this could be duped against that, but I'm not sure where/what that one 
is - maybe bug 1254872 [4]?


Unless I just don't understand the code, there is some funny logic 
going on in the libvirt driver when it automatically disables a host, 
which I've documented in bug 1257644.  It would help to have some 
libvirt-minded people look at that, or the authors/approvers of those 
patches.


Also, does anyone know if libvirt will pass a 'reason' string to the 
_close_callback function?  I was digging through the libvirt code this 
morning but couldn't figure out where the callback is actually called 
and with what parameters.  The code in nova seemed to just be based on 
the patch that danpb had in libvirt [5].


This bug is going to raise a bigger long-term question 

Re: [openstack-dev] Top Gate Bugs

2013-12-06 Thread Davanum Srinivas
Joe,

Looks like we may be a bit more stable now?

Short URL: http://bit.ly/18qq4q2

Long URL :
http://graphite.openstack.org/graphlot/?from=-120hour&until=-0hour&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-postgres-full'),'ED9121')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00F0F0')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron'),'00FF00')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00c868')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-grenade-dsvm.SUCCESS,sum(stats.zuul.pipeline.check.job.check-grenade-dsvm.{SUCCESS,FAILURE})),'6hours'),%20'check-grenade-dsvm'),'800080')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'E080FF')
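Each target in that URL follows the same pattern: take a Zuul job's SUCCESS count as a percentage of SUCCESS+FAILURE, smooth it over six hours, and attach a label and a line color. A sketch of generating such a target string (the graphite function names asPercent/movingAverage/alias/color are taken from the URL itself; the helper is illustrative, not a tool anyone in the thread used):

```python
def graphite_target(job, label, color, pipeline="gate"):
    """Build one graphite 'target' expression like those in the URL above:
    a 6-hour moving average of a Zuul job's success percentage, with an
    alias label and a line color."""
    base = "stats.zuul.pipeline.%s.job.%s" % (pipeline, job)
    # Success count as a percentage of all completed runs of the job.
    pct = "asPercent(%s.SUCCESS,sum(%s.{SUCCESS,FAILURE}))" % (base, base)
    smoothed = "movingAverage(%s,'6hours')" % pct
    return "color(alias(%s,'%s'),'%s')" % (smoothed, label, color)

# Passing the job name as its own label avoids the kind of mislabeling
# visible in the URL above (e.g. postgres-full data aliased as
# neutron-large-ops).
print(graphite_target("gate-tempest-dsvm-full", "gate-tempest-dsvm-full", "ED9121"))
```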

-- dims


On Fri, Dec 6, 2013 at 11:28 AM, Matt Riedemann
mrie...@linux.vnet.ibm.com wrote:

[quoted message snipped]

Re: [openstack-dev] Top Gate Bugs

2013-12-06 Thread Davanum Srinivas
I had the labels wrong - here's a slightly better link - http://bit.ly/1gdxYeg

On Fri, Dec 6, 2013 at 4:31 PM, Davanum Srinivas dava...@gmail.com wrote:

[quoted message snipped]

[openstack-dev] Top Gate Bugs

2013-12-04 Thread Joe Gordon
TL;DR: Gate is failing 23% of the time due to bugs in nova, neutron and
tempest. We need help fixing these bugs.


Hi All,

Before going any further: we have a bug that is affecting the gate and
stable branches, so it's getting top priority here. elastic-recheck
currently doesn't track unit tests because we don't expect them to fail
very often. It turns out that assessment was wrong; we now have a nova
py27 unit test bug in both the trunk and stable gates.

https://bugs.launchpad.net/nova/+bug/1216851
Title: nova unit tests occasionally fail migration tests for mysql and
postgres
Hits
  FAILURE: 74
The failures appear multiple times for a single job, and some of those are
due to bad patches in the check queue.  But this is being seen in both the
stable and trunk gates, so something is definitely wrong.

===


It's time for another edition of 'Top Gate Bugs.'  I am sending this out
now because, in addition to our usual gate bugs, a few new ones have cropped
up recently, and as we saw a few weeks ago it doesn't take very many new
bugs to wedge the gate.

Currently the gate has a failure rate of at least 23%! [0]

Note: this email was generated with
http://status.openstack.org/elastic-recheck/ and 'elastic-recheck-success'
[1]

1) https://bugs.launchpad.net/bugs/1253896
Title: test_minimum_basic_scenario fails with SSHException: Error reading
SSH protocol banner
Projects:  neutron, nova, tempest
Hits
  FAILURE: 324
This one has been around for several weeks now, and although we have made
some attempts at fixing it, we aren't any closer to resolving it than we
were a few weeks ago.

2) https://bugs.launchpad.net/bugs/1251448
Title: BadRequest: Multiple possible networks found, use a Network ID to be
more specific.
Project: neutron
Hits
  FAILURE: 141

3) https://bugs.launchpad.net/bugs/1249065
Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
Project: nova
Hits
  FAILURE: 112
This is a bug in nova's neutron code.

4) https://bugs.launchpad.net/bugs/1250168
Title: gate-tempest-devstack-vm-neutron-large-ops is failing
Projects: neutron, nova
Hits
  FAILURE: 94
This is an old bug that was fixed but came back on December 3rd, so it is
a recent regression. This may be an infra issue.

5) https://bugs.launchpad.net/bugs/1210483
Title: ServerAddressesTestXML.test_list_server_addresses FAIL
Projects: neutron, nova
Hits
  FAILURE: 73
There have been some attempts at fixing this, but it's still around.


In addition to the existing bugs, we have some new bugs on the rise:

1) https://bugs.launchpad.net/bugs/1257626
Title: Timeout while waiting on RPC response - topic: network, RPC
method: allocate_for_instance info: unknown
Project: nova
Hits
  FAILURE: 52
This is a large-ops-only bug. It has been around for at least two weeks,
but we have seen it in higher numbers starting around December 3rd. This
may be an infrastructure issue, as neutron-large-ops started failing more
around the same time.

2) https://bugs.launchpad.net/bugs/1257641
Title: Quota exceeded for instances: Requested 1, but already used 10 of 10
instances
Projects: nova, tempest
Hits
  FAILURE: 41
Like the previous bug, this has been around for at least two weeks but
appears to be on the rise.



Raw Data: http://paste.openstack.org/show/54419/


best,
Joe


[0] failure rate = 1-(success rate gate-tempest-dsvm-neutron)*(success rate
...) * ...

gate-tempest-dsvm-neutron = 0.00
gate-tempest-dsvm-neutron-large-ops = 11.11
gate-tempest-dsvm-full = 11.11
gate-tempest-dsvm-large-ops = 4.55
gate-tempest-dsvm-postgres-full = 10.00
gate-grenade-dsvm = 0.00

(I hope I got the math right here)
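For what it's worth, the footnote's arithmetic can be written out as follows, assuming the numbers above are per-job failure percentages and that jobs fail independently (my reading of the formula, not necessarily Joe's):

```python
# Per-job gate failure percentages as listed in footnote [0].
job_failure_pct = {
    "gate-tempest-dsvm-neutron": 0.00,
    "gate-tempest-dsvm-neutron-large-ops": 11.11,
    "gate-tempest-dsvm-full": 11.11,
    "gate-tempest-dsvm-large-ops": 4.55,
    "gate-tempest-dsvm-postgres-full": 10.00,
    "gate-grenade-dsvm": 0.00,
}

# A change only merges if every job passes, so the overall success rate
# is the product of the per-job success rates.
overall_success = 1.0
for pct in job_failure_pct.values():
    overall_success *= 1.0 - pct / 100.0

gate_failure_rate = 1.0 - overall_success
print("combined gate failure rate: %.1f%%" % (100 * gate_failure_rate))
```

Plugged in this way the numbers come out to roughly 32%, so the quoted "at least 23%" looks conservative under this reading.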

[1]
http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/elastic_recheck/cmd/check_success.py
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Top Gate Bugs

2013-11-21 Thread Ken'ichi Ohmichi
Hi Clark,

2013/11/21 Clark Boylan clark.boy...@gmail.com:

 Joe seemed to be on the same track with
 https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:57578,n,z
 but went far enough to revert the change that introduced that test. A
 couple people were going to keep hitting those changes to run them
 through more tests and see if 1251920 goes away.

Thanks for updating my patch and pushing to approve it.
Now 1251920 went away from gerrit :-)


 I don't quite understand why this test is problematic (Joe indicated
 it went in at about the time 1251920 became a problem). I would be
 very interested in finding out why this caused a problem.

test_create_backup deletes two server snapshot images at the end, and I
guess the deletion runs in parallel with the next test
(test_get_console_output). As a result, the heavy workload hits
test_get_console_output, and it becomes difficult to get the console log.
For now the problem is only worked around; I think we could solve it by
waiting for the image deletion to finish in each test. I will dig into
this problem more next week.
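The fix described above — wait for the snapshot deletion to finish before the test returns — amounts to a simple polling loop. A sketch of the idea (the `images_client`, its `get_image` method, and the `NotFound` exception are hypothetical stand-ins, not tempest's actual API):

```python
import time

class NotFound(Exception):
    """Stand-in for the client exception raised on a 404 (hypothetical)."""

def wait_for_image_deleted(images_client, image_id, timeout=60, interval=1):
    """Poll until fetching the image raises NotFound, i.e. the deletion
    has actually finished, so the next test starts on a quiet host."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            images_client.get_image(image_id)  # image still exists: keep waiting
        except NotFound:
            return
        time.sleep(interval)
    raise TimeoutError("image %s was not deleted within %ss" % (image_id, timeout))
```

Called at the end of test_create_backup for each snapshot, this would serialize the cleanup instead of letting it race with test_get_console_output.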


 You can see frequencies for bugs with known signatures at
 http://status.openstack.org/elastic-recheck/

Thank you for the info, that is interesting.


Thanks
Ken'ichi Ohmichi



Re: [openstack-dev] Top Gate Bugs

2013-11-21 Thread Matt Riedemann



On Wednesday, November 20, 2013 11:53:45 PM, Clark Boylan wrote:

On Wed, Nov 20, 2013 at 9:43 PM, Ken'ichi Ohmichi ken1ohmi...@gmail.com wrote:

Hi Joe,

2013/11/20 Joe Gordon joe.gord...@gmail.com:

Hi All,

As many of you have noticed the gate has been in very bad shape over the
past few days.  Here is a list of some of the top open bugs (without pending
patches, and many recent hits) that we are hitting.  Gate won't be stable,
and it will be hard to get your code merged, until we fix these bugs.

1) https://bugs.launchpad.net/bugs/1251920
nova
468 Hits


Can we know the frequency of each failure?
I'm trying 1251920 and putting the investigation tempest patch.
  https://review.openstack.org/#/c/57193/

The patch can avoid this problem 4 times, but I am not sure this is
worth or not.


Thanks
Ken'ichi Ohmichi

---

2) https://bugs.launchpad.net/bugs/1251784
neutron, Nova
328 Hits
3) https://bugs.launchpad.net/bugs/1249065
neutron
   122 hits
4) https://bugs.launchpad.net/bugs/1251448
neutron
65 Hits

Raw Data:


Note: If a bug has any hits for anything besides failure, it means the
fingerprint isn't perfect.

Elastic recheck known issues:

Bug: https://bugs.launchpad.net/bugs/1251920
  Fingerprint: message:"assertionerror: console output was empty" AND filename:"console.html"
  Title: Tempest failures due to failure to return console logs from an instance
  Status: nova: Confirmed
  Hits: FAILURE: 468

Bug: https://bugs.launchpad.net/bugs/1251784
  Fingerprint: message:"Connection to neutron failed: Maximum attempts reached" AND filename:"logs/screen-n-cpu.txt"
  Title: nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached
  Status: neutron: New, nova: New
  Hits: FAILURE: 328, UNSTABLE: 13, SUCCESS: 275

Bug: https://bugs.launchpad.net/bugs/1240256
  Fingerprint: message:" 503" AND filename:"logs/syslog.txt" AND syslog_program:"proxy-server"
  Title: swift proxy-server returning 503 during tempest run
  Status: openstack-ci: Incomplete, swift: New, tempest: New
  Hits: FAILURE: 136, SUCCESS: 83
  Pending patch.

Bug: https://bugs.launchpad.net/bugs/1249065
  Fingerprint: message:"No nw_info cache associated with instance" AND filename:"logs/screen-n-api.txt"
  Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
  Status: neutron: New, nova: Confirmed
  Hits: FAILURE: 122

Bug: https://bugs.launchpad.net/bugs/1252514
  Fingerprint: message:"Got error from Swift: put_object" AND filename:"logs/screen-g-api.txt"
  Title: glance doesn't recover if Swift returns an error
  Status: devstack: New, glance: New, swift: New
  Hits: FAILURE: 95
  Pending patch.

Bug: https://bugs.launchpad.net/bugs/1244255
  Fingerprint: message:"NovaException: Unexpected vif_type=binding_failed" AND filename:"logs/screen-n-cpu.txt"
  Title: binding_failed because of l2 agent assumed down
  Status: neutron: Fix Committed
  Hits: FAILURE: 92, SUCCESS: 29

Bug: https://bugs.launchpad.net/bugs/1251448
  Fingerprint: message:" possible networks found, use a Network ID to be more specific. (HTTP 400)" AND filename:"console.html"
  Title: BadRequest: Multiple possible networks found, use a Network ID to be more specific.
  Status: neutron: New
  Hits: FAILURE: 65

Bug: https://bugs.launchpad.net/bugs/1239856
  Fingerprint: message:"tempest/services" AND message:"/images_client.py" AND message:"wait_for_image_status" AND filename:"console.html"
  Title: TimeoutException: Request timed out on tempest.api.compute.images.test_list_image_filters.ListImageFiltersTestXML
  Status: glance: New
  Hits: FAILURE: 62

Bug: https://bugs.launchpad.net/bugs/1235435
  Fingerprint: message:"One or more ports have an IP allocation from this subnet" AND message:" SubnetInUse: Unable to complete operation on subnet" AND filename:"logs/screen-q-svc.txt"
  Title: 'SubnetInUse: Unable to complete operation on subnet UUID. One or more ports have an IP allocation from this subnet.'
  Status: neutron: Incomplete, nova: Fix Committed, tempest: New
  Hits: FAILURE: 48

Bug: https://bugs.launchpad.net/bugs/1224001
  Fingerprint: message:"tempest.scenario.test_network_basic_ops AssertionError: Timed out waiting for" AND filename:"console.html"
  Title: test_network_basic_ops fails waiting for network to become available
  Status: neutron: In Progress, swift: Invalid, tempest: Invalid
  Hits: FAILURE: 42

Bug: https://bugs.launchpad.net/bugs/1218391
  Fingerprint: message:"Cannot 'createImage'" AND filename:"console.html"
  Title: tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestXML.test_delete_image_that_is_not_yet_active spurious failure
  Status: nova: Confirmed, swift: Confirmed, tempest: Confirmed
  Hits: FAILURE: 25
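Each fingerprint above is essentially an AND of field/substring conditions evaluated against indexed log documents. A toy illustration of that idea (elastic-recheck actually sends these as queries to an ElasticSearch/Logstash backend; this in-memory matcher is only a sketch, not its real code):

```python
def matches_fingerprint(doc, required):
    """Return True if every required field contains its needle substring.
    `doc` maps field names (message, filename, ...) to values; `required`
    maps field names to the substring that must appear. A toy stand-in
    for an ElasticSearch AND query."""
    return all(needle in doc.get(field, "")
               for field, needle in required.items())

# Fingerprint for bug 1251920, transcribed from the list above.
bug_1251920 = {
    "message": "assertionerror: console output was empty",
    "filename": "console.html",
}

log_doc = {
    "filename": "console.html",
    "message": "FAIL: assertionerror: console output was empty",
}
print(matches_fingerprint(log_doc, bug_1251920))  # prints True
```

This also shows why an imperfect fingerprint can register SUCCESS hits: the substrings can appear in logs of runs that ultimately passed.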



best,
Joe Gordon



Joe seemed to be on the same track with

Re: [openstack-dev] Top Gate Bugs

2013-11-21 Thread Christopher Yeoh
On Fri, Nov 22, 2013 at 2:28 AM, Matt Riedemann
mrie...@linux.vnet.ibm.com wrote:



[quoted message snipped]

Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Anita Kuno
Thanks for posting this, Joe. It really helps to create focus so we can
address these bugs.

We are chatting in #openstack-neutron about 1251784, 1249065, and 1251448.

We are looking for someone to work on 1251784 - I had mentioned it at
Monday's Neutron team meeting and am trying to shop it around in
-neutron now. We need someone other than Salvatore, Aaron or Maru to
work on this since they each have at least one very important bug they
are working on. Please join us in #openstack-neutron and lend a hand -
all of OpenStack needs your help.

Bug 1249065 is assigned to Aaron Rosen, who isn't in the channel at the
moment, so I don't have an update on his progress or any blockers he is
facing. Hopefully (if you are reading this, Aaron) he will join us in
channel soon and I can hear from him about his status.

Bug 1251448 is assigned to Maru Newby, who I am talking with now in
-neutron. He is addressing the bug. I will share what information I have
regarding this one when I have some.

We are all looking forward to a more stable gate and this information
really helps.

Thanks again, Joe,
Anita.

On 11/20/2013 01:09 AM, Joe Gordon wrote:
 Hi All,
 
 As many of you have noticed the gate has been in very bad shape over the
 past few days.  Here is a list of some of the top open bugs (without
 pending patches, and many recent hits) that we are hitting.  Gate won't be
 stable, and it will be hard to get your code merged, until we fix these
 bugs.
 
 1) https://bugs.launchpad.net/bugs/1251920
    nova
    468 hits
 2) https://bugs.launchpad.net/bugs/1251784
    neutron, nova
    328 hits
 3) https://bugs.launchpad.net/bugs/1249065
    neutron
    122 hits
 4) https://bugs.launchpad.net/bugs/1251448
    neutron
    65 hits
 
 Raw Data:
 
 
 Note: If a bug has any hits for anything besides failure, it means the
 fingerprint isn't perfect.
 
 Elastic recheck known issues:
 
 Bug: https://bugs.launchpad.net/bugs/1251920
   Fingerprint: message:"assertionerror: console output was empty" AND filename:"console.html"
   Title: Tempest failures due to failure to return console logs from an instance
   Status: nova: Confirmed
   Hits: FAILURE: 468
 
 Bug: https://bugs.launchpad.net/bugs/1251784
   Fingerprint: message:"Connection to neutron failed: Maximum attempts reached" AND filename:"logs/screen-n-cpu.txt"
   Title: nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached
   Status: neutron: New, nova: New
   Hits: FAILURE: 328, UNSTABLE: 13, SUCCESS: 275
 
 Bug: https://bugs.launchpad.net/bugs/1240256
   Fingerprint: message:"503" AND filename:"logs/syslog.txt" AND syslog_program:"proxy-server"
   Title: swift proxy-server returning 503 during tempest run
   Status: openstack-ci: Incomplete, swift: New, tempest: New
   Hits: FAILURE: 136, SUCCESS: 83
   Pending Patch
 
 Bug: https://bugs.launchpad.net/bugs/1249065
   Fingerprint: message:"No nw_info cache associated with instance" AND filename:"logs/screen-n-api.txt"
   Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
   Status: neutron: New, nova: Confirmed
   Hits: FAILURE: 122
 
 Bug: https://bugs.launchpad.net/bugs/1252514
   Fingerprint: message:"Got error from Swift: put_object" AND filename:"logs/screen-g-api.txt"
   Title: glance doesn't recover if Swift returns an error
   Status: devstack: New, glance: New, swift: New
   Hits: FAILURE: 95
   Pending Patch
 
 Bug: https://bugs.launchpad.net/bugs/1244255
   Fingerprint: message:"NovaException: Unexpected vif_type=binding_failed" AND filename:"logs/screen-n-cpu.txt"
   Title: binding_failed because of l2 agent assumed down
   Status: neutron: Fix Committed
   Hits: FAILURE: 92, SUCCESS: 29
 
 Bug: https://bugs.launchpad.net/bugs/1251448
   Fingerprint: message:"possible networks found, use a Network ID to be more specific. (HTTP 400)" AND filename:"console.html"
   Title: BadRequest: Multiple possible networks found, use a Network ID to be more specific.
   Status: neutron: New
   Hits: FAILURE: 65
 
 Bug: https://bugs.launchpad.net/bugs/1239856
   Fingerprint: message:"tempest/services" AND message:"/images_client.py" AND message:"wait_for_image_status" AND filename:"console.html"
   Title: TimeoutException: Request timed out on tempest.api.compute.images.test_list_image_filters.ListImageFiltersTestXML
   Status: glance: New
   Hits: FAILURE: 62
 
 Bug: https://bugs.launchpad.net/bugs/1235435
   Fingerprint: message:"One or more ports have an IP allocation from this subnet" AND message:"SubnetInUse: Unable to complete operation on subnet" AND filename:"logs/screen-q-svc.txt"
   Title: 'SubnetInUse: Unable to complete operation on subnet UUID. One or more ports have an IP allocation from this subnet.'
   Status: neutron: Incomplete, nova: Fix Committed, tempest: New
   Hits: FAILURE: 48
 
 Bug: https://bugs.launchpad.net/bugs/1224001
   Fingerprint: message:"tempest.scenario.test_network_basic_ops AssertionError: Timed out waiting for" AND filename:"console.html"
   Title: test_network_basic_ops fails waiting for network to become available
   Status: neutron: In Progress, swift: Invalid, tempest: Invalid
   Hits: FAILURE: 42
 
 Bug: https://bugs.launchpad.net/bugs/1218391
   Fingerprint: message:"Cannot 'createImage'" AND filename:"console.html"
   Title: tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestXML.test_delete_image_that_is_not_yet_active spurious failure
   Status: nova: Confirmed, swift: Confirmed, tempest: Confirmed
   Hits: FAILURE: 25

Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Derek Higgins
On 20/11/13 14:21, Anita Kuno wrote:
 Thanks for posting this, Joe. It really helps to create focus so we can
 address these bugs.
 
 We are chatting in #openstack-neutron about 1251784, 1249065, and 1251448.
 
 We are looking for someone to work on 1251784 - I had mentioned it at
 Monday's Neutron team meeting and am trying to shop it around in
 -neutron now. We need someone other than Salvatore, Aaron or Maru to
 work on this since they each have at least one very important bug they
 are working on. Please join us in #openstack-neutron and lend a hand -
 all of OpenStack needs your help.

I've been hitting this in tripleo intermittently for the last few days
(or at least it looks to be the same bug). This morning, while trying to
debug the problem, I noticed HTTP requests/responses happening out of
order. I've added details to the bug.

https://bugs.launchpad.net/tripleo/+bug/1251784

 
 Bug 1249065 is assigned to Aaron Rosen, who isn't in the channel at the
 moment, so I don't have an update on his progress or any blockers he is
 facing. Hopefully (if you are reading this Aaron) he will join us in
 channel soon and I had hear from him about his status.
 
 Bug 1251448 is assigned to Maru Newby, who I am talking with now in
 -neutron. He is addressing the bug. I will share what information I have
 regarding this one when I have some.
 
 We are all looking forward to a more stable gate and this information
 really helps.
 
 Thanks again, Joe,
 Anita.
 
 On 11/20/2013 01:09 AM, Joe Gordon wrote:
 Hi All,

 As many of you have noticed the gate has been in very bad shape over the
 past few days.  Here is a list of some of the top open bugs (without
 pending patches, and many recent hits) that we are hitting.  Gate won't be
 stable, and it will be hard to get your code merged, until we fix these
 bugs.

 1) https://bugs.launchpad.net/bugs/1251920
  nova
 468 Hits
 2) https://bugs.launchpad.net/bugs/1251784
  neutron, Nova
  328 Hits
 3) https://bugs.launchpad.net/bugs/1249065
  neutron
   122 hits
 4) https://bugs.launchpad.net/bugs/1251448
  neutron
 65 Hits


Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Roman Podoliaka
Hey all,

I think I found a serious bug in our usage of eventlet thread local
storage. Please check out this snippet [1].

This is how we use eventlet TLS in Nova and common Oslo code [2]. This
could explain how [3] actually breaks TripleO devtest story and our
gates.

Am I right? Or am I missing something and should get some sleep? :)

Thanks,
Roman

[1] http://paste.openstack.org/show/53686/
[2] 
https://github.com/openstack/nova/blob/master/nova/openstack/common/local.py#L48
[3] 
https://github.com/openstack/nova/commit/85332012dede96fa6729026c2a90594ea0502ac5

On Wed, Nov 20, 2013 at 5:55 PM, Derek Higgins der...@redhat.com wrote:
 On 20/11/13 14:21, Anita Kuno wrote:
 Thanks for posting this, Joe. It really helps to create focus so we can
 address these bugs.

 We are chatting in #openstack-neutron about 1251784, 1249065, and 1251448.

 We are looking for someone to work on 1251784 - I had mentioned it at
 Monday's Neutron team meeting and am trying to shop it around in
 -neutron now. We need someone other than Salvatore, Aaron or Maru to
 work on this since they each have at least one very important bug they
 are working on. Please join us in #openstack-neutron and lend a hand -
 all of OpenStack needs your help.

 I've been hitting this in tripleo intermittently for the last few days
 (or it at least looks to be the same bug), this morning while trying to
 debug the problem I noticed http request/responses happening out of
 order. I've added details to the bug.

 https://bugs.launchpad.net/tripleo/+bug/1251784


 Bug 1249065 is assigned to Aaron Rosen, who isn't in the channel at the
 moment, so I don't have an update on his progress or any blockers he is
 facing. Hopefully (if you are reading this Aaron) he will join us in
 channel soon and I had hear from him about his status.

 Bug 1251448 is assigned to Maru Newby, who I am talking with now in
 -neutron. He is addressing the bug. I will share what information I have
 regarding this one when I have some.

 We are all looking forward to a more stable gate and this information
 really helps.

 Thanks again, Joe,
 Anita.

 On 11/20/2013 01:09 AM, Joe Gordon wrote:
 Hi All,

 As many of you have noticed the gate has been in very bad shape over the
 past few days.  Here is a list of some of the top open bugs (without
 pending patches, and many recent hits) that we are hitting.  Gate won't be
 stable, and it will be hard to get your code merged, until we fix these
 bugs.

 1) https://bugs.launchpad.net/bugs/1251920
  nova
 468 Hits
 2) https://bugs.launchpad.net/bugs/1251784
  neutron, Nova
  328 Hits
 3) https://bugs.launchpad.net/bugs/1249065
  neutron
   122 hits
 4) https://bugs.launchpad.net/bugs/1251448
  neutron
 65 Hits


Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Alex Gaynor
Nope, you're totally right, corolocal.local is a class, whose instances are
the actual coroutine local storage.

Alex


On Wed, Nov 20, 2013 at 9:11 AM, Roman Podoliaka rpodoly...@mirantis.com wrote:

 Hey all,

 I think I found a serious bug in our usage of eventlet thread local
 storage. Please check out this snippet [1].

 This is how we use eventlet TLS in Nova and common Oslo code [2]. This
 could explain how [3] actually breaks TripleO devtest story and our
 gates.

 Am I right? Or I am missing something and should get some sleep? :)

 Thanks,
 Roman

 [1] http://paste.openstack.org/show/53686/
 [2]
 https://github.com/openstack/nova/blob/master/nova/openstack/common/local.py#L48
 [3]
 https://github.com/openstack/nova/commit/85332012dede96fa6729026c2a90594ea0502ac5

 On Wed, Nov 20, 2013 at 5:55 PM, Derek Higgins der...@redhat.com wrote:
  On 20/11/13 14:21, Anita Kuno wrote:
  Thanks for posting this, Joe. It really helps to create focus so we can
  address these bugs.
 
  We are chatting in #openstack-neutron about 1251784, 1249065, and
 1251448.
 
  We are looking for someone to work on 1251784 - I had mentioned it at
  Monday's Neutron team meeting and am trying to shop it around in
  -neutron now. We need someone other than Salvatore, Aaron or Maru to
  work on this since they each have at least one very important bug they
  are working on. Please join us in #openstack-neutron and lend a hand -
  all of OpenStack needs your help.
 
  I've been hitting this in tripleo intermittently for the last few days
  (or it at least looks to be the same bug), this morning while trying to
  debug the problem I noticed http request/responses happening out of
  order. I've added details to the bug.
 
  https://bugs.launchpad.net/tripleo/+bug/1251784
 
 
  Bug 1249065 is assigned to Aaron Rosen, who isn't in the channel at the
  moment, so I don't have an update on his progress or any blockers he is
  facing. Hopefully (if you are reading this Aaron) he will join us in
  channel soon and I had hear from him about his status.
 
  Bug 1251448 is assigned to Maru Newby, who I am talking with now in
  -neutron. He is addressing the bug. I will share what information I have
  regarding this one when I have some.
 
  We are all looking forward to a more stable gate and this information
  really helps.
 
  Thanks again, Joe,
  Anita.
 
  On 11/20/2013 01:09 AM, Joe Gordon wrote:
  Hi All,
 
  As many of you have noticed the gate has been in very bad shape over
 the
  past few days.  Here is a list of some of the top open bugs (without
  pending patches, and many recent hits) that we are hitting.  Gate
 won't be
  stable, and it will be hard to get your code merged, until we fix these
  bugs.
 
  1) https://bugs.launchpad.net/bugs/1251920
   nova
  468 Hits
  2) https://bugs.launchpad.net/bugs/1251784
   neutron, Nova
   328 Hits
  3) https://bugs.launchpad.net/bugs/1249065
   neutron
122 hits
  4) https://bugs.launchpad.net/bugs/1251448
   neutron
  65 Hits
 

Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Russell Bryant
On 11/20/2013 12:21 PM, Alex Gaynor wrote:
 Nope, you're totally right, corolocal.local is a class, whose instances
 are the actual coroutine local storage.

But I don't think his example is what is being used.

Here is an example using the openstack.common.local module, which is
what nova uses for this.  This produces the expected output.

http://paste.openstack.org/show/53687/

https://git.openstack.org/cgit/openstack/nova/tree/nova/openstack/common/local.py

For reference, original example from OP:
http://paste.openstack.org/show/53686/
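For readers without the tree handy, here is a rough, self-contained sketch of what that module does (simplified, so treat the details as approximate): 'store' keeps only weak references, so a stashed value evaporates once nothing else holds it, while 'strong_store' is an ordinary thread-local that keeps its values alive:

```python
import gc
import threading
import weakref

class WeakLocal(threading.local):
    """Thread-local storage holding weak references. The real class
    builds on eventlet's corolocal.local; threading.local is used here
    to keep the sketch dependency-free."""

    def __getattribute__(self, attr):
        rval = super().__getattribute__(attr)
        if rval is not None:
            # Dereference the weakref; yields None once the value is gone.
            rval = rval()
        return rval

    def __setattr__(self, attr, value):
        super().__setattr__(attr, weakref.ref(value))

store = WeakLocal()               # values may silently disappear
strong_store = threading.local()  # values live as long as the thread

class Context:
    pass

ctx = Context()
store.ctx = ctx
assert store.ctx is ctx  # alive while we hold a strong reference
del ctx                  # drop the only strong reference
gc.collect()
print(store.ctx)         # the weak reference is now dead -> prints None
```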

-- 
Russell Bryant



Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Robert Collins
We settled on 1251920.

https://review.openstack.org/57509 is the fix for that bug.

Note that Oslo was fixed on Jun 28th; nova hasn't synced since then.
If we were using Oslo as a library, we would have had the fix as soon
as Oslo did a release.

These are the references to strong_store - and thus broken in nova
trunk (and if any references exist in H, in H too):
./nova/network/neutronv2/__init__.py:58: if not hasattr(local.strong_store, 'neutron_client'):
./nova/network/neutronv2/__init__.py:59: local.strong_store.neutron_client = _get_client(token=None)
./nova/network/neutronv2/__init__.py:60: return local.strong_store.neutron_client
./nova/openstack/common/rpc/__init__.py:102: if ((hasattr(local.strong_store, 'locks_held')
./nova/openstack/common/rpc/__init__.py:103: and local.strong_store.locks_held)):
./nova/openstack/common/rpc/__init__.py:108: {'locks': local.strong_store.locks_held,
./nova/openstack/common/local.py:47: strong_store = threading.local()
./nova/openstack/common/lockutils.py:173: if not hasattr(local.strong_store, 'locks_held'):
./nova/openstack/common/lockutils.py:174: local.strong_store.locks_held = []
./nova/openstack/common/lockutils.py:175: local.strong_store.locks_held.append(name)
./nova/openstack/common/lockutils.py:217: local.strong_store.locks_held.remove(name)
./nova/tests/network/test_neutronv2.py:1837: local.strong_store.neutron_client = None


So we can expect lockutils to be broken, and rpc to be broken. Clearly
they are being impacted more subtly than the neutron client usage.

-Rob


On 21 November 2013 07:44, Robert Collins robe...@robertcollins.net wrote:
 Which of these bugs would be appropriate to use for the fix to
 strong_store - it affects lockutils and rpc, both of which are going
 to create havoc :)

 -Rob

 On 21 November 2013 07:19, Salvatore Orlando sorla...@nicira.com wrote:
 I've noticed that
 https://github.com/openstack/nova/commit/85332012dede96fa6729026c2a90594ea0502ac5
 stores the network client in local.strong_store which is a reference to
 corolocal.local (the class, not the instance).

 In Russell's example instead the code accesses local.store which is an
 instance of WeakLocal (inheriting from corolocal.local).

 Perhaps then Roman's findings apply to the issue being observed on the gate.

 Regards,
 Salvatore


 On 20 November 2013 18:32, Russell Bryant rbry...@redhat.com wrote:

 On 11/20/2013 12:21 PM, Alex Gaynor wrote:
  Nope, you're totally right, corolocal.local is a class, whose instances
  are the actual coroutine local storage.

 But I don't think his example is what is being used.

 Here is an example using the openstack.common.local module, which is
 what nova uses for this.  This produces the expected output.

 http://paste.openstack.org/show/53687/


 https://git.openstack.org/cgit/openstack/nova/tree/nova/openstack/common/local.py

 For reference, original example from OP:
 http://paste.openstack.org/show/53686/

 --
 Russell Bryant








 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud



-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Ken'ichi Ohmichi
Hi Joe,

2013/11/20 Joe Gordon joe.gord...@gmail.com:
 Hi All,

 As many of you have noticed the gate has been in very bad shape over the
 past few days.  Here is a list of some of the top open bugs (without pending
 patches, and many recent hits) that we are hitting.  Gate won't be stable,
 and it will be hard to get your code merged, until we fix these bugs.

 1) https://bugs.launchpad.net/bugs/1251920
 nova
 468 Hits

Can we know the frequency of each failure?
I'm working on 1251920 and have posted an investigation patch to tempest:
 https://review.openstack.org/#/c/57193/

The patch has avoided this problem 4 times so far, but I am not sure
whether this approach is worthwhile.


Thanks
Ken'ichi Ohmichi

---
 2) https://bugs.launchpad.net/bugs/1251784
 neutron, Nova
 328 Hits
 3) https://bugs.launchpad.net/bugs/1249065
 neutron
   122 hits
 4) https://bugs.launchpad.net/bugs/1251448
 neutron
 65 Hits




 best,
 Joe Gordon





Re: [openstack-dev] Top Gate Bugs

2013-11-20 Thread Clark Boylan
On Wed, Nov 20, 2013 at 9:43 PM, Ken'ichi Ohmichi ken1ohmi...@gmail.com wrote:
 Hi Joe,

 2013/11/20 Joe Gordon joe.gord...@gmail.com:
 Hi All,

 As many of you have noticed the gate has been in very bad shape over the
 past few days.  Here is a list of some of the top open bugs (without pending
 patches, and many recent hits) that we are hitting.  Gate won't be stable,
 and it will be hard to get your code merged, until we fix these bugs.

 1) https://bugs.launchpad.net/bugs/1251920
 nova
 468 Hits

 Can we know the frequency of each failure?
 I'm trying 1251920 and putting the investigation tempest patch.
  https://review.openstack.org/#/c/57193/

 The patch can avoid this problem 4 times, but I am not sure this is
 worth or not.


 Thanks
 Ken'ichi Ohmichi

 ---
 2) https://bugs.launchpad.net/bugs/1251784
 neutron, Nova
 328 Hits
 3) https://bugs.launchpad.net/bugs/1249065
 neutron
   122 hits
 4) https://bugs.launchpad.net/bugs/1251448
 neutron
 65 Hits




 best,
 Joe Gordon

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Joe seemed to be on the same track in his earlier thread:

[openstack-dev] Top Gate Bugs

2013-11-19 Thread Joe Gordon
Hi All,

As many of you have noticed the gate has been in very bad shape over the
past few days.  Here is a list of some of the top open bugs (without
pending patches, and many recent hits) that we are hitting.  Gate won't be
stable, and it will be hard to get your code merged, until we fix these
bugs.
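A rough sense of why a handful of flaky bugs is enough to wedge the gate: a change only merges if every job in the run passes, so per-bug failure probabilities compound across the run. A back-of-the-envelope sketch with made-up rates (illustrative only, not measurements from the data below):

```python
# Illustrative per-run trigger probabilities for four independent
# flaky bugs -- these numbers are hypothetical.
bug_fail_rates = [0.08, 0.05, 0.03, 0.02]

pass_rate = 1.0
for p in bug_fail_rates:
    pass_rate *= 1.0 - p  # the run must survive this bug too

print(f"gate pass rate: {pass_rate:.1%}")  # about 83%
```

Even though no single bug here fires more than 8% of the time, roughly one run in six still fails, and every failed run means another recheck cycling through the same lottery.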

1) https://bugs.launchpad.net/bugs/1251920
   Title: Tempest failures due to failure to return console logs from an instance
   Project: nova
   468 hits
2) https://bugs.launchpad.net/bugs/1251784
   Title: nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached
   Projects: neutron, nova
   328 hits
3) https://bugs.launchpad.net/bugs/1249065
   Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
   Project: neutron
   122 hits
4) https://bugs.launchpad.net/bugs/1251448
   Title: BadRequest: Multiple possible networks found, use a Network ID to be more specific.
   Project: neutron
   65 hits

Raw Data:


Note: If a bug has any hits for anything besides failure, it means the
fingerprint isn't perfect.
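To make that concrete: a perfect fingerprint matches only failed runs, so any SUCCESS or UNSTABLE hits measure how imprecise the query is. A minimal sketch in plain Python (the function name is ours for illustration, not part of elastic-recheck; hit counts are taken from the data below):

```python
def fingerprint_precision(hits):
    """Fraction of runs matched by a fingerprint that actually failed.

    A perfect elastic-recheck fingerprint matches FAILURE runs only;
    SUCCESS or UNSTABLE hits mean the query is too broad.
    """
    total = sum(hits.values())
    return hits.get("FAILURE", 0) / total if total else 0.0

# Bug 1251784 below also matches 275 successful runs, so the query
# clearly needs tightening.
print(round(fingerprint_precision(
    {"FAILURE": 328, "UNSTABLE": 13, "SUCCESS": 275}), 2))  # 0.53
```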

Elastic recheck known issues
Bug: https://bugs.launchpad.net/bugs/1251920 = message:"assertionerror: console output was empty" AND filename:"console.html"
Title: Tempest failures due to failure to return console logs from an instance
Project: Status
  nova: Confirmed
Hits
  FAILURE: 468

Bug: https://bugs.launchpad.net/bugs/1251784 = message:"Connection to neutron failed: Maximum attempts reached" AND filename:"logs/screen-n-cpu.txt"
Title: nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached
Project: Status
  neutron: New
  nova: New
Hits
  FAILURE: 328
  UNSTABLE: 13
  SUCCESS: 275

Bug: https://bugs.launchpad.net/bugs/1240256 = message:" 503" AND filename:"logs/syslog.txt" AND syslog_program:"proxy-server"
Title: swift proxy-server returning 503 during tempest run
Project: Status
  openstack-ci: Incomplete
  swift: New
  tempest: New
Hits
  FAILURE: 136
  SUCCESS: 83

Pending Patch
Bug: https://bugs.launchpad.net/bugs/1249065 = message:"No nw_info cache associated with instance" AND filename:"logs/screen-n-api.txt"
Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
Project: Status
  neutron: New
  nova: Confirmed
Hits
  FAILURE: 122

Bug: https://bugs.launchpad.net/bugs/1252514 = message:"Got error from Swift: put_object" AND filename:"logs/screen-g-api.txt"
Title: glance doesn't recover if Swift returns an error
Project: Status
  devstack: New
  glance: New
  swift: New
Hits
  FAILURE: 95

Pending Patch
Bug: https://bugs.launchpad.net/bugs/1244255 = message:"NovaException: Unexpected vif_type=binding_failed" AND filename:"logs/screen-n-cpu.txt"
Title: binding_failed because of l2 agent assumed down
Project: Status
  neutron: Fix Committed
Hits
  FAILURE: 92
  SUCCESS: 29

Bug: https://bugs.launchpad.net/bugs/1251448 = message:"possible networks found, use a Network ID to be more specific. (HTTP 400)" AND filename:"console.html"
Title: BadRequest: Multiple possible networks found, use a Network ID to be more specific.
Project: Status
  neutron: New
Hits
  FAILURE: 65

Bug: https://bugs.launchpad.net/bugs/1239856 = message:"tempest/services" AND message:"/images_client.py" AND message:"wait_for_image_status" AND filename:"console.html"
Title: TimeoutException: Request timed out on tempest.api.compute.images.test_list_image_filters.ListImageFiltersTestXML
Project: Status
  glance: New
Hits
  FAILURE: 62

Bug: https://bugs.launchpad.net/bugs/1235435 = message:"One or more ports have an IP allocation from this subnet" AND message:"SubnetInUse: Unable to complete operation on subnet" AND filename:"logs/screen-q-svc.txt"
Title: 'SubnetInUse: Unable to complete operation on subnet UUID. One or more ports have an IP allocation from this subnet.'
Project: Status
  neutron: Incomplete
  nova: Fix Committed
  tempest: New
Hits
  FAILURE: 48

Bug: https://bugs.launchpad.net/bugs/1224001 = message:"tempest.scenario.test_network_basic_ops AssertionError: Timed out waiting for" AND filename:"console.html"
Title: test_network_basic_ops fails waiting for network to become available
Project: Status
  neutron: In Progress
  swift: Invalid
  tempest: Invalid
Hits
  FAILURE: 42

Bug: https://bugs.launchpad.net/bugs/1218391 = message:"Cannot 'createImage'" AND filename:"console.html"
Title: tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestXML.test_delete_image_that_is_not_yet_active spurious failure
Project: Status
  nova: Confirmed
  swift: Confirmed
  tempest: Confirmed
Hits
  FAILURE: 25
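For anyone new to reading these dumps: each fingerprint is just an AND of field conditions evaluated against indexed log lines. A toy illustration of the matching semantics with made-up records (this is not the real elastic-recheck/Elasticsearch code, only a sketch of what a match means):

```python
def matches(record, message_terms, filename):
    """AND semantics of a 'message:... AND filename:...' fingerprint:
    every message term must appear in the log line, and the line must
    come from the named log file."""
    return (record["filename"] == filename
            and all(term in record["message"] for term in message_terms))

# Hypothetical indexed log lines from one tempest run.
logs = [
    {"filename": "console.html",
     "message": "BadRequest: Multiple possible networks found, "
                "use a Network ID to be more specific. (HTTP 400)"},
    {"filename": "logs/screen-n-cpu.txt",
     "message": "NovaException: Unexpected vif_type=binding_failed"},
]

# Fingerprint for bug 1251448 above: one of the two records matches.
hits = [r for r in logs
        if matches(r, ["possible networks found", "(HTTP 400)"],
                   "console.html")]
print(len(hits))  # 1
```

A run whose logs produce at least one hit for a fingerprint is counted toward that bug's totals, which is also why an overly broad query racks up SUCCESS hits.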



best,
Joe Gordon