Re: [openstack-dev] [Infra][nova][magnum] Jenkins failed quite often for "Cannot set up guest memory 'pc.ram': Cannot allocate memory"

2015-12-13 Thread pcrews

Hi,

OVH is a new cloud provider for openstack-infra nodes:
http://www.openstack.org/blog/2015/12/announcing-a-new-cloud-provider-for-openstacks-ci-system-ovh/

It appears that selection of nodes on any cloud provider is a matter of 
luck:
"When a developer uploads a proposed change to an OpenStack project, 
available instances from any of our contributing cloud providers will be 
used interchangeably to test it."


You might want to ping people in #openstack-infra to find a point of 
contact for them (OVH) and/or to work with the infra folks directly to 
see about troubleshooting this further.



On 12/12/2015 02:16 PM, Hongbin Lu wrote:

Hi,

As Kai Qiang mentioned, magnum gate recently had a bunch of random
failures, which occurred on creating a nova instance with 2G of RAM.
According to the error message, it seems that the hypervisor tried to
allocate memory to the nova instance but couldn’t find enough free
memory in the host. However, adding a few “nova hypervisor-show XX” calls
before, during, and right after the test showed that the host has 6G
of free RAM, far more than 2G. Here is a snapshot of the output
[1]. You can find the full log here [2].

Another observation is that most of the failures happened on nodes with
names “devstack-trusty-ovh-*” (you can verify this by entering the query [3]
at http://logstash.openstack.org/ ). It seems that the jobs are fine
if they are allocated to a node other than “ovh”.
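
A logstash query of roughly this shape surfaces the failures (a sketch from
memory; the exact query is in [3]):

  message:"Cannot set up guest memory 'pc.ram'" AND build_name:"gate-functional-dsvm-magnum*"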

Any hints to debug this issue further? Suggestions are greatly appreciated.



[1] http://paste.openstack.org/show/481746/

[2]
http://logs.openstack.org/48/256748/1/check/gate-functional-dsvm-magnum-swarm/56d79c3/console.html

[3] https://review.openstack.org/#/c/254370/2/queries/1521237.yaml

Best regards,

Hongbin

*From:* Kai Qiang Wu [mailto:wk...@cn.ibm.com]
*Sent:* December-09-15 7:23 AM
*To:* openstack-dev@lists.openstack.org
*Subject:* [openstack-dev] [Infra][nova][magnum] Jenkins failed quite
often for "Cannot set up guest memory 'pc.ram': Cannot allocate memory"

Hi All,

I am not sure what changed these days, but we now quite often see
Jenkins fail with:


http://logs.openstack.org/07/244907/5/check/gate-functional-dsvm-magnum-k8s/5305d7a/logs/libvirt/libvirtd.txt.gz

2015-12-09 08:52:27.892+0000: 22957: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
2015-12-09 08:52:27.892+0000: 22957: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fa66400c6f0 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:28.070+0000: 22951: error : qemuMonitorIORead:554 : Unable to read from monitor: Connection reset by peer
2015-12-09 08:52:28.070+0000: 22951: error : qemuMonitorIO:690 : internal error: early end of file from monitor: possible problem:
Cannot set up guest memory 'pc.ram': Cannot allocate memory

Re: [openstack-dev] [heat] heat delete woes in Juno

2015-03-26 Thread pcrews

Regarding item #3:
I have mainly seen this issue on stacks that have been snapshotted:
https://bugs.launchpad.net/heat/+bug/1412965

In such cases, the only way to avoid it (afaik) is for the owner to
manually delete the snapshots prior to deleting the stack. Heat tries
to auto-delete the snapshots and hangs otherwise.
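
Roughly (command names from memory of the Juno-era heatclient, so verify
with `heat help`; <stack> and <snapshot_id> are placeholders):

  heat snapshot-list <stack>                  # list snapshots held by the stack
  heat snapshot-delete <stack> <snapshot_id>  # remove each one
  heat stack-delete <stack>                   # the delete should now go through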


On 03/26/2015 11:17 AM, Matt Fischer wrote:

Nobody on the operators list had any ideas on this, so re-posting here.

We've been having some issues with heat delete-stack in Juno. The issues
generally fall into three categories:

1) it takes multiple calls to heat to delete a stack. Presumably due
to heat being unable to figure out the ordering on deletion and
resources being in use.

2) undeletable stacks. Stacks that refuse to delete get stuck in
DELETE_FAILED state. In this case, they show up in stack-list and
stack-show, yet resource-list and stack-delete deny their existence.
This means I can't easily tell whether they still hold any real resources.

3) As a corollary to item 1, stacks for which heat can never unwind the
dependencies, so they stay in DELETE_IN_PROGRESS forever.

Does anyone have any work-arounds for these or recommendations on
cleanup? My main worry is removing a stack from the database that is
still consuming the customer's resources. I also don't just want to
remove stacks from the database and leave orphaned records in the DB.




Re: [openstack-dev] [qa] How to delete a VM which is in ERROR state?

2014-12-12 Thread pcrews

On 12/09/2014 03:54 PM, Ken'ichi Ohmichi wrote:

Hi,

This case is always tested by Tempest on the gate.

https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_delete_server.py#L152

So I guess this problem wouldn't happen on the latest version at least.

Thanks
Ken'ichi Ohmichi

---

2014-12-10 6:32 GMT+09:00 Joe Gordon:



On Sat, Dec 6, 2014 at 5:08 PM, Danny Choi (dannchoi) wrote:


Hi,

I have a VM which is in ERROR state.


+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
| ID                                   | Name                                         | Status | Task State | Power State | Networks |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
| 1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | cirros--1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+


I tried both the CLI “nova delete” and Horizon “terminate instance”.
Both accepted the delete command without any error.
However, the VM never got deleted.

Is there a way to remove the VM?



What version of nova are you using? This is definitely a serious bug; you
should be able to delete an instance in ERROR state. Can you file a bug that
includes steps to reproduce it, along with all relevant logs?

bugs.launchpad.net/nova




Thanks,
Danny



Hi,

I've encountered this in my own testing and have found that it appears 
to be tied to libvirt.


When I hit this, reset-state as the admin user reports success (and the
state is set), *but* things aren't really working as advertised:
subsequent attempts to do anything with the errant vm's will send them
right back into 'FLAIL' / can't-delete / endless-DELETING mode.


restarting libvirt-bin on my machine fixes this - after restart, the 
deleting vm's are properly wiped without any further user input to 
nova/horizon and all seems right in the world.
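
For reference, the exact sequence on my setup (Ubuntu 14.04 service name;
the uuid is a placeholder):

  nova reset-state --active <uuid>   # reports success, but the vm stays wedged
  sudo service libvirt-bin restart   # pending deletes then complete on their own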


using:
devstack
ubuntu 14.04
libvirtd (libvirt) 1.2.2

triggered via:
lots of random create/reboot/resize/delete requests of varying validity 
and sanity.


Am in the process of cleaning up my test code so as not to hurt anyone's 
brain with the ugly and will file a bug once done, but thought this 
worth sharing.


Thanks,
Patrick



Re: [openstack-dev] [nova] Proposal new hacking rules

2014-11-24 Thread pcrews

On 11/24/2014 09:40 AM, Ben Nemec wrote:

On 11/24/2014 08:50 AM, Matthew Gilliard wrote:

1/ assertFalse() vs assertEqual(x, False) - these are semantically
different because of python's notion of truthiness, so I don't think
we ought to make this a rule.
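
A quick illustration of the truthiness point (plain unittest semantics):

  self.assertFalse([])           # passes: [] is falsey
  self.assertEqual([], False)    # fails:  [] != False
  self.assertEqual(0, False)     # passes: 0 == False in python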

2/ expected/actual - incorrect failure messages have cost me more time
than I should admit to. I don't see any reason not to try to improve
in this area, even if it's difficult to automate.


Personally I'd rather kill the expected, actual ordering and just have
first, second or something that doesn't imply which value is which.
Because it can't be automatically enforced, we'll _never_ fix all of the
expected, actual mistakes (and will continually introduce new ones), so
I'd prefer to eliminate the confusion by not requiring a specific ordering.


++.  It should be a part of review to ensure that the test (including 
error messages) makes sense.  Simply having a (seemingly costly to 
implement and enforce) rule stating that something must adhere to a 
pattern does not guarantee that.


assertEqual(expected, actual, msg="nom nom nom cookie cookie yum")
matches the pattern, but the message still doesn't necessarily provide
much value.


Focusing on making tests informative and clear about what is thought to 
be broken on failure seems to be the better target (imo).




Alternatively I suppose we could require kwargs for expected and actual
in assertEqual.  That would at least make it more obvious when someone
has gotten it backward, but again that's a ton of code churn for minimal
gain IMHO.
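
(For what it's worth, assertEqual's actual signature is (first, second,
msg=None), so requiring kwargs would mean a wrapper along these lines -
name and shape hypothetical:)

  def assert_equal_labeled(testcase, expected, actual, msg=None):
      # forces call sites to spell out which value is which, e.g.
      # assert_equal_labeled(self, expected=5, actual=result)
      testcase.assertEqual(expected, actual, msg)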



3/ warn{ing} - 
https://github.com/openstack/nova/blob/master/nova/hacking/checks.py#L322

On the overarching point: There is no way to get started with
OpenStack, other than starting small.  My first ever patch (a tidy-up)
was rejected for being trivial, and that was confusing and
disheartening. Nova has a lot on its plate, sure, and plenty of
pending code reviews.  But there is also a lot of inconsistency and
unloved code which *is* worth fixing, because a tidy codebase is a joy
to work with, *and* these changes are ideal to bring new reviewers and
developers into the project.

Linus' post on this from the LKML is almost a decade old (!) but worth reading.
https://lkml.org/lkml/2004/12/20/255

   MG



Re: [openstack-dev] database hoarding

2014-10-30 Thread pcrews

On 10/30/2014 03:30 PM, Abel Lopez wrote:

It seems that every release, there is more and more emphasis on upgradability.
This is a good thing - I'd love to see production users easily go from old to
new.

As an operator, I've seen first hand the results of neglecting the databases 
that openstack creates. If we intend to have users go year-over-year with 
upgrades, we're going to expect them to carry huge databases around.

Just in my lab, I have over 10 deleted instances in the last two months.
Frankly, I'm not convinced that simply archiving deleted rows is the best idea.
Sure, it gets your production databases and tables to a manageable size, but
you're simply hoarding old data.

As an operator, I'd prefer to have time based criteria over number of rows, too.
I envision something like `nova-manage db purge [days]` where we can leave it 
up to the administrator to decide how much of their old data (if any) they'd be 
OK losing.
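
Usage as envisioned (hypothetical - this subcommand doesn't exist yet):

  nova-manage db purge 90   # permanently drop soft-deleted rows older than 90 days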

Think about data destruction guidelines too: some companies require old data
be destroyed when not needed, others require maintaining it.
We can easily provide both here.

I've drafted a simple blueprint 
https://blueprints.launchpad.net/nova/+spec/database-purge

I'd love some input from the community.




HP's LBaaS code (libra) uses something similar for the reasons you
state -
http://libra.readthedocs.org/en/latest/admin_api/schedulers.html#expunge-scheduler


The admin-api code would go through and wipe any records that were older 
than the --expire-days parameter, although this was more of an automated 
process vs. a user-triggered function.


++ on the notion that this would be a useful and integrated 
quality-of-life tool for operations. Am in favor.




Re: [openstack-dev] Contribution work flow

2014-09-13 Thread pcrews

Shar,

Hi!

1)  install git-review and set it up (git review -s; poke around the
openstack docs)
2)  craft your patch on a new branch (git checkout -b
name-of-branch-you-are-working-on), stage and commit the changes (git add
-A, then git commit with a clear commit message), and then type git review.
If everything is correct, it will submit to openstack's ci machine
(review.openstack.org) and you can track CI testing + reviews from there.
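
Spelled out (the branch name is just an example):

  git checkout -b my-first-fix   # create and switch to a topic branch
  # ...edit files...
  git add -A                     # stage the changes
  git commit                     # the gerrit hook appends a Change-Id line for you
  git review                     # push the change to review.openstack.org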


Also - cool to see you working on OpenStack!

Cheers,
Patrick

On 09/13/2014 03:09 AM, Sharan Kumar M wrote:


Hi,

I am about to submit my first patch. I saw the contribution guidelines
in the documentation. Just to make it clear, do I issue a pull
request in GitHub, which automatically pushes my patch to gerrit? Also,
I found something called Change-Id in the commit message. Is it the hash
code for the git commit? If yes, should we prefix an 'I' at the beginning
of the hash code?

Thanks,
Sharan Kumar M




Re: [openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE

2014-05-20 Thread pcrews

On 05/20/2014 10:07 AM, Jay Pipes wrote:

On 05/19/2014 02:32 PM, sridhar basam wrote:

On Mon, May 19, 2014 at 1:30 PM, Jay Pipes <jaypi...@gmail.com> wrote:

Stackers,

On Friday in Atlanta, I had the pleasure of moderating the database
session at the Ops Meetup track. We had lots of good discussions and
heard important feedback from operators on DB topics.

For the record, I would not bring this point up so publicly unless I
believed it was a serious problem affecting a large segment of
users. When doing an informal survey of the users/operators in the
room at the start of the session, out of approximately 200 people in
the room, only a single person was using PostgreSQL, about a dozen
were using standard MySQL master/slave replication, and the rest
were using MySQL Galera clustering. So, this is a real issue for a
large segment of the operators -- or at least the ones at the
session. :)


We are one of those operators that use Galera for replicating our mysql
databases. We used to see issues with deadlocks when having multiple
mysql writers in our mysql cluster. As a workaround we run our haproxy
in an active-standby configuration for our mysql VIP.
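
(Sketch of that haproxy setup - addresses and names made up; the "backup"
keyword is what keeps all writes on a single node at a time:)

  listen mysql-cluster
      bind 192.168.0.10:3306
      mode tcp
      server galera1 192.168.0.11:3306 check
      server galera2 192.168.0.12:3306 check backup
      server galera3 192.168.0.13:3306 check backup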

I seem to recall we had a lot of the deadlocks happen through Neutron.
When we go through our Icehouse testing, we will redo our multimaster
mysql setup and provide feedback on the issues we see.


Thanks very much, Sridhar, much appreciated.

This issue was raised at the Neutron IRC meeting yesterday, and we've
agreed to take a staged approach. We will first work on documentation to
add to the operations guide that explains the issues (and the tradeoffs
of going to a single-writer cluster configuration vs. just having the
clients retry some request). Later stages will work on a non-locking
quota-management algorithm, possibly in conjunction with Climate, and
looking into how to use coarser-grained file locks or a distributed lock
manager for handling cross-component deterministic reads in Neutron.

Best,
-jay



Am late to this topic, but wanted to share this in case anyone wanted to 
read further on this behavior with galera - 
http://www.mysqlperformanceblog.com/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/


--
patrick




Re: [openstack-dev] [qa] moratorium on new negative tests in Tempest

2013-11-12 Thread pcrews

On 11/12/2013 12:20 PM, Monty Taylor wrote:



On 11/12/2013 02:33 PM, David Kranz wrote:

On 11/12/2013 01:36 PM, Clint Byrum wrote:

Excerpts from Sean Dague's message of 2013-11-12 10:01:06 -0800:

During the freeze phase of Havana we got a ton of new contributors
coming on board to Tempest, which was super cool. However, it meant we
had a new influx of negative tests (i.e. tests which push invalid
parameters looking for error codes), which made us realize that human
creation and review of negative tests really doesn't scale. David Kranz
is working on a generative model for this now.


Are there some notes or other source material we can follow to understand
this line of thinking? I don't agree or disagree with it, as I don't
really understand, so it would be helpful to have the problems enumerated
and the solution hypothesis stated. Thanks!


I am working on this with Marc Koderer but we only just started and are
not quite ready. But since you asked now...

The problem is that in the current implementation of negative tests,
each "case" is represented as code in a method and targets a particular
set of api arguments and expected result.
tests there is boilerplate code surrounding the real content which is
the actual arguments being passed and the value expected. That
boilerplate code has to be written correctly and reviewed. The general
form of the solution has to be worked out but basically would involve
expressing these tests declaratively, perhaps in a yaml file. In order
to do this we will need some kind of json schema for each api. The main
implementation around this is defining the yaml attributes that make it
easy to express the test cases, and somehow coming up with the json
schema for each api.

In addition, we would like to support "fuzz testing" where arguments
are, at least partially, randomly generated and the return values are
only examined for 4xx vs something else. This would be possible if we
had json schemas. The main work is to write a generator and methods for
creating bad values including boundary conditions for types with ranges.
I had thought a bit about this last year and poked around for an
existing framework. I didn't find anything that seemed to make the job
much easier, but if anyone knows of such a thing (python, hopefully)
please let me know.

The negative tests for each api would be some combination of
declaratively specified cases and auto-generated ones.
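
To make the generator half concrete, a minimal sketch in python (the
schema format and function names are invented for illustration):

  import random
  import string

  def bad_values(spec):
      """Yield invalid/boundary values for one json-schema-ish property."""
      prop_type = spec.get('type')
      if prop_type == 'integer':
          yield spec.get('minimum', 0) - 1        # just below the allowed range
          yield spec.get('maximum', 2 ** 31) + 1  # just above the allowed range
          yield 'not-an-int'                      # wrong type entirely
      elif prop_type == 'string':
          yield 'x' * (spec.get('maxLength', 255) + 1)  # too long
          yield ''.join(random.choice(string.printable) for _ in range(32))
      yield None                                  # missing/null value

  def negative_cases(schema):
      """Request bodies of valid shape, each with exactly one bad field."""
      for name, spec in schema['properties'].items():
          for bad in bad_values(spec):
              yield {name: bad}   # caller sends this and expects a 4xx back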

With regard to the json schema, there have been various attempts at this
in the past, including some ideas of how wsme/pecan will help, and it
might be helpful to have more project coordination. I can see a few
options:

1. Tempest keeps its own json schema data
2. Each project keeps its own json schema in a way that supports
automated extraction
3. There are several use cases for json schema like this and it gets
stored in some openstacky place that is not in tempest

So that is the starting point. Comments and suggestions welcome! Marc
and I just started working on an etherpad
https://etherpad.openstack.org/p/bp_negative_tests but any one is
welcome to contribute there.


We actually did this back in the good old Drizzle days - and by "we", I
mean Patrick Crews, whom I copied here. He can refer to the research
better than I can, but AIUI, generative schema-driven testing of things
like this is certainly the right direction. It's about 10 years behind
the actual state of the art of the research, but it's in all ways
superior to making human combinations of input parameters and output
behaviors.


Thanks, Monty.
As Monty has stated, similar issues have been encountered in database 
testing.
They are also complex, richly featured systems that present interesting
testing challenges.
The best research regarding stochastic / randomized / high-volume / 
machine-generated test cases that I have seen has come from Microsoft's 
SQL Server team and it is this research that informed the creation of 
the random query generator tool for MySQL systems.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.3435&rep=rep1&type=pdf 
<-- MS paper on their db testing tools


We've been doing similar work with the test suite for libra - we
organize things by api actions and define validation code (if name >
max_len, we expect return value NNN; if user=bad, we expect MMM; etc.).
We have singular test cases (create_lb, update_lb, update_lb_nodes) and
we feed in various parameters (names, number of nodes, etc.) to produce
several iterations of the test w/ different inputs.


This allows us to have one chunk of code that appropriately describes 
the api action's behavior while letting us quickly make new tests for 
that action by simply creating a new yaml file or adding to an existing one.
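
In python terms, the shape of it is roughly (the real thing is yaml-driven;
names and values here are invented):

  CREATE_LB_CASES = [
      {'name': 'x' * 129, 'expect': 400},             # name > max_len
      {'name': 'ok-lb', 'nodes': [], 'expect': 400},  # no nodes supplied
      {'name': 'ok-lb',
       'nodes': [{'address': '10.0.0.4', 'port': 80}],
       'expect': 202},                                # happy path
  ]

  def check_create_lb(client, case):
      expected = case.pop('expect')
      response = client.create_lb(**case)  # one validator for the action...
      assert response.status == expected   # ...fed many different inputs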


Some background:
Basically testing co