Re: [openstack-dev] [tempest] Small doubt in Tempest setup

2018-08-06 Thread Attila Fazekas
I was tried to be quick and become wrong. ;-)

Here are the working ways:

On Mon, Aug 6, 2018 at 3:49 PM, Attila Fazekas  wrote:

> Please use ostestr or stestr instead of testr.
>
> $ git clone https://github.com/openstack/tempest
> $ cd tempest/
> $ stestr init
> $ stestr list
>
> $ git clone https://github.com/openstack/tempest
> $ cd tempest/
> $ ostestr -l #old way, also worked, does to steps
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest] Small doubt in Tempest setup

2018-08-06 Thread Attila Fazekas
Please use ostestr or stestr instead of testr.

$ git clone https://github.com/openstack/tempest
$ cd tempest/
$ stestr --list

$ ostestr -l #old way, also worked

These tools handling the config creation implicitly.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-29 Thread Attila Fazekas
I have overlay2 and super fast disk I/O (memory cheat + SSD),
just the CPU freq is not high. The CPU is a Broadwell
and actually it has lot more core (E5-2630V4). Even a 5 year old gamer CPU
can be 2 times
faster on a single core, but cannot compete with all of the cores ;-)

This machine have seen faster setup time,  but I'll return to this in an
another topic.

On Tue, Sep 26, 2017 at 6:16 PM, Michał Jastrzębski 
wrote:

> On 26 September 2017 at 07:34, Attila Fazekas  wrote:
> > decompressing those registry tar.gz takes ~0.5 min on 2.2 GHz CPU.
> >
> > Fully pulling all container takes something like ~4.5 min (from
> localhost,
> > one leaf request at a time),
> > but on the gate vm  we usually have 4 core,
> > so it is possible to go bellow 2 min with better pulling strategy,
> > unless we hit some disk limit.
>
> Check your $docker info. If you kept defaults, storage driver will be
> devicemapper on loopback, which is awfully slow and not very reliable.
> Overlay2 is much better and should speed things up quite a bit. For me
> deployment of 5 node openstack on vms similar to gate took 6min (I had
> registry available in same network). Also if you pull single image it
> will download all base images as well, so next one will be
> significantly faster.
>
> >
> > On Sat, Sep 23, 2017 at 5:12 AM, Michał Jastrzębski 
> > wrote:
> >>
> >> On 22 September 2017 at 17:21, Paul Belanger 
> >> wrote:
> >> > On Fri, Sep 22, 2017 at 02:31:20PM +, Jeremy Stanley wrote:
> >> >> On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:
> >> >> > "if DevStack gets custom images prepped to make its jobs
> >> >> > run faster, won't Triple-O, Kolla, et cetera want the same and
> where
> >> >> > do we draw that line?). "
> >> >> >
> >> >> > IMHO we can try to have only one big image per distribution,
> >> >> > where the packages are the union of the packages requested by all
> >> >> > team,
> >> >> > minus the packages blacklisted by any team.
> >> >> [...]
> >> >>
> >> >> Until you realize that some projects want packages from UCA, from
> >> >> RDO, from EPEL, from third-party package repositories. Version
> >> >> conflicts mean they'll still spend time uninstalling the versions
> >> >> they don't want and downloading/installing the ones they do so we
> >> >> have to optimize for one particular set and make the rest
> >> >> second-class citizens in that scenario.
> >> >>
> >> >> Also, preinstalling packages means we _don't_ test that projects
> >> >> actually properly declare their system-level dependencies any
> >> >> longer. I don't know if anyone's concerned about that currently, but
> >> >> it used to be the case that we'd regularly add/break the package
> >> >> dependency declarations in DevStack because of running on images
> >> >> where the things it expected were preinstalled.
> >> >> --
> >> >> Jeremy Stanley
> >> >
> >> > +1
> >> >
> >> > We spend a lot of effort trying to keep the 6 images we have in
> nodepool
> >> > working
> >> > today, I can't imagine how much work it would be to start adding more
> >> > images per
> >> > project.
> >> >
> >> > Personally, I'd like to audit things again once we roll out zuulv3, I
> am
> >> > sure
> >> > there are some tweaks we could make to help speed up things.
> >>
> >> I don't understand, why would you add images per project? We have all
> >> the images there.. What I'm talking about is to leverage what we'll
> >> have soon (registry) to lower time of gates/DIB infra requirements
> >> (DIB would hardly need to refresh images...)
> >>
> >> >
> >> > 
> __
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe:
> >> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >> 
> __
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe: openstack-dev

Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-26 Thread Attila Fazekas
decompressing those registry tar.gz takes ~0.5 min on 2.2 GHz CPU.

Fully pulling all container takes something like ~4.5 min (from localhost,
one leaf request at a time),
but on the gate vm  we usually have 4 core,
so it is possible to go bellow 2 min with better pulling strategy,
unless we hit some disk limit.


On Sat, Sep 23, 2017 at 5:12 AM, Michał Jastrzębski 
wrote:

> On 22 September 2017 at 17:21, Paul Belanger 
> wrote:
> > On Fri, Sep 22, 2017 at 02:31:20PM +, Jeremy Stanley wrote:
> >> On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:
> >> > "if DevStack gets custom images prepped to make its jobs
> >> > run faster, won't Triple-O, Kolla, et cetera want the same and where
> >> > do we draw that line?). "
> >> >
> >> > IMHO we can try to have only one big image per distribution,
> >> > where the packages are the union of the packages requested by all
> team,
> >> > minus the packages blacklisted by any team.
> >> [...]
> >>
> >> Until you realize that some projects want packages from UCA, from
> >> RDO, from EPEL, from third-party package repositories. Version
> >> conflicts mean they'll still spend time uninstalling the versions
> >> they don't want and downloading/installing the ones they do so we
> >> have to optimize for one particular set and make the rest
> >> second-class citizens in that scenario.
> >>
> >> Also, preinstalling packages means we _don't_ test that projects
> >> actually properly declare their system-level dependencies any
> >> longer. I don't know if anyone's concerned about that currently, but
> >> it used to be the case that we'd regularly add/break the package
> >> dependency declarations in DevStack because of running on images
> >> where the things it expected were preinstalled.
> >> --
> >> Jeremy Stanley
> >
> > +1
> >
> > We spend a lot of effort trying to keep the 6 images we have in nodepool
> working
> > today, I can't imagine how much work it would be to start adding more
> images per
> > project.
> >
> > Personally, I'd like to audit things again once we roll out zuulv3, I am
> sure
> > there are some tweaks we could make to help speed up things.
>
> I don't understand, why would you add images per project? We have all
> the images there.. What I'm talking about is to leverage what we'll
> have soon (registry) to lower time of gates/DIB infra requirements
> (DIB would hardly need to refresh images...)
>
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [devstack] pike time growth in August

2017-09-22 Thread Attila Fazekas
The main offenders reported by devstack does not seams to explain the
growth visible on OpenstackHealth [1] .
The logs also stated to disappear which does not makes easy to figure out.


Which code/infra changes can be related ?


http://status.openstack.org/openstack-health/#/test/devstack?resolutionKey=day&duration=P6M
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-22 Thread Attila Fazekas
"if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?). "

IMHO we can try to have only one big image per distribution,
where the packages are the union of the packages requested by all team,
minus the packages blacklisted by any team.

You need to provide a bug link(s) (distribution/upstream bug) for
blacklisting
a package.

Very unlikely we will run out from the disk space juts because of the too
many packages,
usually if a package causes harm to anything it is a distro/upstream bug
which expected
to be solved within 1..2 cycle in the worst case scenario.

If the above thing proven to be not working, we need to draw the line based
on the
expected usage frequency.




On Wed, Sep 20, 2017 at 3:46 PM, Jeremy Stanley  wrote:

> On 2017-09-20 15:17:28 +0200 (+0200), Attila Fazekas wrote:
> [...]
> > The image building was the good old working solution and unless
> > the image build become a super expensive thing, this is still the
> > best option.
> [...]
>
> It became a super expensive thing, and that's the main reason we
> stopped doing it. Now that Nodepool has grown support for
> distributed/parallel image building and uploading, the cost model
> may have changed a bit in that regard so I agree it doesn't hurt to
> revisit that decision. Nevertheless it will take a fair amount of
> convincing that the savings balances out the costs (not just in
> resource consumption but also administrative overhead and community
> impact... if DevStack gets custom images prepped to make its jobs
> run faster, won't Triple-O, Kolla, et cetera want the same and where
> do we draw that line?).
> --
> Jeremy Stanley
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-20 Thread Attila Fazekas
On Wed, Sep 20, 2017 at 3:11 AM, Ian Wienand  wrote:

> On 09/20/2017 09:30 AM, David Moreau Simard wrote:
>
>> At what point does it become beneficial to build more than one image per
>> OS
>> that is more aggressively tuned/optimized for a particular purpose ?
>>
>
> ... and we can put -dsvm- in the jobs names to indicate it should run
> on these nodes :)
>
> Older hands than myself will remember even more issues, but the
> "thicker" the base-image has been has traditionally just lead to a lot
> more corners for corner-cases can hide in.  We saw this all the time
> with "snapshot" images where we'd be based on upstream images that
> would change ever so slightly and break things, leading to
> diskimage-builder and the -minimal build approach.
>
> That said, in a zuulv3 world where we are not caching all git and have
> considerably smaller images, a nodepool that has a scheduler that
> accounts for flavor sizes and could conceivably understand similar for
> images, and where we're building with discrete elements that could
> "bolt-on" things like a list-of-packages install sanely to daily
> builds ... it's not impossible to imagine.
>
> -i


The problem is these package install steps are not really I/O bottle-necked
in most cases,
even with a regular DSL speed you can  frequently see
the decompress and the post config steps takes more time.

The site-local cache/mirror has visible benefit, but does not eliminates
the issues.

The main enemy is the single threaded CPU intensive operation in most
install/config related script,
the 2th most common issue is serially requesting high latency steps, which
does not reaches neither
the CPU or I/O possibilities at the end.

The fat images are generally cheaper even if your cloud has only 1Gb
Ethernet for image transfer.
You gain more by baking the packages into the image than the 1GbE can steal
from you, because
you also save time what would be loosen on CPU intensive operations or from
random disk access.

It is safe to add all distro packages used  by devstack to the cloud image.

Historically we had issues with some base image packages which presence
changed the
behavior of some component ,for example firewalld vs. libvirt (likely an
already solved issue),
these packages got explicitly removed by devstack in case of necessary.
Those packages not requested by devstack !

Fedora/Centos also has/had issues with overlapping with pypi packages on
main filesystem,
(too long story, pointing fingers ..) , generally not a good idea to add
packages from pypi to
an image which content might be overridden by the distro's package manager.

The distribution package install time delays the gate response,
when the slowest ruining job delayed by this, than the whole response is
delayed.

It Is an user facing latency issue, which should be solved even if the cost
would be higher.

The image building was the good old working solution and unless the image
build
become a super expensive thing, this is still the best option.

site-local mirror also expected to help making the image build step(s)
faster and safer.

The other option is the ready scripts.


>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-19 Thread Attila Fazekas
The gate-tempest-dsvm-neutron-full-ubuntu-xenial job is 20..30 min slower
than it supposed to be/used to be.

The extra time has multiple reasons and it is not because we test more :( .
Usually we are just less smart than before.

Huge time increment is visible in devstack as well.
devstack is advertised as:

Running devstack ... this takes 10 - 15 minutes (logs in
logs/devstacklog.txt.gz)

The actual time is 20 - 25 min according to openstack health:
http://status.openstack.org/openstack-health/#/test/devstack?resolutionKey=day&duration=P6M


Let's start with the first obvious difference compared to the old-time
jobs.:
The jobs does 120..220 sec apt-get install and packages defined
/files/debs/general are missing from the images before starting the job.

We used to bake multiple packages into the images based on the package list
provided by devstack in order to save time.

Why this does not happens anymore ?
Is anybody working on solving this issue ?
Is any blocker technical issue / challenge exists ?
Was it a design decision ?

We have similar issue with pypi usage as well.

PS.:
Generally a good idea to group these kind of package install commands to
one huge pip/apt-get/yum .. invocation, because these tools has significant
start up time and they also need to process the dependency graph at
install/update.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-26 Thread Attila Fazekas
On Mon, Jul 24, 2017 at 10:53 AM, Dmitry Tantsur 
wrote:

> These questions are to the operators, and should be asked on
> openstack-operators IMO (maybe with tuning the overall tone to be a bit
> less aggressive).
>

So the question looks like this without tuning:
 - Do you think is it good idea to spam the users with internal data which
useless for them unless they want to use it against you ?


>
> On 07/24/2017 10:23 AM, Attila Fazekas wrote:
>
>> Thanks for your answer.
>>
>> The real question is do we agree in the
>> internalULR usage what suggested in [1] is a bad security practice
>> and should not be told to operators at all.
>>
>> Also we should try to get rid off the enpointTypes in keystone v4.
>>
>
> Let's not seriously talk about keystone v4 at this point, we haven't
> gotten rid of v2 so far.
>
> Eventually it will come, but until that the story told to operators could
be we are going to remove the
interfaces (admin/internal/public) from the keystone catalog.


>
>> Do we have any good (not just making happy funny dev envs) to keep
>> endpoint types ?
>>
>
> I suspect any external SSL termination proxy. And anything else that will
> make the URLs exposed to end users look different from ones exposed to
> services.
>

The only real question is how many people would mind to use SSL/TLS also
internally
across the  services, when https is the one provided to the end-users.

It does not means the LB to backend needs to be SSL, it can still remain
HTTP regardless to the catalog entry.

If the internal both-way no-SSL communication is really important for the
deployers and we do not
want to change how the keystone API behave we might put
the service urls next to to auth urls in the  keystone_authtoken_section
kind of sections .

Many service has multiple  keystone_authtoken_section [3],
but for example `heat` does not have dedicated auth like section for all
related services.

The options available keystoneauth1 is usually directly exposed to the
service config,
so introducing a `catalog_override` option which accepts a json file can be
the simplest option.

Again, it is only required if you really want to use different protocol
internally than  for the public.
This should not be in a security best practice guide either, but if there
is a real user request for it so be it.


>
> Speaking of DNS, I also suspect there may be a micro-optimization in not
> making the services use it when talking to each other, while still
> providing names to end users.
>
>

If we are speaking about micro optimization, the above way would
open up the way to have services to choose `same host`, `same segment`
service instances
when it makes sense (usually does not).

Most of the networking libraries has built in intelligence to cache DNS
responses,
the dns lookup typically causes extra <0.1ms latency an openstack service
frequently needs
more than 5 ms to respond.

But if you really want to do some micro optimization here,
there are multiple small dns services available which can run on the
localhost and provide faster response
then a remote one and they are also able to hide dns infrastructure
downtimes.

The devstack vms are using unbound for DNS caching.

As always, you can use /etc/hosts file to bypass the DNS lookups,
however the /etc/hosts not expected to do Round Robin, but if you were
happy without DNS
 you will not notice it.

nscd might have surprising changing behavior, but available in all Linux
distros,
likely you want to decrease the negative-time-to-live times in most cases.


[3]
https://docs.openstack.org/ocata/config-reference/shared-file-systems/samples/manila.conf.html


>
>>
>>
>> On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente > <mailto:gfide...@redhat.com>> wrote:
>>
>> Only a comment about the status in TripleO
>>
>> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>>
>> [...]
>>
>> > We should seriously consider using names instead of ip address also
>> > on the devstack gates to avoid people thinking the catalog entries
>> > meant to be used with ip address and keystone is a replacement for
>> DNS.
>>
>> this is configurable, you can have names or ips in the keystone
>> endpoints ... actually you can chose to use names or ips independently
>> for each service and even for the different endpoints
>> (Internal/Admin/Public) of the same service
>>
>> if an operator, like you suggested, configures the DNS to resolve
>> different IPs for the same name basing on where the request comes
>> from,
>> then he can use the same 'hostname' for all Public, Admin and Internal
>> endpoints which I *think* is what you're 

Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-25 Thread Attila Fazekas
>> "While it may not be a setup that everyone wants, for some deployers
having a public and internal is important."
To be more precise:
In some deployment based on in the initial directives the operators run
into an issue where the `public and internal named urls in keystone` looked
like a possible solution.

>> "Those deployers do not seem to mind the RFC1918 showing up in the
catalog, "

These deployers either just showing the public interfaces just to the
trusted administrators(/power users) or not aware of the possible risks.


>> "if they're doing point-to-point firewalling (as they should be) the
private addresses should not be considered 'secret'  "

Based on your trust in your network setup , you should also consider
removing the authentication from (for example) mariadb,
because if you really think your services are safe from the public you do
not need it either.

A random example for why this address leakage can be dangerous.:
 - Based on your private address the attacker is able to figure out (or
just better guess) where is your non-openstack openstack backend services
 - One of your backend services has weak or no authentication
 - You have an openstack service which is able to connect to arbitrary
address on user request (connection to the backend is explicitly allowed by
firewall)
 ---> possible one hit exploitation

If you think these were never was in an OpenStack together, think again and
read the CVEs and the
deployment guides and script.

We must not make the task for the crackers easier by exposing internal
information.
The addresses are frequently not dangerous alone, but in an OpenStack sized
thing it
can become very dangerous together with another `minor` issues.

Another randomly picked issue regarding to an internal url expose:
 1. have service which has some internal info which would be useful for
another service
 2. expose the internal data to all users on the API
 3. have different backend where the same filed is confidential by nature

We will run into similar issue again and again if we are not strict about
not
exposing internal info.

Whatever you think you are solving by internal urls,
can be solved in multiple other ways without leaking information, in most
cases
we also does not need to modify an OpenStack service code for solving it.

Because the internal urls are useless for the unprivileged users, they
should
not receive them at all, even tough we might have users who simply do not
care
about this, they will not die if we move to more secure solution.


>
> On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente > <mailto:gfide...@redhat.com>> wrote:
>>
>> Only a comment about the status in TripleO
>>
>> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>>
>> [...]
>>
>> > We should seriously consider using names instead of ip address also
>> > on the devstack gates to avoid people thinking the catalog entries
>> > meant to be used with ip address and keystone is a replacement for
>> DNS.
>>
>> this is configurable, you can have names or ips in the keystone
>> endpoints ... actually you can chose to use names or ips independently
>> for each service and even for the different endpoints
>> (Internal/Admin/Public) of the same service
>>
>> if an operator, like you suggested, configures the DNS to resolve
>> different IPs for the same name basing on where the request comes
>> from,
>> then he can use the same 'hostname' for all Public, Admin and Internal
>> endpoints which I *think* is what you're suggesting
>>
>> also using names is the default when ssl is enabled
>>
>> check environments/ssl/tls-endpoints-public-dns.yaml and note how
>> EndpointMap can resolve to CLOUDNAME or IP_ADDRESS
>>
>> adding Juan on CC as he did a great work around this and can help
>> further
>> --
>> Giulio Fidente
>> GPG KEY: 08D733BA
>>
>>
>>
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-24 Thread Attila Fazekas
Thanks for your answer.

The real question is do we agree in the
internalULR usage what suggested in [1] is a bad security practice
and should not be told to operators at all.

Also we should try to get rid off the enpointTypes in keystone v4.

Do we have any good (not just making happy funny dev envs) to keep
endpoint types ?



On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente  wrote:

> Only a comment about the status in TripleO
>
> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>
> [...]
>
> > We should seriously consider using names instead of ip address also
> > on the devstack gates to avoid people thinking the catalog entries
> > meant to be used with ip address and keystone is a replacement for DNS.
>
> this is configurable, you can have names or ips in the keystone
> endpoints ... actually you can chose to use names or ips independently
> for each service and even for the different endpoints
> (Internal/Admin/Public) of the same service
>
> if an operator, like you suggested, configures the DNS to resolve
> different IPs for the same name basing on where the request comes from,
> then he can use the same 'hostname' for all Public, Admin and Internal
> endpoints which I *think* is what you're suggesting
>
> also using names is the default when ssl is enabled
>
> check environments/ssl/tls-endpoints-public-dns.yaml and note how
> EndpointMap can resolve to CLOUDNAME or IP_ADDRESS
>
> adding Juan on CC as he did a great work around this and can help further
> --
> Giulio Fidente
> GPG KEY: 08D733BA
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-21 Thread Attila Fazekas
Hi All,

I thought it is already well know fact the endpoint types are there ONLY
for historical reasons, today they just exists to confuse the one who tries
to deploy OpenStack,
but it is considered as a deprecated concept and it will die out sooner or
later.

The keystone v3 API already allows to not define internal or admin
endpoints at all.

I just noticed the current documentation encourages the internal endpoint
usage. [1]

Is there anybody here who thinks it is a great idea to show private address
to the end users ?
Even tough some people might consider this cwe-200, but I hope at least
looks bad to everyone.

The internal endpoints should not be used for telling internal information
to the
OpenStack services itself.  We are not putting mariadb and rabbitmq address
to the catalog as well, we have config files for that.

Ideally the end users should not even know we are using different network
paths or not,
so the internalURL entries should not be different addresses than the
public one
or they should not be defined at all.

I hope nobody really thinks the public catalog entries expected to contain
ip address instead
 of domain names by any best practice guide.

We are just using ip address in the catalog for dev/test environment,
but  in an ideal case the identity url should start with https:// ,
and it should continue with a domain name, which have several A and 
entry
and the certificate wound not be for a self signed private ip address.

Is there anybody who really thinks we are putting  http:///..
into the catalog on the gate because it is the best practice ?

You can configure your DNS server properly [2] or use the /etc/hosts file,
when for some reason you want some nodes to use different ip address
for reaching the OpenStack services.

Keystone does not needs to solve anything there,
these issues are solved decodes before OpenStack even existed.

I cannot take the single internalURL usage as a serious response for
`isolated networks` ,
because it does not scales when you want divide your network even more.
Adding internal2URL, internal3URL is not a great idea either.

We should seriously consider using names instead of ip address also
on the devstack gates to avoid people thinking the catalog entries
meant to be used with ip address and keystone is a replacement for DNS.

Using https likely a bad idea in a regular dev environment,
but I hope we agree sending unencrypted credentials over the wire
is not a recommended best practice.

Best Regards,
Attila


[1]
https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html

[2]
https://serverfault.com/questions/332440/dns-bind-how-to-return-a-different-ip-based-on-requests-subnet
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [keystone] We still have a not identical HEAD response

2017-07-11 Thread Attila Fazekas
Hi all,

Long time ago it was discussed to make the keystone HEAD responses
 right [1] as the RFC [2][3] recommends:

"  A response to the HEAD method is identical to what an equivalent
   request made with a GET would have been, except it lacks a body. "

So, the status code needs to be identical as well !

Recently  turned out, keystone is still not correct in all cases [4].

'Get role inference rule' (GET), 'Confirm role inference rule' (HEAD)
 has the same URL pattern, but they differs in the status code (200/204)
 which is not allowed! [5]

This is the only documented case where both the HEAD and GET defined and
the HEAD has a 204 response.

Are you going to fix this [4] as it was fixed before [6] ?

Best Regards,
Attila

PS.:
 Here is the tempest change for accepting the right code [7].

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-July/039140.html
[2] https://tools.ietf.org/html/rfc7231#section-4.3.2
[3] https://tools.ietf.org/html/rfc7234#section-4.3.5
[4] https://bugs.launchpad.net/keystone/+bug/1701541
[5]
https://developer.openstack.org/api-ref/identity/v3/?expanded=confirm-role-inference-rule-detail,get-role-inference-rule-detail
[6] https://bugs.launchpad.net/keystone/+bug/1334368
[7] https://review.openstack.org/#/c/479286/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] Create subnetpool on dynamic credentials

2017-05-22 Thread Attila Fazekas
In order to twist things even more ;-),
We should consider making tempest working in environment where the users
instead of getting IPV4 floating IP, they are allowed to get a globally
route-able
IPV6 range (prefix/ subnet from a subnetpool) .

Tempest should be able to do connectivity tests against vms,
hosted in these subnets.

This should work regardless to the test account usage,
and it's likely requires some extra tweak in our devstack environments as
well.

Best Regards,
Attila

On Mon, May 22, 2017 at 3:22 PM, Andrea Frittoli 
wrote:

> Hi Hongbin,
>
> If several of your test cases require a subnet pool, I think the simplest
> solution would be creating one in the resource creation step of the tests.
> As I understand it, subnet pools can be created by regular projects (they
> do not require admin credentials).
>
> The main advantage that I can think of for having subnet pools provisioned
> as part of the credential provider code is that - in case of
> pre-provisioned credentials - the subnet pool would be created and delete
> once per test user as opposed to once per test class.
>
> That said I'm not opposed to the proposal in general, but if possible I
> would prefer to avoid adding complexity to an already complex part of the
> code.
>
> andrea
>
> On Sun, May 21, 2017 at 2:54 AM Hongbin Lu  wrote:
>
>> Hi QA team,
>>
>>
>>
>> I have a proposal to create subnetpool/subnet pair on dynamic
>> credentials: https://review.openstack.org/#/c/466440/ . We (Zun team)
>> have use cases for using subnets with subnetpools. I wanted to get some
>> early feedback on this proposal. Will this proposal be accepted? If not,
>> would appreciate alternative suggestion if any. Thanks in advance.
>>
>>
>>
>> Best regards,
>>
>> Hongbin
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest] Proposing Fanglei Zhu for Tempest core

2017-05-18 Thread Attila Fazekas
+1, Totally agree.

Best Regards,
Attila

On Tue, May 16, 2017 at 10:22 AM, Andrea Frittoli  wrote:

> Hello team,
>
> I'm very pleased to propose Fanglei Zhu (zhufl) for Tempest core.
>
> Over the past two cycle Fanglei has been steadily contributing to Tempest
> and its community.
> She's done a great deal of work in making Tempest code cleaner, easier to
> read, maintain and
> debug, fixing bugs and removing cruft. Both her code as well as her
> reviews demonstrate a
> very good understanding of Tempest internals and of the project future
> direction.
> I believe Fanglei will make an excellent addition to the team.
>
> As per the usual, if the current Tempest core team members would please
> vote +1
> or -1(veto) to the nomination when you get a chance. We'll keep the polls
> open
> for 5 days or until everyone has voted.
>
> References:
> https://review.openstack.org/#/q/owner:zhu.fanglei%2540zte.com.cn
> https://review.openstack.org/#/q/reviewer:zhufl
>
> Thank you,
>
> Andrea (andreaf)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] pingtest vs tempest

2017-04-18 Thread Attila Fazekas
On Tue, Apr 18, 2017 at 11:04 AM, Arx Cruz  wrote:

>
>
> On Tue, Apr 18, 2017 at 10:42 AM, Steven Hardy  wrote:
>
>> On Mon, Apr 17, 2017 at 12:48:32PM -0400, Justin Kilpatrick wrote:
>> > On Mon, Apr 17, 2017 at 12:28 PM, Ben Nemec 
>> wrote:
>> > > Tempest isn't really either of those things.  According to another
>> message
>> > > in this thread it takes around 15 minutes to run just the smoke tests.
>> > > That's unacceptable for a lot of our CI jobs.
>> >
>>
>
> I rather spend 15 minutes running tempest than add a regression or a new
> bug, which already happen in the past.
>
> The smoke tests might not be the best test selection anyway, you should
pick some scenario which does
for example snapshot of images and volumes. yes, these are the slow ones,
but they can run in parallel.

Very likely you do not really want to run all tempest test, but 10~20
minute time,
sounds reasonable for a sanity test.

The tempest config utility also should be extended by some parallel
capability,
and should be able to use already downloaded (part of the image) resources.

Tempest/testr/subunit worker balance is not always the best,
technically would be possible to do dynamic balancing, but it would require
a lot of work.
Let me know when it becomes the main concern, I can check what can/cannot
be done.



>
>> > Ben, is the issue merely the time it takes? Is it the affect that time
>> > taken has on hardware availability?
>>
>> It's both, but the main constraint is the infra job timeout, which is
>> about
>> 2.5hrs - if you look at our current jobs many regularly get close to (and
>> sometimes exceed this), so we just don't have the time budget available to
>> run exhasutive tests every commit.
>>
>
> We have green light from infra to increase the job timeout to 5 hours, we
> do that in our periodic full tempest job.
>

Sounds good, but I am afraid it could hurt more than helping, it could
delay other things get fixed by lot
especially if we got some extra flakiness, because of foobar.

You cannot have all possible tripleo configs on the gate anyway,
so something will pass which will require a quick fix.

IMHO the only real solution, is making the before test-run steps faster or
shorter.

Do you have any option to start the tempest running jobs in a more
developed state ?
I mean, having more things already done at the start time
(images/snapshot)
and just do a fast upgrade at the beginning of the job.

Openstack installation can be completed in a `fast` way (~minute) on
RHEL/Fedora systems
after the yum steps, also if you are able to aggregate all yum step to
single
command execution (transaction) you generally able to save a lot of time.

There is plenty of things what can be made more efficient before the test
run,
when you start considering everything evil which can be accounted for more
than 30 sec
of time, this can happen soon.

For example just executing the cpython interpreter for the openstack
commands is above 30 sec,
the work what they are doing can be done in much much faster way.

Lot of install steps actually does not depends on each other,
it allows more things to be done in parallel, we generally can have more
core than Ghz.



>
>>
>> > Should we focus on how much testing we can get into N time period?
>> > Then how do we decide an optimal N
>> > for our constraints?
>>
>> Well yeah, but that's pretty much how/why we ended up with pingtest, it's
>> simple, fast, and provides an efficient way to do smoke tests, e.g
>> creating
>> just one heat resource is enough to prove multiple OpenStack services are
>> running, as well as the DB/RPC etc etc.
>>
>> > I've been working on a full up functional test for OpenStack CI builds
>> > for a long time now, it works but takes
>> > more than 10 hours. IF you're interested in results kick through to
>> > Kibana here [0]. Let me know off list if you
>> > have any issues, the presentation of this data is all experimental
>> still.
>>
>> This kind of thing is great, and I'd support more exhaustive testing via
>> periodic jobs etc, but the reality is we need to focus on "bang for buck"
>> e.g the deepest possible coverage in the most minimal amount of time for
>> our per-commit tests - we rely on the project gates to provide a full API
>> surface test, and we need to focus on more basic things like "did the
>> service
>> start", and "is the API accessible".  Simple crud operations on a subset
>> of
>> the API's is totally fine for this IMO, whether via pingtest or some other
>> means.
>>
>>
> Right now we do have a periodic job running full tempest, with a few
> skips, and because of the lack of tempest tests in the patches, it's being
> pretty hard to keep it stable enough to have a 100% pass, and of course,
> also the installation very often fails (like in the last five days).
> For example, [1] is the latest run we have in periodic job that we get
> results from tempest, and we have 114 failures that was caused by some new
> code/change, and I have no idea which one 

Re: [openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?

2017-02-10 Thread Attila Fazekas
I wonder, can we switch to CINDER_ISCSI_HELPER="lioadm"  ?

On Fri, Feb 10, 2017 at 9:17 AM, Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:

> I believe those are traces left by the reference implementation of cinder
> setting very high debug level on tgtd. I'm not sure if that's related or
> the culprit at all (probably the culprit is a mix of things).
>
> I wonder if we could disable such verbosity on tgtd, which certainly is
> going to slow down things.
>
> On Fri, Feb 10, 2017 at 9:07 AM, Antonio Ojea  wrote:
>
>> I guess it's an infra issue, specifically related to the storage, or the
>> network that provide the storage.
>>
>> If you look at the syslog file [1] , there are a lot of this entries:
>>
>> Feb 09 04:20:42 
>> 
>>  ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(2024) 
>> no more dataFeb 09 04:20:42 
>> 
>>  ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(1996) 
>> found a task 71 131072 0 0Feb 09 04:20:42 
>> 
>>  ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_data_rsp_build(1136) 
>> 131072 131072 0 26214471Feb 09 04:20:42 
>> 
>>  ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: __cmd_done(1281) (nil) 
>> 0x2563000 0 131072
>>
>> grep tgtd syslog.txt.gz| wc
>>   139602 1710808 15699432
>>
>> [1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsv
>> m-neutron-dvr-ubuntu-xenial/35aa22f/logs/syslog.txt.gz
>>
>>
>>
>> On Fri, Feb 10, 2017 at 5:59 AM, Ihar Hrachyshka 
>> wrote:
>>
>>> Hi all,
>>>
>>> I noticed lately a number of job failures in neutron gate that all
>>> result in job timeouts. I describe
>>> gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
>>> timeouts happening in other jobs too.
>>>
>>> The failure mode is all operations, ./stack.sh and each tempest test
>>> take significantly more time (like 50% to 150% more, which results in
>>> job timeout triggered). An example of what I mean can be found in [1].
>>>
>>> A good run usually takes ~20 minutes to stack up devstack; then ~40
>>> minutes to pass full suite; a bad run usually takes ~30 minutes for
>>> ./stack.sh; and then 1:20h+ until it is killed due to timeout.
>>>
>>> It affects different clouds (we see rax, internap, infracloud-vanilla,
>>> ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
>>> pypi or apt mirrors because then we would see slowdown in ./stack.sh
>>> phase only.
>>>
>>> We can't be sure that CPUs are the same, and devstack does not seem to
>>> dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
>>> if it would help anyway). Neither we have a way to learn whether
>>> slowliness could be a result of adherence to RFC1149. ;)
>>>
>>> We discussed the matter in neutron channel [2] though couldn't figure
>>> out the culprit, or where to go next. At this point we assume it's not
>>> neutron's fault, and we hope others (infra?) may have suggestions on
>>> where to look.
>>>
>>> [1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsv
>>> m-neutron-dvr-ubuntu-xenial/35aa22f/console.html#_2017-02-09
>>> _04_47_12_874550
>>> [2] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/
>>> %23openstack-neutron.2017-02-10.log.html#t2017-02-10T04:06:01
>>>
>>> Thanks,
>>> Ihar
>>>
>>> 
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.op
>>> enstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone] Do we really need two listening ports ?

2017-02-02 Thread Attila Fazekas
Today the '-admin' version almost able to fully passing on tempest:
http://logs.openstack.org/91/428091/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/9e4f5e6/logs/stackviz/#/stdin

https://review.openstack.org/#/c/428091/ (Without the port merge,  but it
could be the next step )

At the first looks does not seams to be impossible, to have a
'keystone-wsgi' entry which good
enough to pass on tempest and possible on all real user use case as well,
even tough it might not be 100% bug compatible with the old version, and
It requires some tweaks on the routes on the keystone side.

Would be nice to have a a wsgi entry in the keystone repository
which is also passing on at least test_user_update_own_password, and
advertised as `the merged` entry by
the keystone project.






On Wed, Feb 1, 2017 at 4:35 PM, Dolph Mathews 
wrote:

> On Wed, Feb 1, 2017 at 6:59 AM Thomas Goirand  wrote:
>
>> On 02/01/2017 10:54 AM, Attila Fazekas wrote:
>> > Hi all,
>> >
>> > Typically we have two keystone service listening on two separate ports
>> > 35357 and 5000.
>> >
>> > Historically one of the port had limited functionality, but today I do
>> > not see why we want
>> > to have two separate service/port from the same code base for similar
>> > purposes.
>>
>
> If you're running v2, you do need two endpoints (admin and public;
> keystone does not really have a use case for an internal endpoint). The
> specific port numbers don't particularly matter (other than 35357 is
> conveniently registered with IANA) and should not be hardcoded or assumed
> by clients (and are not, AFAIK). In the case of v2, it is effectively a
> different service running on each port; there's at least one unfortunately
> subtle difference in behavior between admin and public.
>
> If you're *only* running v3, you can run a single process and put the same
> endpoint URL in the service catalog, for both the admin and public
> endpoint. Arbitrary ports don't matter (so just use 443).
>
>
>> >
>> > Effective we use double amount of memory than it is really required,
>> > because both port is served by completely different worker instances,
>> > typically from the same physical server.
>> >
>> > I wonder, would it be difficult to use only a single port or at least
>> > the same pool of workers for all keystone(identity, auth..) purposes?
>> >
>> > Best Regards,
>> > Attila
>>
>> This has been discussed and agreed a long time ago, but nobody did the
>> work.
>
>
> A lot of work has gone into freeing keystone from having to run on two
> ports (Adam Young, in particular, deserves a ton of credit here). You just
> need to consume that operational flexibility.
>
>
>> Please do get rid of the 2nd port. And when you're at it, also get
>> rid of the admin and internal endpoint in the service catalog.
>
>
> v3 has never presumed anything other than a public endpoint. Admin and
> internal are strictly optional and only exist for backwards compatibility
> with v2 (so, just use v3).
>
>
>>
>
>
>> Cheers,
>>
>> Thomas Goirand (zigo)
>>
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> --
> -Dolph
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [keystone] Do we really need two listening ports ?

2017-02-01 Thread Attila Fazekas
Hi all,

Typically we have two keystone service listening on two separate ports
35357 and 5000.

Historically one of the port had limited functionality, but today I do not
see why we want
to have two separate service/port from the same code base for similar
purposes.

Effective we use double amount of memory than it is really required,
because both port is served by completely different worker instances,
typically from the same physical server.

I wonder, would it be difficult to use only a single port or at least the
same pool of workers for all keystone(identity, auth..) purposes?

Best Regards,
Attila
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [QA][all] Propose to remove negative tests from Tempest

2016-03-19 Thread Attila Fazekas
Most negative test supposed to be very simple and we should not spend
too much time in them.

The right question:
Are we able to run 100 negative test/sec ?
Where is the time spent ?

If we are able to solve the main issue,
probably we do not need to worry about how many negative test we have.

Not all negative test a simple dumb thing.
If we would do smarter test selection, likely
we would need to keep only the slower ones.
So at the and almost nothing gained.


BTW, we can increase the number of tempest workers.

Best Regards,
Attila

- Original Message -
> From: "Ken'ichi Ohmichi" 
> To: "OpenStack Development Mailing List" 
> Sent: Thursday, March 17, 2016 2:20:11 AM
> Subject: [openstack-dev] [QA][all] Propose to remove negative tests from  
> Tempest
> 
> Hi
> 
> I have one proposal[1] related to negative tests in Tempest, and
> hoping opinions before doing that.
> 
> Now Tempest contains negative tests and sometimes patches are being
> posted for adding more negative tests, but I'd like to propose
> removing them from Tempest instead.
> 
> Negative tests verify surfaces of REST APIs for each component without
> any integrations between components. That doesn't seem integration
> tests which are scope of Tempest.
> In addition, we need to spend the test operating time on different
> component's gate if adding negative tests into Tempest. For example,
> we are operating negative tests of Keystone and more
> components on the gate of Nova. That is meaningless, so we need to
> avoid more negative tests into Tempest now.
> 
> If wanting to add negative tests, it is a nice option to implement
> these tests on each component repo with Tempest plugin interface. We
> can avoid operating negative tests on different component gates and
> each component team can decide what negative tests are valuable on the
> gate.
> 
> In long term, all negative tests will be migrated into each component
> repo with Tempest plugin interface. We will be able to operate
> valuable negative tests only on each gate.
> 
> Any thoughts?
> 
> Thanks
> Ken Ohmichi
> 
> ---
> [1]: https://review.openstack.org/#/c/293197/
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cross-project] [all] Quotas -- service vs. library

2016-03-16 Thread Attila Fazekas

NO : For any kind of extra quota service.

In other places I saw other reasons for a quota service or similar,
 the actual cost of this approach is higher than most people would think so NO.


Maybe Library,
But I do not want to see for example the bad pattern used in nova to spread 
everywhere.

The quota usage handling MUST happen in the same DB transaction as the 
resource record (volume, server..) create/update/delete  .

There is no need for.:
- reservation-expirer services or periodic tasks ..
- there is no need for quota usage correcter shell scripts or whatever
- multiple commits


We have a transaction capable DB, to help us,
not using it would be lame.


[2] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061338.html

- Original Message -
> From: "Nikhil Komawar" 
> To: "OpenStack Development Mailing List" 
> Sent: Wednesday, March 16, 2016 7:25:26 AM
> Subject: [openstack-dev] [cross-project] [all] Quotas -- service vs. library
> 
> Hello everyone,
> 
> tl;dr;
> I'm writing to request some feedback on whether the cross project Quotas
> work should move ahead as a service or a library or going to a far
> extent I'd ask should this even be in a common repository, would
> projects prefer to implement everything from scratch in-tree? Should we
> limit it to a guideline spec?
> 
> But before I ask anymore, I want to specifically thank Doug Hellmann,
> Joshua Harlow, Davanum Srinivas, Sean Dague, Sean McGinnis and  Andrew
> Laski for the early feedback that has helped provide some good shape to
> the already discussions.
> 
> Some more context on what the happenings:
> We've this in progress spec [1] up for providing context and platform
> for such discussions. I will rephrase it to say that we plan to
> introduce a new 'entity' in the Openstack realm that may be a library or
> a service. Both concepts have trade-offs and the WG wanted to get more
> ideas around such trade-offs from the larger community.
> 
> Service:
> This would entail creating a new project and will introduce managing
> tables for quotas for all the projects that will use this service. For
> example if Nova, Glance, and Cinder decide to use it, this 'entity' will
> be responsible for handling the enforcement, management and DB upgrades
> of the quotas logic for all resources for all three projects. This means
> less pain for projects during the implementation and maintenance phase,
> holistic view of the cloud and almost a guarantee of best practices
> followed (no clutter or guessing around what different projects are
> doing). However, it results into a big dependency; all projects rely on
> this one service for right enforcement, avoiding races (if do not
> incline on implementing some of that in-tree) and DB
> migrations/upgrades. It will be at the core of the cloud and prone to
> attack vectors, bugs and margin of error.
> 
> Library:
> A library could be thought of in two different ways:
> 1) Something that does not deal with backed DB models, provides a
> generic enforcement and management engine. To think ahead a little bit
> it may be a ABC or even a few standard implementation vectors that can
> be imported into a project space. The project will have it's own API for
> quotas and the drivers will enforce different types of logic; per se
> flat quota driver or hierarchical quota driver with custom/project
> specific logic in project tree. Project maintains it's own DB and
> upgrades thereof.
> 2) A library that has models for DB tables that the project can import
> from. Thus the individual projects will have a handy outline of what the
> tables should look like, implicitly considering the right table values,
> arguments, etc. Project has it's own API and implements drivers in-tree
> by importing this semi-defined structure. Project maintains it's own
> upgrades but will be somewhat influenced by the common repo.
> 
> Library would keep things simple for the common repository and sourcing
> of code can be done asynchronously as per project plans and priorities
> without having a strong dependency. On the other hand, there is a
> likelihood of re-implementing similar patterns in different projects
> with individual projects taking responsibility to keep things up to
> date. Attack vectors, bugs and margin of error are project responsibilities
> 
> Third option is to avoid all of this and simply give guidelines, best
> practices, right packages to each projects to implement quotas in-house.
> Somewhat undesirable at this point, I'd say. But we're all ears!
> 
> Thank you for reading and I anticipate more feedback.
> 
> [1] https://review.openstack.org/#/c/284454/
> 
> --
> 
> Thanks,
> Nikhil
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


[openstack-dev] [heat] Why heat needs a keystone user per resource ?

2015-07-09 Thread Attila Fazekas
Hi,

Heat creates a keystone user for every resource which uses a CFN_SIGNAL.
Heat also stores their AWS credentials in the heat.resource_data table.

These credentials/users are restricted to operate only on limited (1?) resource,
with very limited operations (3?). Normally these resource users are member of 
only
a special heat domain and tenant.

Looks like heat has everything to have CFN/hashmac working without touching 
the keystone service.

Why heat needs to store anything in keystone regarding to the CFN_SIGNALS ?
Is these credentials supposed to be used anywhere else than on heat?

Best Regards,
Attila


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-12 Thread Attila Fazekas




- Original Message -
> From: "Robert Collins" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Tuesday, May 12, 2015 3:06:21 AM
> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> 
> On 12 May 2015 at 10:12, Attila Fazekas  wrote:
> >
> >
> >
> >
> 
> >> If you can illustrate a test script that demonstrates the actual failing
> >> of OS threads that does not occur greenlets here, that would make it
> >> immediately apparent what it is you're getting at here.
> >>
> >
> > http://www.fpaste.org/220824/raw/
> >
> > I just put together hello word C example and a hello word threading
> > example,
> > and replaced the print with sleep(3).
> >
> > When I use the sleep(3) from python, the 5 thread program runs in ~3
> > second,
> > when I use the sleep(3) from native code, it runs ~15 sec.
> >
> > So yes, it is very likely a GIL lock wait related issue,
> > when the native code is not assisting.
> 
> Your test code isn't releasing the GIL here, and I'd expect C DB
> drivers to be releasing the GIL: you've illustrated how a C extension
> can hold the GIL, but not whether thats happening.

Yes.

And you are right the C driver wrapper releases the GIL at every important 
mysql C driver call. (Py_BEGIN_ALLOW_THREADS)

Good to know :)


> 
> > Do you need a DB example, by using the mysql C driver,
> > and waiting in an actual I/O primitive ?
> 
> waiting in an I/O primitive is fine as long as the GIL has been released.

http://www.fpaste.org/221101/

Actually the eventlet version of the play/test code
is producing the mentioned error:
'Lock wait timeout exceeded; try restarting transaction'.

I have not seen the above issue with the regular python threads.

The driver does not cooperates with the event hub :(


PS.:
The 'Deadlock found when trying to get lock; try restarting transaction'
would be different situation, and it is not related to the eventlet issue.

> 
> -Rob
> 
> 
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
> From: "Mike Bayer" 
> To: openstack-dev@lists.openstack.org
> Sent: Monday, May 11, 2015 9:07:13 PM
> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> 
> 
> 
> On 5/11/15 2:02 PM, Attila Fazekas wrote:
> >
> > Not just with local database connections,
> > the 10G network itself also fast. Is is possible you spend more time even
> > on
> > the kernel side tcp/ip stack (and the context switch..) (Not in physical
> > I/O wait)
> > than in the actual work on the DB side. (Check netperf TCP_RR)
> >
> > The scary part of a blocking I/O call is when you have two
> > python thread (or green thread) and one of them is holding a DB lock the
> > other
> > is waiting for the same lock in a native blocking I/O syscall.
> that's a database deadlock and whether you use eventlet, threads,
> asycnio or even just two transactions in a single-threaded script, that
> can happen regardless.  if your two eventlet "non blocking" greenlets
> are waiting forever for a deadlock,  you're just as deadlocked as if you
> have OS threads.
> 
> 
> > If you do a read(2) in native code, the python itself might not be able to
> > preempt it
> > Your transaction might be finished with `DB Lock wait timeout`,
> > with 30 sec of doing nothing, instead of scheduling to the another python
> > thread,
> > which would be able to release the lock.
> 
> 
> Here's the "you're losing me" part because Python threads are OS
> threads, so Python isn't directly involved trying to "preempt" anything,
> unless you're referring to the effect of the GIL locking up the
> program.   However, it's pretty easy to make two threads in Python hit a
> database and do a deadlock against each other, and the rest of the
> program's threads continue to run just fine; in a DB deadlock situation
> you are blocked on IO and IO releases the GIL.
> 
> If you can illustrate a test script that demonstrates the actual failing
> of OS threads that does not occur greenlets here, that would make it
> immediately apparent what it is you're getting at here.
>

http://www.fpaste.org/220824/raw/

I just put together hello word C example and a hello word threading example,
and replaced the print with sleep(3).

When I use the sleep(3) from python, the 5 thread program runs in ~3 second,
when I use the sleep(3) from native code, it runs ~15 sec.

So yes, it is very likely a GIL lock wait related issue,
when the native code is not assisting.
 
Do you need a DB example, by using the mysql C driver,
and waiting in an actual I/O primitive ?

The greenthreads will not help here.

If I would import the python time.sleep from the C code it might help.

Using pure python driver helps to avoid this kind of issues,
but in this case you have the `cPython is slow` issue.

> 
> >
> >> [1]
> >> http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> __
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
> From: "Mike Bayer" 
> To: openstack-dev@lists.openstack.org
> Sent: Monday, May 11, 2015 4:44:58 PM
> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> 
> 
> 
> On 5/11/15 9:58 AM, Attila Fazekas wrote:
> >
> >
> >
> > - Original Message -
> >> From: "John Garbutt" 
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> 
> >> Cc: "Dan Smith" 
> >> Sent: Saturday, May 9, 2015 12:45:26 PM
> >> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> >>
> >> On 30 April 2015 at 18:54, Mike Bayer  wrote:
> >>> On 4/30/15 11:16 AM, Dan Smith wrote:
> >>>>> There is an open discussion to replace mysql-python with PyMySQL, but
> >>>>> PyMySQL has worse performance:
> >>>>>
> >>>>> https://wiki.openstack.org/wiki/PyMySQL_evaluation
> >>>> My major concern with not moving to something different (i.e. not based
> >>>> on the C library) is the threading problem. Especially as we move in the
> >>>> direction of cellsv2 in nova, not blocking the process while waiting for
> >>>> a reply from mysql is going to be critical. Further, I think that we're
> >>>> likely to get back a lot of performance from a supports-eventlet
> >>>> database connection because of the parallelism that conductor currently
> >>>> can only provide in exchange for the footprint of forking into lots of
> >>>> workers.
> >>>>
> >>>> If we're going to move, shouldn't we be looking at something that
> >>>> supports our threading model?
> >>> yes, but at the same time, we should change our threading model at the
> >>> level
> >>> of where APIs are accessed to refer to a database, at the very least
> >>> using
> >>> a
> >>> threadpool behind eventlet.   CRUD-oriented database access is faster
> >>> using
> >>> traditional threads, even in Python, than using an eventlet-like system
> >>> or
> >>> using explicit async.  The tests at
> >>> http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
> >>> show this.With traditional threads, we can stay on the C-based MySQL
> >>> APIs and take full advantage of their speed.
> >> Sorry to go back in time, I wanted to go back to an important point.
> >>
> >> It seems we have three possible approaches:
> >> * C lib and eventlet, blocks whole process
> >> * pure python lib, and eventlet, eventlet does its thing
> >> * go for a C lib and dispatch calls via thread pool
> > * go with pure C protocol lib, which explicitly using `python patch-able`
> >I/O function (Maybe others like.: threading, mutex, sleep ..)
> >
> > * go with pure C protocol lib and the python part explicitly call
> >for `decode` and `encode`, the C part just do CPU intensive operations,
> >and it never calls for I/O primitives .
> >
> >> We have a few problems:
> >> * performance sucks, we have to fork lots of nova-conductors and api nodes
> >> * need to support python2.7 and 3.4, but its not currently possible
> >> with the lib we use?
> >> * want to pick a lib that we can fix when there are issues, and work to
> >> improve
> >>
> >> It sounds like:
> >> * currently do the first one, it sucks, forking nova-conductor helps
> >> * seems we are thinking the second one might work, we sure get py3.4 +
> >> py2.7 support
> >> * the last will mean more work, but its likely to be more performant
> >> * worried we are picking a unsupported lib with little future
> >>
> >> I am leaning towards us moving to making DB calls with a thread pool
> >> and some fast C based library, so we get the 'best' performance.
> >>
> >> Is that a crazy thing to be thinking? What am I missing here?
> > Using the python socket from C code:
> > https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
> >
> > Also possible to implement a mysql driver just as a protocol parser,
> > and you are free to use you favorite event based I/O strategy (direct epoll
> > usage)
> > even without eventlet (or similar).
> >
> > The issue with ultramysql, it does not implements
> > the `standard` python DB API, so you would need to add an extra w

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
> From: "John Garbutt" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Cc: "Dan Smith" 
> Sent: Saturday, May 9, 2015 12:45:26 PM
> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> 
> On 30 April 2015 at 18:54, Mike Bayer  wrote:
> > On 4/30/15 11:16 AM, Dan Smith wrote:
> >>> There is an open discussion to replace mysql-python with PyMySQL, but
> >>> PyMySQL has worse performance:
> >>>
> >>> https://wiki.openstack.org/wiki/PyMySQL_evaluation
> >>
> >> My major concern with not moving to something different (i.e. not based
> >> on the C library) is the threading problem. Especially as we move in the
> >> direction of cellsv2 in nova, not blocking the process while waiting for
> >> a reply from mysql is going to be critical. Further, I think that we're
> >> likely to get back a lot of performance from a supports-eventlet
> >> database connection because of the parallelism that conductor currently
> >> can only provide in exchange for the footprint of forking into lots of
> >> workers.
> >>
> >> If we're going to move, shouldn't we be looking at something that
> >> supports our threading model?
> >
> > yes, but at the same time, we should change our threading model at the
> > level
> > of where APIs are accessed to refer to a database, at the very least using
> > a
> > threadpool behind eventlet.   CRUD-oriented database access is faster using
> > traditional threads, even in Python, than using an eventlet-like system or
> > using explicit async.  The tests at
> > http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
> > show this.With traditional threads, we can stay on the C-based MySQL
> > APIs and take full advantage of their speed.
> 
> Sorry to go back in time, I wanted to go back to an important point.
> 
> It seems we have three possible approaches:
> * C lib and eventlet, blocks whole process
> * pure python lib, and eventlet, eventlet does its thing
> * go for a C lib and dispatch calls via thread pool

* go with pure C protocol lib, which explicitly using `python patch-able` 
  I/O function (Maybe others like.: threading, mutex, sleep ..)

* go with pure C protocol lib and the python part explicitly call
  for `decode` and `encode`, the C part just do CPU intensive operations,
  and it never calls for I/O primitives .   

> We have a few problems:
> * performance sucks, we have to fork lots of nova-conductors and api nodes
> * need to support python2.7 and 3.4, but its not currently possible
> with the lib we use?
> * want to pick a lib that we can fix when there are issues, and work to
> improve
> 
> It sounds like:
> * currently do the first one, it sucks, forking nova-conductor helps
> * seems we are thinking the second one might work, we sure get py3.4 +
> py2.7 support
> * the last will mean more work, but its likely to be more performant
> * worried we are picking a unsupported lib with little future
> 
> I am leaning towards us moving to making DB calls with a thread pool
> and some fast C based library, so we get the 'best' performance.
> 
> Is that a crazy thing to be thinking? What am I missing here?

Using the python socket from C code:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

Also possible to implement a mysql driver just as a protocol parser,
and you are free to use you favorite event based I/O strategy (direct epoll 
usage)
even without eventlet (or similar).

The issue with ultramysql, it does not implements
the `standard` python DB API, so you would need to add an extra wrapper to 
SQLAlchemy.

> 
> Thanks,
> John
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Service group foundations and features

2015-05-11 Thread Attila Fazekas




- Original Message -
> From: "John Garbutt" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Saturday, May 9, 2015 1:18:48 PM
> Subject: Re: [openstack-dev] [nova] Service group foundations and features
> 
> On 7 May 2015 at 22:52, Joshua Harlow  wrote:
> > Hi all,
> >
> > In seeing the following:
> >
> > - https://review.openstack.org/#/c/169836/
> > - https://review.openstack.org/#/c/163274/
> > - https://review.openstack.org/#/c/138607/
> >
> > Vilobh and I are starting to come to the conclusion that the service group
> > layers in nova really need to be cleaned up (without adding more features
> > that only work in one driver), or removed or other... Spec[0] has
> > interesting findings on this:
> >
> > A summary/highlights:
> >
> > * The zookeeper service driver in nova has probably been broken for 1 or
> > more releases, due to eventlet attributes that are gone that it via
> > evzookeeper[1] library was using. Evzookeeper only works for eventlet <
> > 0.17.1. Please refer to [0] for details.
> > * The memcache service driver really only uses memcache for a tiny piece of
> > the service liveness information (and does a database service table scan to
> > get the list of services). Please refer to [0] for details.
> > * Nova-manage service disable (CLI admin api) does interact with the
> > service
> > group layer for the 'is_up'[3] API (but it also does a database service
> > table scan[4] to get the list of services, so this is inconsistent with the
> > service group driver API 'get_all'[2] view on what is enabled/disabled).
> > Please refer to [9][10] for nova manage service enable disable for details.
> >   * Nova service delete (REST api) seems to follow a similar broken pattern
> > (it also avoids calling into the service group layer to delete a service,
> > which means it only works with the database layer[5], and therefore is
> > inconsistent with the service group 'get_all'[2] API).
> >
> > ^^ Doing the above makes both disable/delete agnostic about other backends
> > available that may/might manage service group data for example zookeeper,
> > memcache, redis etc... Please refer [6][7] for details. Ideally the API
> > should follow the model used in [8] so that the extension, admin interface
> > as well as the API interface use the same servicegroup interface which
> > should be *fully* responsible for managing services. Doing so we will have
> > a
> > consistent view of services data, liveness, disabled/enabled and so-on...
> >
> > So with no disrespect to the authors of 169836 and 163274 (or anyone else
> > involved), I am wondering if we can put a request in to figure out how to
> > get the foundation of the service group concepts stabilized (or other...)
> > before adding more features (that only work with the DB layer).
> >
> > What is the path to request some kind of larger coordination effort by the
> > nova folks to fix the service group layers (and the concepts that are not
> > disjoint/don't work across them) before continuing to add features on-top
> > of
> > a 'shakey' foundation?
> >
> > If I could propose something it would probably work out like the following:
> >
> > Step 0: Figure out if the service group API + layer(s) should be
> > maintained/tweaked at all (nova-core decides?)
> >
> > If maintain it:
> >
> >  - Have an agreement that nova service extension, admin
> > interface(nova-manage) and API go through a common path for
> > update/delete/read.
> >   * This common path should likely be the servicegroup API so as to have a
> > consistent view of data and that also helps nova to add different
> > data-stores (keeping the services data in a DB and getting numerous updates
> > about liveliness every few seconds of N number of compute where N is pretty
> > high can be detrimental to Nova's performance)
> >  - At the same time allow 163274 to be worked on (since it fixes a
> >  edge-case
> > that was asked about in the initial addition of the delete API in its
> > initial code commit @ https://review.openstack.org/#/c/39998/)
> >  - Delay 169836 until the above two/three are fixed (and stabilized); it's
> > down concept (and all other usages of services that are hitting a database
> > mentioned above) will need to go through the same service group foundation
> > that is currently being skipped.
> >
> > Else:
> >   - Discard 138607 and start removing the service group code (and just use
> > the DB for all the things).
> >   - Allow 163274 and 138607 (since those would be additions on-top of the
> >   DB
> > layer that will be preserved).
> >
> > Thoughts?
> 
> I wonder about this approach:
> 
> * I think we need to go back and document what we want from the
> "service group" concept.
> * Then we look at the best approach to implement that concept.
> * Then look at the best way to get to a happy place from where we are now,
> ** Noting we will need "live" upgrade for (at least) the most widely
> used drivers
> 
> Does that make any s

Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in DevStack [was: Status of the nova-network to Neutron migration work]

2015-04-28 Thread Attila Fazekas
You can tcpdump the ovs ports as usual.

Please keep in mind ovs does not have `single contention` port.
OVS does MAC learning by default and you may not see `learned` uni-cast traffic
on a random trunk port. You MAY see BUM traffic, but many of them also can be 
canceled
by neutron-ml2-ovs, AFAIK it is not enabled by default. 

OVS behaves like a real switch, real switches also does not have 5 Tbit/sec 
ports for monitoring :(
If you need to tcpudump on a port which is not visible in the userspace 
(internal patch links) by default
you should do port mirroring. [1]

Usually you do not need to dump the traffic,
What you should do as basic trouble shooting is checking the tags on the ports,
(`ovsdb-client dump` show everything, excluding the oflow rules)

Hopefully the root cause is fixed, but you should check is the port not trunk
when it needs to be tagged.

Neutron also dedicates the vlan-4095 on br-int as dead vlan,
If you have a port in this, it can mean a miss configuration
or a message lost in the void or something Exceptional happened.

If you really need to redirect exceptional `out of band` traffic to a special 
port
or to an external service (controller) it would be more complex thing
then just doing the mirroring.

[1] http://www.yet.org/2014/09/openvswitch-troubleshooting/

PS.:
OVS does not generates ICMP packets in many cases when a real `L3` switch would 
do,
thats why the MTU size differences causes issues and requires extra care at 
configuration,
when ovs used with tunneling. (OVS also can be used with vlans)

Probably this caused the most headache for many user.

PS2.:
Somewhere I read the ovs had the PMTUD support, but it was removed because
it was not conforming to the standard.
It just does silent packet drop :(
 


- Original Message -
> From: "Jeremy Stanley" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Tuesday, April 21, 2015 5:00:24 PM
> Subject: Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in 
> DevStack [was: Status of the nova-network
> to Neutron migration work]
> 
> On 2015-04-21 03:19:04 -0400 (-0400), Attila Fazekas wrote:
> [...]
> > IMHO the OVS is less complex than netfilter (iptables, *tables),
> > if someone able to deal with reading the netfilter rules he should
> > be able to deal with OVS as well.
> 
> In a simple DevStack setup, you really have that many
> iptables/ebtables rules?
> 
> > OVS has debugging tools for internal operations, I guess you are
> > looking for something else. I do not have any `good debugging`
> > tool for net-filter either.
> [...]
> 
> Complexity of connecting tcpdump to the bridge was the primary
> concern here (convenient means of debugging network problems when
> you're using OVS, less tools for debugging OVS itself though it can
> come down to that at times as well). Also ebtables can easily be
> configured to log every frame it blocks, forwards or rewrites
> (presumably so can the OVS flow handler? but how?).
> --
> Jeremy Stanley
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [openstack][nova] Does anyone use Zookeeper, Memcache Nova ServiceGroup Driver ?

2015-04-28 Thread Attila Fazekas
How many compute nodes do you want to manage ?

If it less than ~1000, you do not need to care.
If you have more, just use SSD with good write IOPS value.

Mysql actually can be fast with enough memory and good SSD. 
Even faster than [1].

zk as technology is good, the current nova driver is not. Not recommended.
The current mc driver does lot of tcp ping-pong for every node, 
it can be slower than the SQL.

IMHO At high compute node count you would face with scheduler latency issues
sooner than sg driver issues. (It is not Log(N) :()

The sg drivers was introduced to eliminate 100 Update/sec at 1000 Host,
but it caused all service is being fetched from the DB even if at the given code
part you just need to alive services. 


[1] 
http://www.percona.com/blog/2013/10/18/innodb-scalability-issues-tables-without-primary-keys/

- Original Message -
> From: "Vilobh Meshram" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> , "OpenStack
> Mailing List (not for usage questions)" 
> Sent: Tuesday, April 28, 2015 1:21:58 AM
> Subject: [openstack-dev] [openstack][nova] Does anyone use Zookeeper, 
> Memcache Nova ServiceGroup Driver ?
> 
> Hi,
> 
> Does anyone use Zookeeper[1], Memcache[2] Nova ServiceGroup Driver ?
> 
> If yes how has been your experience with it. It was noticed that most of the
> deployment try to use the default Database driver[3]. Any experiences with
> Zookeeper, Memcache driver will be helpful.
> 
> -Vilobh
> 
> [1]
> https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/zk.py
> [2]
> https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/mc.py
> [3]
> https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in DevStack [was: Status of the nova-network to Neutron migration work]

2015-04-21 Thread Attila Fazekas




- Original Message -
> From: "Jeremy Stanley" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, April 17, 2015 9:35:07 PM
> Subject: Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in 
> DevStack [was: Status of the nova-network
> to Neutron migration work]
> 
> On 2015-04-17 11:49:23 -0700 (-0700), Kevin Benton wrote:
> > I definitely understand that. But what is the major complaint from
> > operators? I understood that quote to imply it was around
> > Neutron's model of self-service networking.
> 
> My takeaway from Tom's message was that there was a concern about
> "complexity" in all forms (not just of the API but also due to the
> lack of maturity, documentation and debuggability of the underlying
> technology), and that the self-service networking model was simply
> one example of that. Perhaps I was reading between the lines too
> much because of prior threads on both the operators and developers
> mailing lists. Anyway, I'm sure Tom will clarify what he meant if
> necessary.
> 

IMHO the OVS is less complex than netfilter (iptables, *tables),
if someone able to deal with reading the netfilter rules
he should be able to deal with OVS as well.

OVS has debugging tools for internal operations, I guess you are looking
for something else.
I do not have any `good debugging` tool for net-filter either.

The way how openstack/neutron/devstack by default uses OVS is simpler,
than most small (non openstack related) OVS example trying to explain.

I kind of agree with the lack of documentation part. 
A documentation which explains howto use OVS
in same way as neutron does would be helpfull for new comers.  

> > If the main reason the remaining Nova-net operators don't want to
> > use Neutron is due to the fact that they don't want to deal with
> > the Neutron API, swapping some implementation defaults isn't
> > really going to get us anywhere on that front.
> 
> This is where I think the subthread has definitely wandered off
> topic too. Swapping implementation defaults in DevStack because it's
> quicker and easier to get running on the typical
> all-in-one/single-node setup and faster to debug problems with
> (particularly when you're trying to work on non-network-related bits
> and just need to observe the network communication between your
> services) doesn't seem like it should have a lot to do with the
> recommended default configuration for a large production deployment.
> One size definitely does not fit all.
> 
> > It's an important distinction because it determines what
> > actionable items we can take (e.g. what Salvatore mentioned in his
> > email about defaults). Does that make sense?
> 
> It makes sense in the context of the Neutron/Nova network parity
> topic, but not so much in the context of the DevStack default
> settings topic. DevStack needs a simple default that just works, and
> doesn't need the kitchen sink. You can turn on more complex options
> as you need to test them out. In some ways this has parallels to the
> complexity concerns the operator community has over Neutron and OVS,
> but I think they're still relatively distinct topics.
> --
> Jeremy Stanley
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-17 Thread Attila Fazekas




- Original Message -
> From: "joehuang" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, April 17, 2015 9:46:12 AM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> Hi, Attila,
> 
> only address the issue of agent status/liveness management is not enough for
> Neutron scalability. The concurrent dynamic load impact on large scale ( for
> example 100k managed nodes with the dynamic load like security group rule
> update, routers_updated, etc ) should also be taken into account too. So
> even if is agent status/liveness management improved in Neutron, that
> doesn't mean the scalability issue totally being addressed.
> 

This story is not about the heartbeat.
https://bugs.launchpad.net/neutron/+bug/1438159

What I am looking for is managing lot of nodes, with minimal `controller` 
resources.

The actual required system changes like (for example regarding to vm boot) 
per/sec
is relative low, even if you have many nodes and vms. - Consider the instances 
average lifetime -

The `bug` is for the resources what the agents are related and querying many 
times,
BTW: I am thinking about several alternatives and other variants.

In neutron case a `system change` can affect multiple agents
like security group rule change.

It seams possible to have all agents to `query` a resource only once,
and being notified by any subsequent change `for free`. (IP, sec group rule, 
new neighbor) 

This is the scenario when the message brokers can shine and scale,
and it also offloads lot of work from the DB.


> And on the other hand, Nova already supports several segregation concepts,
> for example, Cells, Availability Zone... If there are 100k nodes to be
> managed by one OpenStack instances, it's impossible to work without hardware
> resources segregation. It's weird to put agent liveness manager in
> availability zone(AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is
> power off, then all agents in AZ2 lost management.
> 
>
> The benchmark is already here for scalability "test report for million ports
> scalability of Neutron "
> http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
> 
> The cascading may be not perfect, but at least it provides a feasible way if
> we really want scalability.
> 
> I am also working to evolve OpenStack to a world no need to worry about
> "OpenStack Scalability Issue" based on cascading:
> 
> "Tenant level virtual OpenStack service over hybrid or federated or multiple
> OpenStack based clouds":
> 
> There are lots of OpenStack based clouds, each tenant will be allocated with
> one cascading OpenStack as the virtual OpenStack service, and single
> OpenStack API endpoint served for this tenant. The tenant's resources can be
> distributed or dynamically scaled to multi-OpenStack based clouds, these
> clouds may be federated with KeyStone, or using shared KeyStone, or  even
> some OpenStack clouds built in AWS or Azure, or VMWare vSphere.
>
> 
> Under this deployment scenario, unlimited scalability in a cloud can be
> achieved, no unified cascading layer, tenant level resources orchestration
> among multi-OpenStack clouds fully distributed(even geographically). The
> database and load for one casacding OpenStack is very very small, easy for
> disaster recovery or backup. Multiple tenant may share one cascading
> OpenStack to reduce resource waste, but the principle is to keep the
> cascading OpenStack as thin as possible.
>
> You can find the information here:
> https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case
> 
> Best Regards
> Chaoyi Huang ( joehuang )
> 
> -Original Message-
> From: Attila Fazekas [mailto:afaze...@redhat.com]
> Sent: Thursday, April 16, 2015 3:06 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> 
> 
> - Original Message -
> > From: "joehuang" 
> > To: "OpenStack Development Mailing List (not for usage questions)"
> > 
> > Sent: Sunday, April 12, 2015 3:46:24 AM
> > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > 
> > 
> > 
> > As Kevin talking about agents, I want to remind that in TCP/IP stack,
> > port ( not Neutron Port ) is a two bytes field, i.e. port ranges from
> > 0 ~ 65535, supports maximum 64k port number.
> > 
> > 
> > 
> > " above 100k managed node " means more than 100k L2 agents/L3
> > agents... will be alive under Neutron.
> &

Re: [openstack-dev] [all] QPID incompatible with python 3 and untested in gate -- what to do?

2015-04-16 Thread Attila Fazekas




- Original Message -
> From: "Ken Giusti" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, April 16, 2015 4:47:50 PM
> Subject: Re: [openstack-dev] [all] QPID incompatible with python 3 and 
> untested in gate -- what to do?
> 
> On Wed, Apr 15, 2015 at 8:18 PM, Joshua Harlow  wrote:
> > Ken Giusti wrote:
> >>
> >> On Wed, Apr 15, 2015 at 1:33 PM, Doug Hellmann
> >> wrote:
> >>>
> >>> Excerpts from Ken Giusti's message of 2015-04-15 09:31:18 -0400:
> 
>  On Tue, Apr 14, 2015 at 6:23 PM, Joshua Harlow
>  wrote:
> >
> > Ken Giusti wrote:
> >>
> >> Just to be clear: you're asking specifically about the 0-10 based
> >> impl_qpid.py driver, correct?   This is the driver that is used for
> >> the "qpid://" transport (aka rpc_backend).
> >>
> >> I ask because I'm maintaining the AMQP 1.0 driver (transport
> >> "amqp://") that can also be used with qpidd.
> >>
> >> However, the AMQP 1.0 driver isn't yet Python 3 compatible due to its
> >> dependency on Proton, which has yet to be ported to python 3 - though
> >> that's currently being worked on [1].
> >>
> >> I'm planning on porting the AMQP 1.0 driver once the dependent
> >> libraries are available.
> >>
> >> [1]: https://issues.apache.org/jira/browse/PROTON-490
> >
> >
> > What's the expected date on this as it appears this also blocks python
> > 3
> > work as well... Seems like that hasn't been updated since nov 2014
> > which
> > doesn't inspire that much confidence (especially for what appears to be
> > mostly small patches).
> >
>  Good point.  I reached out to the bug owner.  He got it 'mostly
>  working' but got hung up on porting the proton unit tests.   I've
>  offered to help this along and he's good with that.  I'll make this a
>  priority to move this along.
> 
>  In terms of availability - proton tends to do releases about every 4-6
>  months.  They just released 0.9, so the earliest availability would be
>  in that 4-6 month window (assuming that should be enough time to
>  complete the work).   Then there's the time it will take for the
>  various distros to pick it up...
> 
>  so, definitely not 'real soon now'. :(
> >>>
> >>> This seems like a case where if we can get the libs we need to a point
> >>> where they install via pip, we can let the distros catch up instead of
> >>> waiting for them.
> >>>
> >>
> >> Sadly just the python wrappers are available via pip.  Its C extension
> >> requires that the native proton shared library (libqpid-proton) is
> >> available.   To date we've relied on the distro to provide that
> >> library.
> >
> >
> > How does that (c extension) work with eventlet? Does it?
> >
> 
> I haven't experienced any issues in my testing.
> 
> To be clear - the libqpid-proton library is non-blocking and
> non-threading.  It's simply an protocol processing engine - the driver
> hands it raw network data and messages magically pop out (and vise
> versa).
> 
>  All I/O, blocking, threading etc is done in the python driver itself.
> I suspect there's nothing eventlet needs to do that requires
> overloading functionality provided by the binary proton library, but
> my knowledge of eventlet is pretty slim.
> 

Usually to make a C code I/O lib eventlet friendly you need to use python
sockets. It is possible to explicitly use python socket on the C code.
For. Ex.: 
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

If the driver using python sockets and just passing the data to the C code as 
you said,
it is fine!

> >
> >>
> >>> Similarly, if we have *an* approach for Python 3 on oslo.messaging, that
> >>> means the library isn't blocking us from testing applications with
> >>> Python 3. If some of the drivers lag, their test jobs may need to be
> >>> removed or disabled if the apps start testing under Python 3.
> >>>
> >>> Doug
> >>>
> >>>
> >>> __
> >>> OpenStack Development Mailing List (not for usage questions)
> >>> Unsubscribe:
> >>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >>
> >>
> >>
> >
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> --
> Ken Giusti  (kgiu...@gmail.com)
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

_

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-16 Thread Attila Fazekas




- Original Message -
> From: "joehuang" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Sunday, April 12, 2015 3:46:24 AM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
> not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
> supports maximum 64k port number.
> 
> 
> 
> " above 100k managed node " means more than 100k L2 agents/L3 agents... will
> be alive under Neutron.
> 
> 
> 
> Want to know the detail design how to support 99.9% possibility for scaling
> Neutron in this way, and PoC and test would be a good support for this idea.
> 

Would you consider something as PoC which uses the technology in similar way,
with a similar port - security problem, but with a lower level API
than neutron using currently ?

Is it an acceptable flaw:
If you kill -9 the q-svc 1 times at the `right` millisec the rabbitmq
memory usage increases by ~1MiB ? (Rabbit usually eats ~10GiB under pressure)
The memory can be freed without broker restart, it also gets freed on
agent restart.


> 
> 
> "I'm 99.9% sure, for scaling above 100k managed node,
> we do not really need to split the openstack to multiple smaller openstack,
> or use significant number of extra controller machine."
> 
> 
> 
> Best Regards
> 
> 
> 
> Chaoyi Huang ( joehuang )
> 
> 
> 
> From: Kevin Benton [blak...@gmail.com]
> Sent: 11 April 2015 12:34
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> Which periodic updates did you have in mind to eliminate? One of the few
> remaining ones I can think of is sync_routers but it would be great if you
> can enumerate the ones you observed because eliminating overhead in agents
> is something I've been working on as well.
> 
> One of the most common is the heartbeat from each agent. However, I don't
> think we can't eliminate them because they are used to determine if the
> agents are still alive for scheduling purposes. Did you have something else
> in mind to determine if an agent is alive?
> 
> On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas < afaze...@redhat.com >
> wrote:
> 
> 
> I'm 99.9% sure, for scaling above 100k managed node,
> we do not really need to split the openstack to multiple smaller openstack,
> or use significant number of extra controller machine.
> 
> The problem is openstack using the right tools SQL/AMQP/(zk),
> but in a wrong way.
> 
> For example.:
> Periodic updates can be avoided almost in all cases
> 
> The new data can be pushed to the agent just when it needed.
> The agent can know when the AMQP connection become unreliable (queue or
> connection loose),
> and needs to do full sync.
> https://bugs.launchpad.net/neutron/+bug/1438159
> 
> Also the agents when gets some notification, they start asking for details
> via the
> AMQP -> SQL. Why they do not know it already or get it with the notification
> ?
> 
> 
> - Original Message -
> > From: "Neil Jerram" < neil.jer...@metaswitch.com >
> > To: "OpenStack Development Mailing List (not for usage questions)" <
> > openstack-dev@lists.openstack.org >
> > Sent: Thursday, April 9, 2015 5:01:45 PM
> > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > 
> > Hi Joe,
> > 
> > Many thanks for your reply!
> > 
> > On 09/04/15 03:34, joehuang wrote:
> > > Hi, Neil,
> > > 
> > > From theoretic, Neutron is like a "broadcast" domain, for example,
> > > enforcement of DVR and security group has to touch each regarding host
> > > where there is VM of this project resides. Even using SDN controller, the
> > > "touch" to regarding host is inevitable. If there are plenty of physical
> > > hosts, for example, 10k, inside one Neutron, it's very hard to overcome
> > > the "broadcast storm" issue under concurrent operation, that's the
> > > bottleneck for scalability of Neutron.
> > 
> > I think I understand that in general terms - but can you be more
> > specific about the broadcast storm? Is there one particular message
> > exchange that involves broadcasting? Is it only from the server to
> > agents, or are there 'broadcasts' in other directions as well?
> > 
> > (I presume you are talking about control plane messages here, i.e.
> > between Neutron components. Is that right? Obviou

Re: [openstack-dev] 答复: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Attila Fazekas




- Original Message -
> From: "Wangbibo" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Monday, April 13, 2015 10:51:39 AM
> Subject: [openstack-dev] 答复:  [neutron] Neutron scaling datapoints?
> 
> 
> 
> Hi Kevin,
> 
> 
> 
> Totally agree with you that heartbeat from each agent is something that we
> cannot eliminate currently. Agent status depends on it, and further
> scheduler and HA depends on agent status.
> 

Actually we could eliminate it regarding to the q-agt,
The q-agt can be monitored by the n-cpu
n-cpu also should change his status to dead when the q-agt dead.

So neutron could reuse the aliveness data from n-cpu aliveness.

Sooner or later I will suggest a direct
connection between n-cpu and q-agt anyway.

--

Also it is possible to implement the is_up by a dummy message send:

- All agent has to have an auto-delete queue, which consumed only by the agent.

- The is_up can use the Default exchange and start publishing a message
  with `immediate` flag.
  If the broker does not refuses it, the target system is alive.

https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.publish

This method has the same issue as the current memcached driver,
each is_up is a tcp request/response which consumes too many
time and resources when you `list` 100k node.

---

Actually the recommended method is:

Have a service which is:
 - HA (3(+) node)
 - Really able to use multiple threads (not cpython)
 - Does not do a real state change when the service state did not changed
 - The availability is based on the tcp connection health, which is checked 
either by
   - Frequent TCP keep-alive packages managed by the kernel
   - By an Application level payload
 - The service state interested parties are notified about state changes only
   when state change happened

For ex.: Zookeeper with ephemeral znodes.

Have a second service:
 - Which is subscribed to the first one (it can use tooz)
 - Consulting the service state changes with the database (add is_alive field 
to the table)
 - You can run multiple instances for HA,
   they does the same DB change with small delay, no split brain issue,
   but you should not run more than 3 instances.

Benefits of this approach compered to other zk based approaches.:

- Does not have all API worker to keep all service data in memory !
- Does not have all worker to cross reference the DB data with something else
  (especially in list context)
- Selecting only the dead or alive nodes will be simple and efficient

Cons.:

- 0.001 DB UPDATE/sec expected at 100k node (nothing :))
- Additional service component, but actually it saves memory 

PS.:
The zk has one other advantage compared to mc or the current db driver:
 Faster state change detection and report. 
 
> 
> I proposed a Liberty spec for introducing open framework/pluggable agent
> status drivers.[1][2] It allows us to use some other 3 rd party backend to
> monitor agent status, such as zookeeper, memcached. Meanwhile, it guarantees
> backward compatibility so that users could still use db-based status
> monitoring mechanism as their default choice.
> 
> 
> 
> Base on that, we may do further optimization on issues Attila and you
> mentioned. Thanks.
> 
> 
> 
> [1] BP -
> https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
> 
> [2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/
> 
> 
> 
> Best,
> 
> Robin
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 发件人 : Kevin Benton [mailto:blak...@gmail.com]
> 发送时间 : 2015 年 4 月 11 日 12:35
> 收件人 : OpenStack Development Mailing List (not for usage questions)
> 主题 : Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> 
> 
> Which periodic updates did you have in mind to eliminate? One of the few
> remaining ones I can think of is sync_routers but it would be great if you
> can enumerate the ones you observed because eliminating overhead in agents
> is something I've been working on as well.
> 
> 
> 
> 
> 
> One of the most common is the heartbeat from each agent. However, I don't
> think we can't eliminate them because they are used to determine if the
> agents are still alive for scheduling purposes. Did you have something else
> in mind to determine if an agent is alive?
> 
> 
> 
> 
> 
> On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas < afaze...@redhat.com >
> wrote:
> 
> I'm 99.9% sure, for scaling above 100k managed node,
> we do not really need to split the openstack to multiple smaller openstack,
> or use significant number of extra controller machine.
> 
> The problem is openstack using the right tools SQL/AMQP/(zk),
> but in a wrong way.
> 
> For example

Re: [openstack-dev] [nova][database][quotas] reservations table ??

2015-04-13 Thread Attila Fazekas




- Original Message -
> From: "Kevin L. Mitchell" 
> To: openstack-dev@lists.openstack.org
> Sent: Friday, April 10, 2015 5:47:26 PM
> Subject: Re: [openstack-dev] [nova][database][quotas] reservations table ??
> 
> On Fri, 2015-04-10 at 02:38 -0400, Attila Fazekas wrote:
> > I noticed the nova DB has reservations table with an expire field (+24h)
> > and a periodic task
> > in the scheduler (60 sec) for expire the otherwise not deleted records [2].
> > 
> > Both the table and the observed operations are strange.
> > 
> > What this table and its operations are trying to solve ?
> > Why does it needed ?
> > Why this solution was chosen ?
> 
> It might help to know that this is reservations for the quota system.
> The basic reason that this exists is because of parallelism: say the
> user makes a request to boot a new instance, and that new instance would
> fill their quota.  Nova begins processing the request, but while it's
> doing so, the user makes a second (or third, fourth, fifth, etc.)
> request.  With a reservation, we can count the first request against
> their quota and reject the extra requests; without a reservation, we
> have no way of knowing that nova is already processing a request, and so
> could allow the user to vastly exceed their quota.
> 
Just the very existence of the `expire` makes the solution very suspicious.

As I see the operations does no ensure parallel safe quota enforcement 
at resource creation and based on stale data. (wireshark)

It is based on a data originated from different transaction,
 even without SELECT .. WITH SHARED LOCK.

When moving the delta to/from reservations the service puts a lock 
(SELECT .. FOR UPDATE) on all same tenant related quota_usages row,
this is the only safety mechanism I saw.
Alone it is not enough.

No quota related table touched in the same transaction
when the instance state changed (or created). :(

---
The reservations table is not really needed.

What is really needed is doing the quota_usages changes
 and resource state changes in the same transaction !

The transactions are all or nothing constructs,
nothing can happen which needs any `expire` thing.
 
The transaction needs to ensure really it does the state change.
It can mean just read it with SELECT .. FOR UPDATE  
for an existing record (for ex.: instance)

The transaction also needs to ensure quota check happened 
based on not stale data -> SELECT .. WITH SHARED LOCK for
- quota limit queries
- for calculating the actual number of things or for just reading the
  values from the quota_usages

In most cases, the quota check and update can be merged to a single UPDATE 
statement
and it fully can happen on the DB side, without actually fetching
any quota related information by the service.

The mysql UPDATE statement with the right expressions and sub-queries,
automatically can place the minimum required locks and do the update when 
needed.

The number of changed rows returned by the UPDATE,
can indicate is the quota successfully allocated (passed the check) or not.

When it's not successful, just ROLLBACK and tell something to the user about
the `Out of Quota` issue.
  
It is recommended to put the quota check close to the end of the transaction,
in order to minimize the lock hold time related to quota_usages table.

At the end we will not lock the quota_usages twice (as we do now),
and we do not left behind 4 virtually deleted rows in a `bonus` table,
and do not use +1 extra transaction and +8 extra UPDATE  per instance create,
and consistency is ensured.


> > PS.:
> > Is the uuid in the table referenced by anything?
> 
> Once the operation that allocated the reservation completes, it either
> rolls back the reservation (in the case of failure) or it commits the
> reservation (updating a cache quota usages table).  This involves
> updating the reservation table to delete the reservation, and a UUID
> helps match up the specific row.  (Or rows; most operations involve more
> than one quota and thus more than one row.)  The expiration logic is to
> deal with the case that the operation never completed because nova
> crashed in the middle, and provides a stop-gap measure to ensure that
> the usage isn't counted against the user forever.

Just to confirm, the same UUID just exists in the reservation table only,
and temporary in one workers memory .?


> --
> Kevin L. Mitchell 
> Rackspace
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


PS.:
The `Refresh` is also strange thing in this context.


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread Attila Fazekas




- Original Message -
> From: "joehuang" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Sunday, April 12, 2015 1:20:48 PM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> Hi, Kevin,
> 
> 
> 
> I assumed that all agents are connected to same IP address of RabbitMQ, then
> the connection will exceed the port ranges limitation.
> 
https://news.ycombinator.com/item?id=1571300

"TCP connections are identified by the (src ip, src port, dest ip, dest port) 
tuple."

"The server doesn't need multiple IPs to handle > 65535 connections. All the 
server connections to a given IP are to the same port. For a given client, the 
unique key for an http connection is (client-ip, PORT, server-ip, 80). The only 
number that can vary is PORT, and that's a value on the client. So, the client 
is limited to 65535 connections to the server. But, a second client could also 
have another 65K connections to the same server-ip:port."

> 
> For a RabbitMQ cluster, for sure the client can connect to any one of member
> in the cluster, but in this case, the client has to be designed in fail-safe
> manner: the client should be aware of the cluster member failure, and
> reconnect to other survive member. No such mechnism has been implemented
> yet.
> 
> 
> 
> Other way is to use LVS or DNS based like load balancer, or something else.
> If you put one load balancer ahead of a cluster, then we have to take care
> of the port number limitation, there are so many agents will require
> connection concurrently, 100k level, and the requests can not be rejected.
> 
> 
> 
> Best Regards
> 
> 
> 
> Chaoyi Huang ( joehuang )
> 
> 
> 
> From: Kevin Benton [blak...@gmail.com]
> Sent: 12 April 2015 9:59
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> The TCP/IP stack keeps track of connections as a combination of IP + TCP
> port. The two byte port limit doesn't matter unless all of the agents are
> connecting from the same IP address, which shouldn't be the case unless
> compute nodes connect to the rabbitmq server via one IP address running port
> address translation.
> 
> Either way, the agents don't connect directly to the Neutron server, they
> connect to the rabbit MQ cluster. Since as many Neutron server processes can
> be launched as necessary, the bottlenecks will likely show up at the
> messaging or DB layer.
> 
> On Sat, Apr 11, 2015 at 6:46 PM, joehuang < joehu...@huawei.com > wrote:
> 
> 
> 
> 
> 
> As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
> not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
> supports maximum 64k port number.
> 
> 
> 
> " above 100k managed node " means more than 100k L2 agents/L3 agents... will
> be alive under Neutron.
> 
> 
> 
> Want to know the detail design how to support 99.9% possibility for scaling
> Neutron in this way, and PoC and test would be a good support for this idea.
> 
> 
> 
> "I'm 99.9% sure, for scaling above 100k managed node,
> we do not really need to split the openstack to multiple smaller openstack,
> or use significant number of extra controller machine."
> 
> 
> 
> Best Regards
> 
> 
> 
> Chaoyi Huang ( joehuang )
> 
> 
> 
> From: Kevin Benton [ blak...@gmail.com ]
> Sent: 11 April 2015 12:34
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> Which periodic updates did you have in mind to eliminate? One of the few
> remaining ones I can think of is sync_routers but it would be great if you
> can enumerate the ones you observed because eliminating overhead in agents
> is something I've been working on as well.
> 
> One of the most common is the heartbeat from each agent. However, I don't
> think we can't eliminate them because they are used to determine if the
> agents are still alive for scheduling purposes. Did you have something else
> in mind to determine if an agent is alive?
> 
> On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas < afaze...@redhat.com >
> wrote:
> 
> 
> I'm 99.9% sure, for scaling above 100k managed node,
> we do not really need to split the openstack to multiple smaller openstack,
> or use significant number of extra controller machine.
> 
> The problem is openstack using the right tools SQL/AMQP/(zk),
> but in a wrong way.
> 
> For example.:
> Periodic updates can be avoided almost in 

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Attila Fazekas




- Original Message -
> From: "Kevin Benton" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Sunday, April 12, 2015 4:17:29 AM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> So IIUC tooz would be handling the liveness detection for the agents. That
> would be nice to get ride of that logic in Neutron and just register
> callbacks for rescheduling the dead.
> 
> Where does it store that state, does it persist timestamps to the DB like
> Neutron does? If so, how would that scale better? If not, who does a given
> node ask to know if an agent is online or offline when making a scheduling
> decision?
> 
You might find interesting the proposed solution in this bug:
https://bugs.launchpad.net/nova/+bug/1437199

> However, before (what I assume is) the large code change to implement tooz, I
> would like to quantify that the heartbeats are actually a bottleneck. When I
> was doing some profiling of them on the master branch a few months ago,
> processing a heartbeat took an order of magnitude less time (<50ms) than the
> 'sync routers' task of the l3 agent (~300ms). A few query optimizations
> might buy us a lot more headroom before we have to fall back to large
> refactors.
> Kevin Benton wrote:
> 
> 
> 
> One of the most common is the heartbeat from each agent. However, I
> don't think we can't eliminate them because they are used to determine
> if the agents are still alive for scheduling purposes. Did you have
> something else in mind to determine if an agent is alive?
> 
> Put each agent in a tooz[1] group; have each agent periodically heartbeat[2],
> have whoever needs to schedule read the active members of that group (or use
> [3] to get notified via a callback), profit...
> 
> Pick from your favorite (supporting) driver at:
> 
> http://docs.openstack.org/ developer/tooz/compatibility. html
> 
> [1] http://docs.openstack.org/ developer/tooz/compatibility. html#grouping
> [2] https://github.com/openstack/ tooz/blob/0.13.1/tooz/ coordination.py#L315
> [3] http://docs.openstack.org/ developer/tooz/tutorial/group_
> membership.html#watching- group-changes
> 
> 
> __ __ __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request@lists. openstack.org?subject: unsubscribe
> http://lists.openstack.org/ cgi-bin/mailman/listinfo/ openstack-dev
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa][tempest] Service tag blueprint incomplete

2015-03-16 Thread Attila Fazekas




- Original Message -
> From: "Rohan Kanade" 
> To: openstack-dev@lists.openstack.org
> Sent: Monday, March 16, 2015 1:13:12 PM
> Subject: [openstack-dev] [qa][tempest] Service tag blueprint incomplete
> 
> Hi,
> 
> I could find some tests in tempest are still not tagged with services as per
> blueprint < https://blueprints.launchpad.net/tempest/+spec/add-service-tags
> >
> 
> eg: .tempest.api.compute.test_live_block_migration:test_iscsi_volume (should
> have "volume" tag)
> 
> I have started adding tags where appropriate
> 
> https://review.openstack.org/#/c/164634/
> 
> 
> Please correct me if above observation is wrong.

The identity _service_ refers to keystone, that test is not really 
related more to keystone, than any other swift test
 when the auth backed is keystone.

Implicitly in practice almost all test is using keystone,
this it is not mentioned explicitly everywhere. The tests in the `identity` 
directory are considered taged, by the test selection logic.

Also we does not mention `image` when booting a nova server.

The service tags main expected usage is on the scenarios,
or when certain api has explicit feature regarding to other services.

The mentioned swift test case grants WORLD readability,
it is definitely AAA related thing, but in this case not really keystone 
related.

If we would like to distinguish the AAA related test we might need a different 
tag,
or we need to redefine the meaning of the `identity` tag. 

PS.:
AAA = Auth* =  Authentication, Authorization, and Accounting
> 
> Regards,
> Rohan Kanade
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] if by "archived" you mean, "wipes out your tables completely", then sure, it works fine

2015-03-16 Thread Attila Fazekas
Hi Mike,

The points was, there is no real need or real use case for archiving the db
as the nova-mange does.

What is the exact use case ? Auditing ? Accounting ?

* Keystone allows permanent delete, if you need to do auditing probably
  the user accounts would the primary target for saving.

* The logs+elasticsearch(or just grep) and ceilometer+mongodb is designed to
  help in `archiving` and keep the things what you actually need.

* After one year you can have ~100M deleted server instance record 
  in the shadow tables (+ the related rows), what to do with them ? Truncate ?
  If you have proper indexes on the main tables the deleted records mostly just
  consumes disk space, otherwise they also causes serious performance issues.

If anybody would like to keep the deleted things in SQL for whatever reason,
he very likely want to do in a different database instance on a different 
server,
it is also likely he would like to do some transformation(OLAP) instead of 
attacking
the production DB with full table scans while also invalidating the `Buffer 
Pool` content.

The feature as it is does not makes sense even after fixing the existing bugs.
I do not know what would be it's actual use case, even if there is one, 
probably it is
 not the best approach.

My suggestion is just nuke it,
and came up with `simple` script which archives the old records to /dev/null.
$ nova-mange db flush 7d 
This would deletes the soft-deleted records in small chunks (like token-flush). 

(or just stop doing soft-delete.)


- Original Message -
> From: "Mike Bayer" 
> To: "Attila Fazekas" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, March 13, 2015 5:04:21 PM
> Subject: Re: [openstack-dev] [nova] if by "archived" you mean,"wipes 
> out your tables completely", then sure, it
> works fine
> 
> 
> 
> Attila Fazekas  wrote:
> 
> > The archiving has issues since very long time [1],
> > something like this [2] is expected to replace it.
> 
> 
> yeah I was thinking of just rewriting the archive routine in Nova to be
> reasonable, but I can build this routine into Oslo.db as well as a generic
> “move rows with criteria X into tables”. Archiving as it is is mostly
> useless if it isn’t considering dependencies between tables
> (https://bugs.launchpad.net/nova/+bug/1183523) so the correct approach would
> need to consider tables and potentially rows in terms of foreign key
> dependency. This is what the unit of work was built to handle. Though I’m
> not sure I can make this a generic ORM play since we want to be able to
> delete “only N” rows, and it would probably be nice for the system to not
> spend its time reading in the entire DB if it is only tasked with a few
> dozen rows, so it might need to implement its own mini-unit-of-work system
> that works against the same paradigm but specific to this use case.
> 
> The simplest case is that we address the archival of tables in order of
> foreign key dependency. However, that has two issues in the “generic” sense.
> One is that there can be cycles between tables, or a table that refers to
> itself has a cycle to itself. So in those cases the archival on a “sort the
> tables” basis needs to be broken into a “sort the rows” basis. This is what
> SQLAlchemy’s unit of work does and I’d adapt that here.
> 
> The other possible, but probably unlikely, issue is that to address this
> “generically”, if a row “Table A row 1” is referred to by a “Table B row 2”,
> it might not be assumable that it is safe to remove “Table B Row 2” and
> *not* “Table A row 1”. The application may rely on both of these rows being
> present, and the SQLAlchemy pattern where this is the case is the so-called
> “joined table inheritance” case. But the “joined table inheritance” pattern
> is actually not very easy to adapt to the “shadow” model so I doubt anyone
> is doing that.

IMHO we should forget about solving how to move them safely to a different 
table,
the issue is how to delete them in relative small transactions
 ~100 instances(+referenced/related records), without causing full table scans 
or causing reference violation issues.

keystone token-flush also has a logic to do the delete in smaller chunks,
in order to do not stall regular processing for a long time or hit DB 
replication
limit issues. keystone targets to do 1000 row delete per transaction with 
mysql, 
some cases actually the deleted row number differs.

PS.:
Adding indexes on the deleted_at fields is acceptable.

> > The archiving just move trash to the other side of the desk,
> > usually just permanently deleting everything what is deleted
> > for more than 7 day is better for everyone.
> > 
> > For now, maybe just wiping out the shadow tables and the exi

Re: [openstack-dev] [nova] if by "archived" you mean, "wipes out your tables completely", then sure, it works fine

2015-03-13 Thread Attila Fazekas
The archiving has issues since very long time [1],
something like this [2] is expected to replace it.

The archiving just move trash to the other side of the desk,
usually just permanently deleting everything what is deleted
for more than 7 day is better for everyone.

For now, maybe just wiping out the shadow tables and the existing nova-mange 
functionality is better choice. [3]

[1] https://bugs.launchpad.net/nova/+bug/1305892
[2] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
[3] https://bugs.launchpad.net/nova/+bug/1426873

- Original Message -
> From: "Mike Bayer" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, March 13, 2015 12:29:55 AM
> Subject: [openstack-dev] [nova] if by "archived" you mean,"wipes out your 
> tables completely", then sure, it works
> fine
> 
> Hello Nova -
> 
> Not sure if I’m just staring at this for too long, or if
> archive_deleted_rows_for_table() is just not something we ever use.
> Because it looks like it’s really, really broken very disastrously, and I’m
> wondering if I’m just missing something in front of me.
> 
> Let’s look at what it does!
> 
> First, archive_deleted_rows() calls it with a table name. These names are
> taken by collecting every single table name from nova.db.sqlalchemy.models.
> 
> Then, the function uses table reflection (that is, doesn’t look in the model
> at all, just goes right to the database) to load the table definitions:
> 
> table = Table(tablename, metadata, autoload=True)
> shadow_tablename = _SHADOW_TABLE_PREFIX + tablename
> rows_archived = 0
> try:
> shadow_table = Table(shadow_tablename, metadata, autoload=True)
> except NoSuchTableError:
> # No corresponding shadow table; skip it.
> return rows_archived
> 
> this is pretty heavy handed and wasteful from an efficiency point of view,
> and I’d like to fix this too, but let’s go with it. Now we have the two
> tables.
> 
> Then we do this:
> 
> deleted_column = table.c.deleted
> query_insert = sql.select([table],
>   deleted_column != deleted_column.default).\
>   order_by(column).limit(max_rows)
> query_delete = sql.select([column],
>   deleted_column != deleted_column.default).\
>   order_by(column).limit(max_rows)
> 
> We make some SELECT statements that we’re going to use to find “soft
> deleted” rows, and these will be embedded into an INSERT
> and a DELETE. It is trying to make a statement like “SELECT .. FROM
> table WHERE deleted != ”, so that it finds rows where
> “deleted” has been changed to something, e.g. the row was
> soft deleted.
> 
> But what’s the value of “deleted_default” ?   Remember, all this
> table knows is what the database just told us about it, because it only
> uses reflection.  Let’s see what the “deleted” column in a table like
> instance_types looks like:
> 
> MariaDB [nova]> show create table instance_types;
> | instance_types | CREATE TABLE `instance_types` (
>   `created_at` datetime DEFAULT NULL,
> 
>   …  [omitted] ...
> 
>   `deleted` int(11) DEFAULT NULL,
> )
> 
> The default that we get for this column is NULL. That is very interesting!
> Because, if we look at the *Python-side value of deleted*, we see something
> that is quite the opposite of NULL, e.g. a thing that is most certainly not
> null:
> 
> class SoftDeleteMixin(object):
> deleted_at = Column(DateTime)
> deleted = Column(Integer, default=0)
> 
> See that zero there? That’s a ***Python-side default***. It is **not the
> server default**!! You will **not** get it from reflection, the database has
> no clue about it (oddly enough, this entire subject matter is fully
> documented in SQLAlchemy’s documentation, and guess what, the docs are free!
> Read them all you like, I won’t ask for a dime, no questions asked!).
> 
> So, all of our INSERTS **will** put a zero, not NULL, into that column.
> Let’s look in instance_types and see:
> 
> MariaDB [nova]> select id, name, deleted from instance_types;
> ++---+-+
> | id | name  | deleted |
> ++---+-+
> |  3 | m1.large  |   0 |
> |  1 | m1.medium |   0 |
> |  7 | m1.micro  |   0 |
> |  6 | m1.nano   |   0 |
> |  5 | m1.small  |   0 |
> |  2 | m1.tiny   |   0 |
> |  4 | m1.xlarge |   0 |
> ++---+-+
> 7 rows in set (0.00 sec)
> 
> No NULLs.  The value of non-deleted rows is zero.
> 
> What does this all mean?
> 
> It means, when this archival routine runs, it runs queries like this:
> 
> INSERT INTO shadow_quota_usages SELECT quota_usages.created_at,
> quota_usages.updated_at, quota_usages.deleted_at, quota_usages.id,
> quota_usages.project_id, quota_usages.resource, quota_usages.in_use,
> quota_usages.reserved, quota_usages.until_refresh, quota_usages.deleted,
> quota_usages.user_id
> FROM quota_usages
> WHERE quota_usages.deleted IS NO

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
> From: "Attila Fazekas" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Tuesday, March 10, 2015 12:48:00 PM
> Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
> supported in nova-scheduler
> 
> 
> 
> 
> 
> - Original Message -
> > From: "Nikola Đipanov" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Tuesday, March 10, 2015 10:53:01 AM
> > Subject: Re: [openstack-dev] [nova] blueprint about multiple workers
> > supported in nova-scheduler
> > 
> > On 03/06/2015 03:19 PM, Attila Fazekas wrote:
> > > Looks like we need some kind of _per compute node_ mutex in the critical
> > > section,
> > > multiple scheduler MAY be able to schedule to two compute node at same
> > > time,
> > > but not for scheduling to the same compute node.
> > > 
> > > If we don't want to introduce another required component or
> > > reinvent the wheel there are some possible trick with the existing
> > > globally
> > > visible
> > > components like with the RDMS.
> > > 
> > > `Randomized` destination choose is recommended in most of the possible
> > > solutions,
> > > alternatives are much more complex.
> > > 
> > > One SQL example:
> > > 
> > > * Add `sched_cnt`, defaul=0, Integer field; to a hypervisors related
> > > table.
> > > 
> > > When the scheduler picks one (or multiple) node, he needs to verify is
> > > the
> > > node(s) are
> > > still good before sending the message to the n-cpu.
> > > 
> > > It can be done by re-reading the ONLY the picked hypervisor(s) related
> > > data.
> > > with `LOCK IN SHARE MODE`.
> > > If the destination hyper-visors still OK:
> > > 
> > > Increase the sched_cnt value exactly by 1,
> > > test is the UPDATE really update the required number of rows,
> > > the WHERE part needs to contain the previous value.
> > > 
> > > You also need to update the resource usage on the hypervisor,
> > >  by the expected cost of the new vms.
> > > 
> > > If at least one selected node was ok, the transaction can be COMMITed.
> > > If you were able to COMMIT the transaction, the relevant messages
> > >  can be sent.
> > > 
> > > The whole process needs to be repeated with the items which did not
> > > passed
> > > the
> > > post verification.
> > > 
> > > If a message sending failed, `act like` migrating the vm to another host.
> > > 
> > > If multiple scheduler tries to pick multiple different host in different
> > > order,
> > > it can lead to a DEADLOCK situation.
> > > Solution: Try to have all scheduler to acquire to Shared RW locks in the
> > > same order,
> > > at the end.
> > > 
> > > Galera multi-writer (Active-Active) implication:
> > > As always, retry on deadlock.
> > > 
> > > n-sch + n-cpu crash at the same time:
> > > * If the scheduling is not finished properly, it might be fixed manually,
> > > or we need to solve which still alive scheduler instance is
> > > responsible for fixing the particular scheduling..
> > > 
> > 
> > So if I am reading the above correctly - you are basically proposing to
> > move claims to the scheduler (we would atomically check if there were
> > changes since the time we picked the host with the UPDATE .. WHERE using
> > LOCK IN SHARE MODE (assuming REPEATABLE READS is the used isolation
> > level) and then updating the usage, a.k.a doing the claim in the same
> > transaction.
> > 
> > The issue here is that we still have a window between sending the
> > message, and the message getting picked up by the compute host (or
> > timing out) or the instance outright failing, so for sure we will need
> > to ack/nack the claim in some way on the compute side.
> > 
> > I believe something like this has come up before under the umbrella term
> > of "moving claims to the scheduler", and was discussed in some detail on
> > the latest Nova mid-cycle meetup, but only artifacts I could find were a
> > few lines on this etherpad Sylvain pointed me to [1] that I am copying
> > here:
> > 
> >
> > """
> > * White board the scheduler service interface
> >  ** note: this design won't change the existing way/logic of reconciling
> > nova db != hype

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
> From: "Nikola Đipanov" 
> To: openstack-dev@lists.openstack.org
> Sent: Tuesday, March 10, 2015 10:53:01 AM
> Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
> supported in nova-scheduler
> 
> On 03/06/2015 03:19 PM, Attila Fazekas wrote:
> > Looks like we need some kind of _per compute node_ mutex in the critical
> > section,
> > multiple scheduler MAY be able to schedule to two compute node at same
> > time,
> > but not for scheduling to the same compute node.
> > 
> > If we don't want to introduce another required component or
> > reinvent the wheel there are some possible trick with the existing globally
> > visible
> > components like with the RDMS.
> > 
> > `Randomized` destination choose is recommended in most of the possible
> > solutions,
> > alternatives are much more complex.
> > 
> > One SQL example:
> > 
> > * Add `sched_cnt`, defaul=0, Integer field; to a hypervisors related table.
> > 
> > When the scheduler picks one (or multiple) node, he needs to verify is the
> > node(s) are
> > still good before sending the message to the n-cpu.
> > 
> > It can be done by re-reading the ONLY the picked hypervisor(s) related
> > data.
> > with `LOCK IN SHARE MODE`.
> > If the destination hyper-visors still OK:
> > 
> > Increase the sched_cnt value exactly by 1,
> > test is the UPDATE really update the required number of rows,
> > the WHERE part needs to contain the previous value.
> > 
> > You also need to update the resource usage on the hypervisor,
> >  by the expected cost of the new vms.
> > 
> > If at least one selected node was ok, the transaction can be COMMITed.
> > If you were able to COMMIT the transaction, the relevant messages
> >  can be sent.
> > 
> > The whole process needs to be repeated with the items which did not passed
> > the
> > post verification.
> > 
> > If a message sending failed, `act like` migrating the vm to another host.
> > 
> > If multiple scheduler tries to pick multiple different host in different
> > order,
> > it can lead to a DEADLOCK situation.
> > Solution: Try to have all scheduler to acquire to Shared RW locks in the
> > same order,
> > at the end.
> > 
> > Galera multi-writer (Active-Active) implication:
> > As always, retry on deadlock.
> > 
> > n-sch + n-cpu crash at the same time:
> > * If the scheduling is not finished properly, it might be fixed manually,
> > or we need to solve which still alive scheduler instance is
> > responsible for fixing the particular scheduling..
> > 
> 
> So if I am reading the above correctly - you are basically proposing to
> move claims to the scheduler (we would atomically check if there were
> changes since the time we picked the host with the UPDATE .. WHERE using
> LOCK IN SHARE MODE (assuming REPEATABLE READS is the used isolation
> level) and then updating the usage, a.k.a doing the claim in the same
> transaction.
> 
> The issue here is that we still have a window between sending the
> message, and the message getting picked up by the compute host (or
> timing out) or the instance outright failing, so for sure we will need
> to ack/nack the claim in some way on the compute side.
> 
> I believe something like this has come up before under the umbrella term
> of "moving claims to the scheduler", and was discussed in some detail on
> the latest Nova mid-cycle meetup, but only artifacts I could find were a
> few lines on this etherpad Sylvain pointed me to [1] that I am copying here:
> 
>
> """
> * White board the scheduler service interface
>  ** note: this design won't change the existing way/logic of reconciling
> nova db != hypervisor view
>  ** gantt should just return claim ids, not entire claim objects
>  ** claims are acked as being in use via the resource tracker updates
> from nova-compute
>  ** we still need scheduler retries for exceptional situations (admins
> doing things outside openstack, hardware changes / failures)
>  ** retry logic in conductor? probably a separate item/spec
> """
> 
> As you can see - not much to go on (but that is material for a separate
> thread that I may start soon).
>
In my example, the resource needs to be considered as used before we get
anything back from the compute.
The resource can be `freed` at error handling,
hopefully be migrating to another node.
 
> The problem I have with this particular approach is that while it claims
> to fix some of the races (and probably does) it doe

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: openstack-dev@lists.openstack.org
> Sent: Wednesday, March 4, 2015 9:22:43 PM
> Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
> supported in nova-scheduler
> 
> On 03/04/2015 01:51 AM, Attila Fazekas wrote:
> > Hi,
> >
> > I wonder what is the planned future of the scheduling.
> >
> > The scheduler does a lot of high field number query,
> > which is CPU expensive when you are using sqlalchemy-orm.
> > Does anyone tried to switch those operations to sqlalchemy-core ?
> 
> Actually, the scheduler does virtually no SQLAlchemy ORM queries. Almost
> all database access is serialized from the nova-scheduler through the
> nova-conductor service via the nova.objects remoting framework.
> 

It does not helps you.

> > The scheduler does lot of thing in the application, like filtering
> > what can be done on the DB level more efficiently. Why it is not done
> > on the DB side ?
> 
> That's a pretty big generalization. Many filters (check out NUMA
> configuration, host aggregate extra_specs matching, any of the JSON
> filters, etc) don't lend themselves to SQL column-based sorting and
> filtering.
> 

What a basic SQL query can do,
and what is the limit of the SQL is two different thing.
Even if you do not move everything to the DB side,
the dataset the application need to deal with could be limited.

> > There are use cases when the scheduler would need to know even more data,
> > Is there a plan for keeping `everything` in all schedulers process memory
> > up-to-date ?
> > (Maybe zookeeper)
> 
> Zookeeper has nothing to do with scheduling decisions -- only whether or
> not a compute node's "service descriptor" is active or not. The end goal
> (after splitting the Nova scheduler out into Gantt hopefully at the
> start of the L release cycle) is to have the Gantt database be more
> optimized to contain the resource usage amounts of all resources
> consumed in the entire cloud, and to use partitioning/sharding to scale
> the scheduler subsystem, instead of having each scheduler process handle
> requests for all resources in the cloud (or cell...)
> 
What is the current optional usage of zookeeper, 
and what it could be used for is very different thing.
The resource tracking is possible. 

> > The opposite way would be to move most operation into the DB side,
> > since the DB already knows everything.
> > (stored procedures ?)
> 
> See above. This assumes that the data the scheduler is iterating over is
> well-structured and consistent, and that is a false assumption.

With stored procedures you can do almost anything,
and in many ceases it is more readable than an complex query.

> 
> Best,
> -jay
> 
> > Best Regards,
> > Attila
> >
> >
> > - Original Message -
> >> From: "Rui Chen" 
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> 
> >> Sent: Wednesday, March 4, 2015 4:51:07 AM
> >> Subject: [openstack-dev] [nova] blueprint about multiple workers supported
> >>in nova-scheduler
> >>
> >> Hi all,
> >>
> >> I want to make it easy to launch a bunch of scheduler processes on a host,
> >> multiple scheduler workers will make use of multiple processors of host
> >> and
> >> enhance the performance of nova-scheduler.
> >>
> >> I had registered a blueprint and commit a patch to implement it.
> >> https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
> >>
> >> This patch had applied in our performance environment and pass some test
> >> cases, like: concurrent booting multiple instances, currently we didn't
> >> find
> >> inconsistent issue.
> >>
> >> IMO, nova-scheduler should been scaled horizontally on easily way, the
> >> multiple workers should been supported as an out of box feature.
> >>
> >> Please feel free to discuss this feature, thanks.
> >>
> >> Best Regards
> >>
> >>
> >> __
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: opensta

[openstack-dev] [Tempest] isolation default config change notification

2015-03-09 Thread Attila Fazekas
Hi All,

This is follow up on [1].
Running the full tempest test-suite in parallel without the 
allow_tenant_isolation=True settings, can cause random not too obvious
failures, which caused lot of issue to tempest newcomers. 

There are special uses case when you might want to disable it,
for example when you would like to run just for several test cases for
benchmarking, when you know it is safe for sure, and you do not want to
include account creation related times to the result.

Now, the other case when you might want to disable this feature, when
you running tempest without admin account. This is expected to change
with the upcoming `test accounts` [2], where allow_tenant_isolation=True
is expected to be the recommended configuration. 
 
Best Regards,
Attila

[1] https://review.openstack.org/#/c/157052/
[2] https://blueprints.launchpad.net/tempest/+spec/test-accounts

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][api] Microversions. And why do we need API extensions for new API functionality?

2015-03-09 Thread Attila Fazekas




- Original Message -
> From: "Christopher Yeoh" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Monday, March 9, 2015 1:04:15 PM
> Subject: Re: [openstack-dev] [nova][api] Microversions. And why do we need 
> API extensions for new API functionality?
> 
> 
> 
> On Mon, Mar 9, 2015 at 10:08 PM, John Garbutt < j...@johngarbutt.com > wrote:
> 
> 
> Hi,
> 
> I think I agree with Jay here, but let me explain...
> 
> On 8 March 2015 at 12:10, Alex Xu < sou...@gmail.com > wrote:
> > Thanks for Jay point this out! If we have agreement on this and document
> > it,
> > that will be great for guiding developer how to add new API.
> 
> +1
> 
> Please could you submit a dev ref for this?
> 
> We can argue on the review, a bit like this one:
> https://github.com/openstack/nova/blob/master/doc/source/devref/policy_enforcement.rst
> 
> > For modularity, we need define what should be in a separated module(it is
> > extension now.) There are three cases:
> > 
> > 1. Add new resource
> > This is totally worth to put in a separated module.
> 
> +1
> 
> > 2. Add new sub-resource
> > like server-tags, I prefer to put in a separated module, I don't think
> > put another 100 lines code in the servers.py is good choice.
> 
> -1
> 
> I hate the idea of show instance extension code for version 2.4 living
> separately to the rest of the instance show logic, when it really
> doesn't have to.
> 
> It feels too heavyweight in its current form.
> 
> 
> If the only thing server-tags did was to add a parameter then we wouldn't
> need a new extension,
> but its not, it adds another resource with associated actions
> 
> 
> Maybe we need a more modular way of expressing the extension within
> the same file?
> 
> 
> I think servers.py is simply to big. Its much harder to read and debug than
> any other plugin just because of its size - or
> maybe I just need a 50" monitor :) I'd rather ensure functionality common
> server-tags and the API is kept together rather than
> spread through servers.py
> 
No, it isn't.
It is bellow 2k line. I usually use low level tools even for python related 
debugging.
For ex.: strace, gdb..
With the extension I get lot of files which may be involved may be not.
This causes me additional headache, because more difficult to see which file
is involved. After an strace I usually know what is the mistake, I just need to 
find
it in the code.
I do not like when I had to open more than 3 files, after I see what went wrong.
I some cases I use gdb, just to get python stack traces just before the first 
incorrect
step is detected, in other cases git grep is sufficient.

Actually for me the extensions increases the required monitor number,
some cases I also need to use more complicated approaches.
I tied lot of python profiler tool as well, but there is no single all cases 
win version,
extra custom hack is required in many cases to get something close what I want.

> 
> > 3. extend attributes and methods for a existed resource
> > like add new attributes for servers, we can choice one of existed module
> > to put it in. Just like this patch https://review.openstack.org/#/c/155853/
> 
> +1
> 
> I wish it was easier to read, but I hope thats fixable long term.
> 
> > 2015-03-08 8:31 GMT+08:00 Jay Pipes < jaypi...@gmail.com >:
> >> Now that microversions have been introduced to the Nova API (meaning we
> >> can now have novaclient request, say, version 2.3 of the Nova API using
> >> the
> >> special X-OpenStack-Nova-API-Version HTTP header), is there any good
> >> reason
> >> to require API extensions at all for *new* functionality.
> 
> As above, a new "resource" probably should get a new "plugins/v3" module
> right?
> 
> It feels (at worst) borderline in the os-server-tags case, due to the
> extra actions.
> 
> >> What is the point of creating a new "plugin"/API extension for this new
> >> functionality? Why can't we just modify the
> >> nova/api/openstack/compute/server.py Controller.show() method and decorate
> >> it with a 2.4 microversion that adds a "tags" attribute to the returned
> >> server dictionary?
> >> 
> >> Similarly, new microversion API functionality should live in a module, as
> >> a top-level (or subcollection) Controller in /nova/api/openstack/compute/,
> >> and should not be in the /nova/api/openstack/compute/plugins/ directory.
> >> Why? Because it's not a plugin.
> 
> Everything is a "plugin" in v3, no more distinction between core vs
> plugin. It needs renaming really.
> 
> It should look just like servers, I guess, which is a top level item:
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/plugins/v3/servers.py
> 
> >> Why are we continuing to use these awkward, messy, and cumbersome API
> >> extensions?
> 
> We certainly should never be forced to add an extension to advertise
> new functionality anymore.
> 
> Its a big reason why I want to see the API micro-versions succeed.
> 
> Yep, there is I think no reason except to support 

Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about multiple workers)

2015-03-09 Thread Attila Fazekas




- Original Message -
> From: "Mike Bayer" 
> To: "Attila Fazekas" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, March 6, 2015 2:20:45 AM
> Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming 
> features (was: [nova] blueprint about
> multiple workers)
> 
> 
> 
> Attila Fazekas  wrote:
> 
> > I see lot of improvements,
> > but cPython is still cPython.
> > 
> > When you benchmarking query related things, please try to
> > get the actual data from the returned objects
> 
> that goes without saying. I’ve been benching SQLAlchemy and DBAPIs for many
> years. New performance improvements tend to be the priority for pretty much
> every major release.
> 
> > and try to do
> > something with data what is not expected to be optimized out even by
> > a smarter compiler.
> 
> Well I tend to favor breaking out the different elements into individual
> tests here, though I guess if you’re trying to trick a JIT then the more
> composed versions may be more relevant. For example, I could already tell
> you that the AttributeDict thing would perform terribly without having to
> mix it up with the DB access. __getattr__ is a poor performer (learned that
> in SQLAlchemy 0.1 about 9 years ago).
Equivalent things also slower in perl. 
> 
> > Here is my play script and several numbers:
> > http://www.fpaste.org/193999/25585380/raw/
> > Is there any faster ORM way for the same op?
> 
> Absolutely, as I’ve been saying for months all the way back in my wiki entry
> on forward, query for individual columns, also skip the session.rollback()
> and do a close() instead (the transaction is still rolled back, we just skip
> the bookkeeping we don’t need).  You get the nice attribute access
> pattern too:

The script probably will be extended with explicit transaction management,
I agree my close / rollback usage is bad and ugly.
Also thanks for the URL usage fix.

> 
> http://www.fpaste.org/194098/56040781/
> 
> def query_sqla_cols(self):
> "SQLAlchemy yield(100) named tuples"
> session = self.Session()
> start = time.time()
> summary = 0
> for obj in session.query(
> Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
> summary += obj.id + obj.A + obj.B + obj.C
> session.rollback()
> end = time.time()
> return [end-start, summary]
> 
> def query_sqla_cols_a3(self):
> "SQLAlchemy yield(100) named tuples 3*access"
> session = self.Session()
> start = time.time()
> summary = 0
> for obj in session.query(
> Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
> summary += obj.id + obj.A + obj.B + obj.C
> summary += obj.id + obj.A + obj.B + obj.C
> summary += obj.id + obj.A + obj.B + obj.C
> session.rollback()
> end = time.time()
> return [end-start, summary/3]
> 
> 
> Here’s that:
> 
> 0 SQLAlchemy yield(100) named tuples: time: 0.635045 (data [18356026L])
> 1 SQLAlchemy yield(100) named tuples: time: 0.630911 (data [18356026L])
> 2 SQLAlchemy yield(100) named tuples: time: 0.641687 (data [18356026L])
> 0 SQLAlchemy yield(100) named tuples 3*access: time: 0.807285 (data
> [18356026L])
> 1 SQLAlchemy yield(100) named tuples 3*access: time: 0.814160 (data
> [18356026L])
> 2 SQLAlchemy yield(100) named tuples 3*access: time: 0.829011 (data
> [18356026L])
> 
> compared to the fastest Core test:
> 
> 0 SQlAlchemy core simple: time: 0.707205 (data [18356026L])
> 1 SQlAlchemy core simple: time: 0.702223 (data [18356026L])
> 2 SQlAlchemy core simple: time: 0.708816 (data [18356026L])
> 
> 
> This is using 1.0’s named tuple which is faster than the one in 0.9. As I
> discussed in the migration notes I linked, over here
> http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#new-keyedtuple-implementation-dramatically-faster
> is where I discuss how I came up with that named tuple approach.
> 
> In 0.9, the tuples are much slower (but still faster than straight entities):
> 
> 0 SQLAlchemy yield(100) named tuples: time: 1.083882 (data [18356026L])
> 1 SQLAlchemy yield(100) named tuples: time: 1.097783 (data [18356026L])
> 2 SQLAlchemy yield(100) named tuples: time: 1.113621 (data [18356026L])
> 0 SQLAlchemy yield(100) named tuples 3*access: time: 1.204280 (data
> [18356026L])
> 1 SQLAlchemy yield(100) named tuples 3*access: time: 1.245768 (data
> [18356026L])
> 2 SQLAlchemy yield(100) named tupl

Re: [openstack-dev] [nova][api] Microversions. And why do we need API extensions for new API functionality?

2015-03-09 Thread Attila Fazekas
I agree with Jay.

The extension layer is also expensive in CPU usage,
and it also makes more difficult to troubleshoot issues.


- Original Message -
> From: "Jay Pipes" 
> To: "OpenStack Development Mailing List" , 
> "Sergey Nikitin"
> 
> Sent: Sunday, March 8, 2015 1:31:34 AM
> Subject: [openstack-dev] [nova][api] Microversions. And why do we need API 
> extensions for new API functionality?
> 
> Hi Stackers,
> 
> Now that microversions have been introduced to the Nova API (meaning we
> can now have novaclient request, say, version 2.3 of the Nova API using
> the special X-OpenStack-Nova-API-Version HTTP header), is there any good
> reason to require API extensions at all for *new* functionality.
> 
> Sergey Nikitin is currently in the process of code review for the final
> patch that adds server instance tagging to the Nova API:
> 
> https://review.openstack.org/#/c/128940
> 
> Unfortunately, for some reason I really don't understand, Sergey is
> being required to create an API extension called "os-server-tags" in
> order to add the server tag functionality to the API. The patch
> implements the 2.4 Nova API microversion, though, as you can see from
> this part of the patch:
> 
> https://review.openstack.org/#/c/128940/43/nova/api/openstack/compute/plugins/v3/server_tags.py
> 
> What is the point of creating a new "plugin"/API extension for this new
> functionality? Why can't we just modify the
> nova/api/openstack/compute/server.py Controller.show() method and
> decorate it with a 2.4 microversion that adds a "tags" attribute to the
> returned server dictionary?
> 
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L369
> 
> Because we're using an API extension for this new server tags
> functionality, we are instead having the extension "extend" the server
> dictionary with an "os-server-tags:tags" key containing the list of
> string tags.
> 
> This is ugly and pointless. We don't need to use API extensions any more
> for this stuff.
> 
> A client knows that server tags are supported by the 2.4 API
> microversion. If the client requests the 2.4+ API, then we should just
> include the "tags" attribute in the server dictionary.
> 
> Similarly, new microversion API functionality should live in a module,
> as a top-level (or subcollection) Controller in
> /nova/api/openstack/compute/, and should not be in the
> /nova/api/openstack/compute/plugins/ directory. Why? Because it's not a
> plugin.
> 
> Why are we continuing to use these awkward, messy, and cumbersome API
> extensions?
> 
> Please, I am begging the Nova core team. Let us stop this madness. No
> more API extensions.
> 
> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-06 Thread Attila Fazekas
Can you check is this patch does the right thing[1]:

[1] https://review.openstack.org/#/c/112523/6

- Original Message -
> From: "Fredy Neeser" 
> To: openstack-dev@lists.openstack.org
> Sent: Friday, March 6, 2015 6:01:08 PM
> Subject: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes:   
> Avoiding the MTU pitfalls
> 
> Hello world
> 
> I recently created a VXLAN test setup with single-NIC compute nodes
> (using OpenStack Juno on Fedora 20), conciously ignoring the OpenStack
> advice of using nodes with at least 2 NICs ;-) .
> 
> The fact that both native and encapsulated traffic needs to pass through
> the same NIC does create some interesting challenges, but finally I got
> it working cleanly, staying clear of MTU pitfalls ...
> 
> I documented my findings here:
> 
>[1]
> http://blog.systemathic.ch/2015/03/06/openstack-vxlan-with-single-nic-compute-nodes/
>[2]
> http://blog.systemathic.ch/2015/03/05/openstack-mtu-pitfalls-with-tunnels/
> 
> For those interested in single-NIC setups, I'm curious what you think
> about [1]  (a small patch is needed to add "VLAN awareness" to the
> qg-XXX Neutron gateway ports).
> 
> 
> While catching up with Neutron changes for OpenStack Kilo, I came across
> the in-progress work on "MTU selection and advertisement":
> 
>[3]  Spec:
> https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst
>[4]  Patch review:  https://review.openstack.org/#/c/153733/
>[5]  Spec update:  https://review.openstack.org/#/c/159146/
> 
> Seems like [1] eliminates some additional MTU pitfalls that are not
> addressed by [3-5].
> 
> But I think it would be nice if we could achieve [1] while coordinating
> with the "MTU selection and advertisement" work [3-5].
> 
> Thoughts?
> 
> Cheers,
> - Fredy
> 
> Fredy ("Freddie") Neeser
> http://blog.systeMathic.ch
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-06 Thread Attila Fazekas




- Original Message -
> From: "Attila Fazekas" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, March 6, 2015 4:19:18 PM
> Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
> supported in nova-scheduler
> 
> Looks like we need some kind of _per compute node_ mutex in the critical
> section,
> multiple scheduler MAY be able to schedule to two compute node at same time,
> but not for scheduling to the same compute node.
> 
> If we don't want to introduce another required component or
> reinvent the wheel there are some possible trick with the existing globally
> visible
> components like with the RDMS.
> 
> `Randomized` destination choose is recommended in most of the possible
> solutions,
> alternatives are much more complex.
> 
> One SQL example:
> 
> * Add `sched_cnt`, defaul=0, Integer field; to a hypervisors related table.
> 
> When the scheduler picks one (or multiple) node, he needs to verify is the
> node(s) are
> still good before sending the message to the n-cpu.
> 
> It can be done by re-reading the ONLY the picked hypervisor(s) related data.
> with `LOCK IN SHARE MODE`.
> If the destination hyper-visors still OK:
> 
> Increase the sched_cnt value exactly by 1,
> test is the UPDATE really update the required number of rows,
> the WHERE part needs to contain the previous value.

This part is very likely not needed, if all scheduler needs,
to update the (any) same field regarding to the same host, and they
acquire the RW lock for reading before they change it to WRITE lock.

Other strategy might consider pre acquiring the write lock only,
but the write intent is not sure before we re-read and verify the data.  
 
> 
> You also need to update the resource usage on the hypervisor,
>  by the expected cost of the new vms.
> 
> If at least one selected node was ok, the transaction can be COMMITed.
> If you were able to COMMIT the transaction, the relevant messages
>  can be sent.
> 
> The whole process needs to be repeated with the items which did not passed
> the
> post verification.
> 
> If a message sending failed, `act like` migrating the vm to another host.
> 
> If multiple scheduler tries to pick multiple different host in different
> order,
> it can lead to a DEADLOCK situation.
> Solution: Try to have all scheduler to acquire to Shared RW locks in the same
> order,
> at the end.
> 
> Galera multi-writer (Active-Active) implication:
> As always, retry on deadlock.
> 
> n-sch + n-cpu crash at the same time:
> * If the scheduling is not finished properly, it might be fixed manually,
> or we need to solve which still alive scheduler instance is
> responsible for fixing the particular scheduling..
> 
> 
> - Original Message -
> > From: "Nikola Đipanov" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Friday, March 6, 2015 10:29:52 AM
> > Subject: Re: [openstack-dev] [nova] blueprint about multiple workers
> > supported in nova-scheduler
> > 
> > On 03/06/2015 01:56 AM, Rui Chen wrote:
> > > Thank you very much for in-depth discussion about this topic, @Nikola
> > > and @Sylvain.
> > > 
> > > I agree that we should solve the technical debt firstly, and then make
> > > the scheduler better.
> > > 
> > 
> > That was not necessarily my point.
> > 
> > I would be happy to see work on how to make the scheduler less volatile
> > when run in parallel, but the solution must acknowledge the eventually
> > (or never really) consistent nature of the data scheduler has to operate
> > on (in it's current design - there is also the possibility of offering
> > an alternative design).
> > 
> > I'd say that fixing the technical debt that is aimed at splitting the
> > scheduler out of Nova is a mostly orthogonal effort.
> > 
> > There have been several proposals in the past for how to make the
> > scheduler horizontally scalable and improve it's performance. One that I
> > remember from the Atlanta summit time-frame was the work done by Boris
> > and his team [1] (they actually did some profiling and based their work
> > on the bottlenecks they found). There are also some nice ideas in the
> > bug lifeless filed [2] since this behaviour particularly impacts ironic.
> > 
> > N.
> > 
> > [1] https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
> > [2] https://bugs.launchpad.net/nova/+bug/1341420
> > 
> > 
> > > Best Regards.
> > > 
> > > 2015-03-05 21:12 GMT+08:00 Sylvain Bauza  > > <mailto:sba...@redhat.com

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-06 Thread Attila Fazekas
Looks like we need some kind of _per compute node_ mutex in the critical 
section,
multiple scheduler MAY be able to schedule to two compute node at same time,
but not for scheduling to the same compute node.

If we don't want to introduce another required component or
reinvent the wheel there are some possible trick with the existing globally 
visible
components like with the RDMS.

`Randomized` destination choose is recommended in most of the possible 
solutions,
alternatives are much more complex.

One SQL example:

* Add `sched_cnt`, defaul=0, Integer field; to a hypervisors related table.

When the scheduler picks one (or multiple) node, he needs to verify is the 
node(s) are 
still good before sending the message to the n-cpu.

It can be done by re-reading the ONLY the picked hypervisor(s) related data.
with `LOCK IN SHARE MODE`.
If the destination hyper-visors still OK:

Increase the sched_cnt value exactly by 1,
test is the UPDATE really update the required number of rows,
the WHERE part needs to contain the previous value.

You also need to update the resource usage on the hypervisor,
 by the expected cost of the new vms.

If at least one selected node was ok, the transaction can be COMMITed.
If you were able to COMMIT the transaction, the relevant messages 
 can be sent.

The whole process needs to be repeated with the items which did not passed the
post verification.

If a message sending failed, `act like` migrating the vm to another host.

If multiple scheduler tries to pick multiple different host in different order,
it can lead to a DEADLOCK situation.
Solution: Try to have all scheduler to acquire to Shared RW locks in the same 
order,
at the end.

Galera multi-writer (Active-Active) implication:
As always, retry on deadlock. 

n-sch + n-cpu crash at the same time:
* If the scheduling is not finished properly, it might be fixed manually,
or we need to solve which still alive scheduler instance is 
responsible for fixing the particular scheduling..


- Original Message -
> From: "Nikola Đipanov" 
> To: openstack-dev@lists.openstack.org
> Sent: Friday, March 6, 2015 10:29:52 AM
> Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
> supported in nova-scheduler
> 
> On 03/06/2015 01:56 AM, Rui Chen wrote:
> > Thank you very much for in-depth discussion about this topic, @Nikola
> > and @Sylvain.
> > 
> > I agree that we should solve the technical debt firstly, and then make
> > the scheduler better.
> > 
> 
> That was not necessarily my point.
> 
> I would be happy to see work on how to make the scheduler less volatile
> when run in parallel, but the solution must acknowledge the eventually
> (or never really) consistent nature of the data scheduler has to operate
> on (in it's current design - there is also the possibility of offering
> an alternative design).
> 
> I'd say that fixing the technical debt that is aimed at splitting the
> scheduler out of Nova is a mostly orthogonal effort.
> 
> There have been several proposals in the past for how to make the
> scheduler horizontally scalable and improve it's performance. One that I
> remember from the Atlanta summit time-frame was the work done by Boris
> and his team [1] (they actually did some profiling and based their work
> on the bottlenecks they found). There are also some nice ideas in the
> bug lifeless filed [2] since this behaviour particularly impacts ironic.
> 
> N.
> 
> [1] https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
> [2] https://bugs.launchpad.net/nova/+bug/1341420
> 
> 
> > Best Regards.
> > 
> > 2015-03-05 21:12 GMT+08:00 Sylvain Bauza  > >:
> > 
> > 
> > Le 05/03/2015 13:00, Nikola Đipanov a écrit :
> > 
> > On 03/04/2015 09:23 AM, Sylvain Bauza wrote:
> > 
> > Le 04/03/2015 04:51, Rui Chen a écrit :
> > 
> > Hi all,
> > 
> > I want to make it easy to launch a bunch of scheduler
> > processes on a
> > host, multiple scheduler workers will make use of
> > multiple processors
> > of host and enhance the performance of nova-scheduler.
> > 
> > I had registered a blueprint and commit a patch to
> > implement it.
> > 
> > https://blueprints.launchpad.__net/nova/+spec/scheduler-__multiple-workers-support
> > 
> > 
> > 
> > This patch had applied in our performance environment
> > and pass some
> > test cases, like: concurrent booting multiple instances,
> > currently we
> > didn't find inconsistent issue.
> > 
> > IMO, nova-scheduler should been scaled horizontally on
> > easily way, the
> > multiple workers should been supported as an out of box
> > feature.
> > 

Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about multiple workers)

2015-03-05 Thread Attila Fazekas
I see lot of improvements,
but cPython is still cPython.

When you benchmarking query related things, please try to
get the actual data from the returned objects and try to do
something with data what is not expected to be optimized out even by
a smarter compiler.

Here is my play script and several numbers:
http://www.fpaste.org/193999/25585380/raw/
Is there any faster ORM way for the same op?

Looks like still worth to convert the results to dict,
when you access the data multiple times.

dict is also the typical input type for the json serializers. 

The plain dict is good enough if you do not want to mange
which part is changed, especially when you are not planning to `save` it.

- Original Message -
> From: "Mike Bayer" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Wednesday, March 4, 2015 11:30:49 PM
> Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming  
> features (was: [nova] blueprint about
> multiple workers)
> 
> 
> 
> Mike Bayer  wrote:
> 
> > 
> > 
> > Attila Fazekas  wrote:
> > 
> >> Hi,
> >> 
> >> I wonder what is the planned future of the scheduling.
> >> 
> >> The scheduler does a lot of high field number query,
> >> which is CPU expensive when you are using sqlalchemy-orm.
> >> Does anyone tried to switch those operations to sqlalchemy-core ?
> > 
> > An upcoming feature in SQLAlchemy 1.0 will remove the vast majority of CPU
> > overhead from the query side of SQLAlchemy ORM by caching all the work done
> 
> Just to keep the Openstack community of what’s upcoming, here’s some more
> detail
> on some of the new SQLAlchemy performance features, which are based on the
> goals I first set up last summer at
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy.
> 
> As 1.0 features a lot of new styles of doing things that are primarily in
> the name of performance, in order to help categorize and document these
> techniques, 1.0 includes a performance suite in examples/ which features a
> comprehensive collection of common database idioms run under timing and
> function-count profiling. These idioms are broken into major categories like
> “short selects”, “large resultsets”, “bulk inserts”, and serve not only as a
> way to compare the relative performance of different techniques, but also as
> a way to provide example code categorized into use cases that illustrate the
> variety of ways to achieve that case, including the tradeoffs for each,
> across Core and ORM. So in this case, we can see what the “baked” query
> looks like in the “short_selects” suite, which times how long it takes to
> perform 1 queries, each of which return one object or row:
> 
> https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/short_selects.py?at=ticket_3054#cl-73
> 
> The results of this suite look like the following:
> 
> test_orm_query : test a straight ORM query of the full entity. (1
> iterations); total time 7.363434 sec
> test_orm_query_cols_only : test an ORM query of only the entity columns.
> (1 iterations); total time 6.509266 sec
> test_baked_query : test a baked query of the full entity. (1 iterations);
> total time 1.999689 sec
> test_baked_query_cols_only : test a baked query of only the entity columns.
> (1 iterations); total time 1.990916 sec
> test_core_new_stmt_each_time : test core, creating a new statement each time.
> (1 iterations); total time 3.842871 sec
> test_core_reuse_stmt : test core, reusing the same statement (but recompiling
> each time). (1 iterations); total time 2.806590 sec
> test_core_reuse_stmt_compiled_cache : test core, reusing the same statement +
> compiled cache. (1 iterations); total time 0.659902 sec
> 
> Where above, “test_orm” and “test_baked” are both using the ORM API
> exclusively. We can see that the “baked” approach, returning column tuples
> is almost twice as fast as a naive Core approach, that is, one which
> constructs select() objects each time and does not attempt to use any
> compilation caching.
> 
> For the use case of fetching large numbers of rows, we can look at the
> large_resultsets suite
> (https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/large_resultsets.py?at=ticket_3054).
> This suite illustrates a single query which fetches 500K rows. The “Baked”
> approach isn’t relevant here as we are only emitting a query once, however
> the approach we use to fetch rows is significant. Here we can see that
> ORM-based “tuple” approaches are very close in speed to the fetching of rows
> using Core direc

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-04 Thread Attila Fazekas
Hi,

I wonder what is the planned future of the scheduling.

The scheduler does a lot of high field number query,
which is CPU expensive when you are using sqlalchemy-orm.
Does anyone tried to switch those operations to sqlalchemy-core ?

The scheduler does lot of thing in the application, like filtering 
what can be done on the DB level more efficiently. Why it is not done
on the DB side ? 

There are use cases when the scheduler would need to know even more data,
Is there a plan for keeping `everything` in all schedulers process memory 
up-to-date ?
(Maybe zookeeper)

The opposite way would be to move most operation into the DB side,
since the DB already knows everything. 
(stored procedures ?)

Best Regards,
Attila


- Original Message -
> From: "Rui Chen" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Wednesday, March 4, 2015 4:51:07 AM
> Subject: [openstack-dev] [nova] blueprint about multiple workers supported
> in nova-scheduler
> 
> Hi all,
> 
> I want to make it easy to launch a bunch of scheduler processes on a host,
> multiple scheduler workers will make use of multiple processors of host and
> enhance the performance of nova-scheduler.
> 
> I had registered a blueprint and commit a patch to implement it.
> https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
> 
> This patch had applied in our performance environment and pass some test
> cases, like: concurrent booting multiple instances, currently we didn't find
> inconsistent issue.
> 
> IMO, nova-scheduler should been scaled horizontally on easily way, the
> multiple workers should been supported as an out of box feature.
> 
> Please feel free to discuss this feature, thanks.
> 
> Best Regards
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][glusterfs] Online Snapshot fails with GlusterFS

2015-03-02 Thread Attila Fazekas
I think here we should use the same approach 
as with neutron and nova.

Neutron[1] uses the nova admin service user declared here[2].

The 'nova_admin_username' looks like a similar parameter
to the 'os_privileged_user_name' declared with [3].

Later we may switch to dedicated per service `admin in nova` account,
but at this point it is not required.

Guest assisted snapshot may work with nfs or other backends,
so the os_privileged_* options should be defined with both gluster and nfs.

No tempest change should be required.

[1] 
https://github.com/openstack-dev/devstack/blob/db56ee8ef23a68650a3c3b26e5f3dd9b210b6040/lib/neutron#L1050
[2] 
https://github.com/openstack-dev/devstack/blob/a339efcd676b81804b2d5ab54d4bba8ecaba99b5/lib/nova#L361
[3] https://review.openstack.org/#/c/156940/

- Original Message -
> From: "Deepak Shetty" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Cc: "Deepak C Shetty" 
> Sent: Friday, February 27, 2015 8:57:00 AM
> Subject: Re: [openstack-dev] [tempest][glusterfs] Online Snapshot fails with  
> GlusterFS
> 
> Thanks Bharat for starting this thread
> 
> I would like to invite suggestions/opinions from tempest folks on whats the
> right way to get this to work ?
> 
> 1) Use priviledge user in cinder.conf
> 
> --or --
> 
> 2) Modify tempest volume snapshot_in_use testcase to bump the user to admin,
> run the test, revert back to demo before leaving the testcase
> 
> thanx,
> deepak
> 
> 
> On Fri, Feb 27, 2015 at 11:57 AM, Bharat Kumar < bharat.kobag...@redhat.com >
> wrote:
> 
> 
> 
> Hi,
> 
> As part of tempest job " gate-tempest-dsvm-full-glusterfs " run [1], the test
> case " test_snapshot_create_with_volume_in_use" [2] is failing.
> This is because demo user is unable to create online snapshots, due to nova
> policy rules[3].
> 
> To avoid this issue we can modify test case, to make "demo" user as an admin
> before creating snapshot and reverting after it finishes.
> 
> Another approach is to use privileged user (
> https://review.openstack.org/#/c/156940/ ) to create online snapshot.
> 
> [1]
> http://logs.openstack.org/11/159711/1/experimental/gate-tempest-dsvm-full-glusterfs/b2cb37e/
> [2]
> https://github.com/openstack/tempest/blob/master/tempest/api/volume/test_volumes_snapshots.py#L66
> [3] https://github.com/openstack/nova/blob/master/etc/nova/policy.json#L329
> --
> Warm Regards,
> Bharat Kumar Kobagana
> Software Engineer
> OpenStack Storage – RedHat India
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-12 Thread Attila Fazekas


- Original Message -
From: "Attila Fazekas" 
To: "Jay Pipes" 
Cc: "OpenStack Development Mailing List (not for usage questions)" 
, "Pavel Kholkin" 
Sent: Thursday, February 12, 2015 11:52:39 AM
Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
should know about Galera





- Original Message -
> From: "Jay Pipes" 
> To: "Attila Fazekas" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> , "Pavel
> Kholkin" 
> Sent: Wednesday, February 11, 2015 9:52:55 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/11/2015 06:34 AM, Attila Fazekas wrote:
> > - Original Message -
> >> From: "Jay Pipes" 
> >> To: "Attila Fazekas" 
> >> Cc: "OpenStack Development Mailing List (not for usage questions)"
> >> , "Pavel
> >> Kholkin" 
> >> Sent: Tuesday, February 10, 2015 7:32:11 PM
> >> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
> >> should know about Galera
> >>
> >> On 02/10/2015 06:28 AM, Attila Fazekas wrote:
> >>> - Original Message -
> >>>> From: "Jay Pipes" 
> >>>> To: "Attila Fazekas" , "OpenStack Development
> >>>> Mailing
> >>>> List (not for usage questions)"
> >>>> 
> >>>> Cc: "Pavel Kholkin" 
> >>>> Sent: Monday, February 9, 2015 7:15:10 PM
> >>>> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things
> >>>> everybody
> >>>> should know about Galera
> >>>>
> >>>> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>>>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>>>> Is the retry/swap way really solves anything here.
> >>>> 
> >>>>> Am I missed something ?
> >>>>
> >>>> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >>>> that are needed to support SELECT FOR UPDATE statements across multiple
> >>>> cluster nodes.
> >>>
> >>> Galere does not replicates the row-level locks created by UPDATE/INSERT
> >>> ...
> >>> So what to do with the UPDATE?
> >>
> >> No, Galera replicates the write sets (binary log segments) for
> >> UPDATE/INSERT/DELETE statements -- the things that actually
> >> change/add/remove records in DB tables. No locks are replicated, ever.
> >
> > Galera does not do any replication at UPDATE/INSERT/DELETE time.
> >
> > $ mysql
> > use test;
> > CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));
> >
> > $(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data)
> > VALUES ("test");'; done )  | mysql
> >
> > The writer1 is busy, the other nodes did not noticed anything about the
> > above pending
> > transaction, for them this transaction does not exists as long as you do
> > not call a COMMIT.
> >
> > Any kind of DML/DQL you issue without a COMMIT does not happened in the
> > other nodes perspective.
> >
> > Replication happens at COMMIT time if the `write sets` is not empty.
> 
> We're going in circles here. I was just pointing out that SELECT ... FOR
> UPDATE will never replicate anything. INSERT/UPDATE/DELETE statements
> will cause a write-set to be replicated (yes, upon COMMIT of the
> containing transaction).
> 
> Please see my repeated statements in this thread and others that the
> compare-and-swap technique is dependent on issuing *separate*
> transactions for each SELECT and UPDATE statement...
> 
> > When a transaction wins a voting, the other nodes rollbacks all transaction
> > which had a local conflicting row lock.
> 
> A SELECT statement in a separate transaction does not ever trigger a
> ROLLBACK, nor will an UPDATE statement that does not match any rows.
> That is IMO how increased throughput is achieved in the compare-and-swap
> technique versus the SELECT FOR UPDATE technique.
> 
yes, I mentioned this way in one bug [0].

But the related changes on the review, actually works as I said [1][2][3],
and the SELECT is not in a separated dedicated transaction.


[0] https://bugs.launchpad.net/neutron/+bug/1410854 [sorry I sent a wrong link 
before]
[1] https://review.openstack.org/#/c/143837/
[2] https://review.openstack.org/#/c/153558/
[3] https://review.openstack.org/#/c/149261/

> -jay
> 
> -jay
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-12 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: "Attila Fazekas" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> , "Pavel
> Kholkin" 
> Sent: Wednesday, February 11, 2015 9:52:55 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/11/2015 06:34 AM, Attila Fazekas wrote:
> > - Original Message -
> >> From: "Jay Pipes" 
> >> To: "Attila Fazekas" 
> >> Cc: "OpenStack Development Mailing List (not for usage questions)"
> >> , "Pavel
> >> Kholkin" 
> >> Sent: Tuesday, February 10, 2015 7:32:11 PM
> >> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
> >> should know about Galera
> >>
> >> On 02/10/2015 06:28 AM, Attila Fazekas wrote:
> >>> - Original Message -
> >>>> From: "Jay Pipes" 
> >>>> To: "Attila Fazekas" , "OpenStack Development
> >>>> Mailing
> >>>> List (not for usage questions)"
> >>>> 
> >>>> Cc: "Pavel Kholkin" 
> >>>> Sent: Monday, February 9, 2015 7:15:10 PM
> >>>> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things
> >>>> everybody
> >>>> should know about Galera
> >>>>
> >>>> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>>>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>>>> Is the retry/swap way really solves anything here.
> >>>> 
> >>>>> Am I missed something ?
> >>>>
> >>>> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >>>> that are needed to support SELECT FOR UPDATE statements across multiple
> >>>> cluster nodes.
> >>>
> >>> Galere does not replicates the row-level locks created by UPDATE/INSERT
> >>> ...
> >>> So what to do with the UPDATE?
> >>
> >> No, Galera replicates the write sets (binary log segments) for
> >> UPDATE/INSERT/DELETE statements -- the things that actually
> >> change/add/remove records in DB tables. No locks are replicated, ever.
> >
> > Galera does not do any replication at UPDATE/INSERT/DELETE time.
> >
> > $ mysql
> > use test;
> > CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));
> >
> > $(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data)
> > VALUES ("test");'; done )  | mysql
> >
> > The writer1 is busy, the other nodes did not noticed anything about the
> > above pending
> > transaction, for them this transaction does not exists as long as you do
> > not call a COMMIT.
> >
> > Any kind of DML/DQL you issue without a COMMIT does not happened in the
> > other nodes perspective.
> >
> > Replication happens at COMMIT time if the `write sets` is not empty.
> 
> We're going in circles here. I was just pointing out that SELECT ... FOR
> UPDATE will never replicate anything. INSERT/UPDATE/DELETE statements
> will cause a write-set to be replicated (yes, upon COMMIT of the
> containing transaction).
> 
> Please see my repeated statements in this thread and others that the
> compare-and-swap technique is dependent on issuing *separate*
> transactions for each SELECT and UPDATE statement...
> 
> > When a transaction wins a voting, the other nodes rollbacks all transaction
> > which had a local conflicting row lock.
> 
> A SELECT statement in a separate transaction does not ever trigger a
> ROLLBACK, nor will an UPDATE statement that does not match any rows.
> That is IMO how increased throughput is achieved in the compare-and-swap
> technique versus the SELECT FOR UPDATE technique.
> 
yes, I mentioned this way in one bug [0].

But the related changes on the review, actually works as I said [1][2][3],
and the SELECT is not in a separated dedicated transaction.


[0] https://blueprints.launchpad.net/nova/+spec/lock-free-quota-management
[1] https://review.openstack.org/#/c/143837/
[2] https://review.openstack.org/#/c/153558/
[3] https://review.openstack.org/#/c/149261/

> -jay
> 
> -jay
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-11 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: "Attila Fazekas" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> , "Pavel
> Kholkin" 
> Sent: Tuesday, February 10, 2015 7:32:11 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/10/2015 06:28 AM, Attila Fazekas wrote:
> > - Original Message -
> >> From: "Jay Pipes" 
> >> To: "Attila Fazekas" , "OpenStack Development Mailing
> >> List (not for usage questions)"
> >> 
> >> Cc: "Pavel Kholkin" 
> >> Sent: Monday, February 9, 2015 7:15:10 PM
> >> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
> >> should know about Galera
> >>
> >> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>> Is the retry/swap way really solves anything here.
> >> 
> >>> Am I missed something ?
> >>
> >> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >> that are needed to support SELECT FOR UPDATE statements across multiple
> >> cluster nodes.
> >
> > Galere does not replicates the row-level locks created by UPDATE/INSERT ...
> > So what to do with the UPDATE?
> 
> No, Galera replicates the write sets (binary log segments) for
> UPDATE/INSERT/DELETE statements -- the things that actually
> change/add/remove records in DB tables. No locks are replicated, ever.

Galera does not do any replication at UPDATE/INSERT/DELETE time. 

$ mysql
use test;
CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));

$(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data) VALUES 
("test");'; done )  | mysql

The writer1 is busy, the other nodes did not noticed anything about the above 
pending
transaction, for them this transaction does not exists as long as you do not 
call a COMMIT.

Any kind of DML/DQL you issue without a COMMIT does not happened in the other 
nodes perspective.

Replication happens at COMMIT time if the `write sets` is not empty.

When a transaction wins a voting, the other nodes rollbacks all transaction
which had a local conflicting row lock.


> > Why should I handle the FOR UPDATE differently?
> 
> Because SELECT FOR UPDATE doesn't change any rows, and therefore does
> not trigger any replication event in Galera.

What matters is the full transaction changed any row at COMMIT time or not.
The DMLs them-self does not starts a replication as `SELECT FOR UPDATE` does 
not.

>
> See here:
> 
> http://www.percona.com/blog/2014/09/11/openstack-users-shed-light-on-percona-xtradb-cluster-deadlock-issues/
> 
> -jay
> 
> >> https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
> >>
> >> Best,
> >> -jay
> >>
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-10 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: "Attila Fazekas" , "OpenStack Development Mailing 
> List (not for usage questions)"
> 
> Cc: "Pavel Kholkin" 
> Sent: Monday, February 9, 2015 7:15:10 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> > I do not see why not to use `FOR UPDATE` even with multi-writer or
> > Is the retry/swap way really solves anything here.
> 
> > Am I missed something ?
> 
> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> that are needed to support SELECT FOR UPDATE statements across multiple
> cluster nodes.
> 

Galere does not replicates the row-level locks created by UPDATE/INSERT ...
So what to do with the UPDATE ?

Why should I handle the FOR UPDATE differently ?

> https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
> 
> Best,
> -jay
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-10 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: openstack-dev@lists.openstack.org
> Sent: Monday, February 9, 2015 9:36:45 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/09/2015 03:10 PM, Clint Byrum wrote:
> > Excerpts from Jay Pipes's message of 2015-02-09 10:15:10 -0800:
> >> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>> Is the retry/swap way really solves anything here.
> >> 
> >>> Am I missed something ?
> >>
> >> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >> that are needed to support SELECT FOR UPDATE statements across multiple
> >> cluster nodes.
> >>
> >> https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
> >
> > Attila acknowledged that. What Attila was saying was that by using it
> > with Galera, the box that is doing the FOR UPDATE locks will simply fail
> > upon commit because a conflicting commit has already happened and arrived
> > from the node that accepted the write. Further what Attila is saying is
> > that this means there is not such an obvious advantage to the CAS method,
> > since the rollback and the # updated rows == 0 are effectively equivalent
> > at this point, seeing as the prior commit has already arrived and thus
> > will not need to wait to fail certification and be rolled back.
> 
> No, that is not correct. In the case of the CAS technique, the frequency
> of rollbacks due to certification failure is demonstrably less than when
> using SELECT FOR UPDATE and relying on the certification timeout error
> to signal a deadlock.
> 
> > I am not entirely certain that is true though, as I think what will
> > happen in sequential order is:
> >
> > writer1: UPDATE books SET genre = 'Scifi' WHERE genre = 'sciencefiction';
> > writer1: --> send in-progress update to cluster
> > writer2: SELECT FOR UPDATE books WHERE id=3;
> > writer1: COMMIT
> > writer1: --> try to certify commit in cluster
> > ** Here is where I stop knowing for sure what happens **
> > writer2: certifies writer1's transaction or blocks?
> 
> It will certify writer1's transaction. It will only block another thread
> hitting writer2 requesting write locks or write-intent read locks on the
> same records.
> 
> > writer2: UPDATE books SET genre = 'sciencefiction' WHERE id=3;
> > writer2: COMMIT --> One of them is rolled back.
> >

The other transaction can be rolled back before you do an actual commit:
writer1: BEGIN
writer2: BEGIN
writer1: update test set val=42 where id=1;
writer2: update test set val=42 where id=1;
writer1: COMMIT
writer2: show variables;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting 
transaction

As you can see 2th transaction failed without issuing a COMMIT after the 1th 
one committed.
You could write anything to mysql on writer2 at this point,
 even invalid things returns with `Deadlock`.

> > So, at that point where I'm not sure (please some Galera expert tell
> > me):
> >
> > If what happens is as I suggest, writer1's transaction is certified,
> > then that just means the lock sticks around blocking stuff on writer2,
> > but that the data is updated and it is certain that writer2's commit will
> > be rolled back. However, if it blocks waiting on the lock to resolve,
> > then I'm at a loss to determine which transaction would be rolled back,
> > but I am thinking that it makes sense that the transaction from writer2
> > would be rolled back, because the commit is later.
> 
> That is correct. writer2's transaction would be rolled back. The
> difference is that the CAS method would NOT trigger a ROLLBACK. It would
> instead return 0 rows affected, because the UPDATE statement would
> instead look like this:
> 
> UPDATE books SET genre = 'sciencefiction' WHERE id = 3 AND genre = 'SciFi';
> 
> And the return of 0 rows affected would trigger a simple retry of the
> read and then update attempt on writer2 instead of dealing with ROLLBACK
> semantics on the transaction.
> 
> Note that in the CAS method, the SELECT statement and the UPDATE are in
> completely different transactions. This is a very important thing to
> keep in mind.
> 
> > All this to say that usually the reason for SELECT FOR UPDATE is not
> > to only do an update (the transactional semantics handle that), but
> > also to prevent the old row from being seen agai

Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-09 Thread Attila Fazekas




- Original Message -
> From: "Jay Pipes" 
> To: openstack-dev@lists.openstack.org, "Pavel Kholkin" 
> Sent: Wednesday, February 4, 2015 8:04:10 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 02/04/2015 12:05 PM, Sahid Orentino Ferdjaoui wrote:
> > On Wed, Feb 04, 2015 at 04:30:32PM +, Matthew Booth wrote:
> >> I've spent a few hours today reading about Galera, a clustering solution
> >> for MySQL. Galera provides multi-master 'virtually synchronous'
> >> replication between multiple mysql nodes. i.e. I can create a cluster of
> >> 3 mysql dbs and read and write from any of them with certain consistency
> >> guarantees.
> >>
> >> I am no expert[1], but this is a TL;DR of a couple of things which I
> >> didn't know, but feel I should have done. The semantics are important to
> >> application design, which is why we should all be aware of them.
> >>
> >>
> >> * Commit will fail if there is a replication conflict
> >>
> >> foo is a table with a single field, which is its primary key.
> >>
> >> A: start transaction;
> >> B: start transaction;
> >> A: insert into foo values(1);
> >> B: insert into foo values(1); <-- 'regular' DB would block here, and
> >>report an error on A's commit
> >> A: commit; <-- success
> >> B: commit; <-- KABOOM
> >>
> >> Confusingly, Galera will report a 'deadlock' to node B, despite this not
> >> being a deadlock by any definition I'm familiar with.
> 
> It is a failure to certify the writeset, which bubbles up as an InnoDB
> deadlock error. See my article here:
> 
> http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/
> 
> Which explains this.

I do not see why not to use `FOR UPDATE` even with multi-writer or
Is the retry/swap way really solves anything here.

Using 'FOR UPDATE' in with 'repeatable read' isolation level, seams still more 
efficient
and has several advantages.

* The SELECT with 'FOR UPDATE' will read the committed version, so you do not 
really need to
  worry about when the transaction actually started. You will get fresh data 
before you reaching the
  actual UPDATE.

* In the article the example query will not return 
  new version of data in the same transaction even if you are retrying, so
  you need to restart the transaction anyway.

  When you are using the 'FOR UPDATE' way if any other transaction successfully 
commits conflicting
  row on any other galera writer, your pending transaction will be rolled back 
at your next statement,
  WITHOUT spending any time in certificating that transaction.
  In this perspective the checking the number after the update `Compare and 
swap` or
  handling an exception does not makes any difference.

* Using FOR UPDATE in a galera transaction (multi-writer) is not more evil than 
using UPDATE, 
  concurrent commit invalidates both of them in the same way (DBDeadlock).  

* The 'FOR UPDATE' if you are using just a `single writer` does not lets other 
threads to do useless work
  while wasting resources.

* The swap way also can be rolled back by galera almost anywhere (DBDeadLock).
  At the end the swap way looks like it just replaced  the exception handling,
  with a return code check + manual transaction restart.

Am I missed something ?

> > Yes ! and if I can add more information and I hope I do not make
> > mistake I think it's a know issue which comes from MySQL, that is why
> > we have a decorator to do a retry and so handle this case here:
> >
> >
> > http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
> 
> It's not an issue with MySQL. It's an issue with any database code that
> is highly contentious.
> 
> Almost all highly distributed or concurrent applications need to handle
> deadlock issues, and the most common way to handle deadlock issues on
> database records is using a retry technique. There's nothing new about
> that with Galera.
> 
> The issue with our use of the @_retry_on_deadlock decorator is *not*
> that the retry decorator is not needed, but rather it is used too
> frequently. The compare-and-swap technique I describe in the article
> above dramatically* reduces the number of deadlocks that occur (and need
> to be handled by the @_retry_on_deadlock decorator) and dramatically
> reduces the contention over critical database sections.
> 
> Best,
> -jay
> 
> * My colleague Pavel Kholkin is putting together the results of a
> benchmark run that compares the compare-and-swap method with the raw
> @_retry_on_deadlock decorator method. Spoiler: the compare-and-swap
> method cuts the runtime of the benchmark by almost *half*.
> 
> >> Essentially, anywhere that a regular DB would block, Galera will not
> >> block transactions on different nodes. Instead, it will cause one of the
> >> transactions to fail on commit. This is still ACID, but the semantics
> >> are quite different.
> >>
> >> The impact of this is that code which ma

Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-05 Thread Attila Fazekas




- Original Message -
> From: "Matthew Booth" 
> To: openstack-dev@lists.openstack.org
> Sent: Thursday, February 5, 2015 12:32:33 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 05/02/15 11:01, Attila Fazekas wrote:
> > I have a question related to deadlock handling as well.
> > 
> > Why the DBDeadlock exception is not caught generally for all api/rpc
> > request ?
> > 
> > The mysql recommendation regarding to Deadlocks [1]:
> > "Normally, you must write your applications so that they are always
> >  prepared to re-issue a transaction if it gets rolled back because of a
> >  deadlock."
> 
> This is evil imho, although it may well be pragmatic. A deadlock (a real
> deadlock, that is) occurs because of a preventable bug in code. It
> occurs because 2 transactions have attempted to take multiple locks in a
> different order. Getting this right is hard, but it is achievable. The
> solution to real deadlocks is to fix the bugs.
>
> 
> Galera 'deadlocks' on the other hand are not deadlocks, despite being
> reported as such (sounds as though this is due to an implementation
> quirk?). They don't involve 2 transactions holding mutual locks, and
> there is never any doubt about how to proceed. They involve 2
> transactions holding the same lock, and 1 of them committed first. In a
> real deadlock they wouldn't get as far as commit. This isn't any kind of
> bug: it's normal behaviour in this environment and you just have to
> handle it.
>
> > Now the services are just handling the DBDeadlock in several places.
> > We have some logstash hits for other places even without galera.
> 
> I haven't had much success with logstash. Could you post a query which
> would return these? This would be extremely interesting.

Just use this:
message: "DBDeadlock"

If you would like to exclude the lock wait timeout ones:
message: "Deadlock found when trying to get lock"


> > Instead of throwing 503 to the end user, the request could be repeated
> > `silently`.
> > 
> > The users would be able repeat the request himself,
> > so the automated repeat should not cause unexpected new problem.
> 
> Good point: we could argue 'no worse than now', even if it's buggy.
> 
> > The retry limit might be configurable, the exception needs to be watched
> > before
> > anything sent to the db on behalf of the transaction or request.
> > 
> > Considering all request handler as potential deadlock thrower seams much
> > easier than,
> > deciding case by case.
> 
> Well this happens at the transaction level, and we don't quite have a
> 1:1 request:transaction relationship. We're moving towards it, but
> potentially long running requests will always have to use multiple
> transactions.
> 
> However, I take your point. I think retry on transaction failure is
> something which would benefit from standard handling in a library.
> 
> Matt
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
> 
> Phone: +442070094448 (UK)
> GPG ID:  D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-05 Thread Attila Fazekas
I have a question related to deadlock handling as well.

Why the DBDeadlock exception is not caught generally for all api/rpc request ?

The mysql recommendation regarding to Deadlocks [1]:
"Normally, you must write your applications so that they are always 
 prepared to re-issue a transaction if it gets rolled back because of a 
deadlock."

Now the services are just handling the DBDeadlock in several places.
We have some logstash hits for other places even without galera.

Instead of throwing 503 to the end user, the request could be repeated 
`silently`.

The users would be able repeat the request himself,
so the automated repeat should not cause unexpected new problem.

The retry limit might be configurable, the exception needs to be watched before
anything sent to the db on behalf of the transaction or request.

Considering all request handler as potential deadlock thrower seams much easier 
than,
deciding case by case.  

[1] http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html

- Original Message -
> From: "Matthew Booth" 
> To: openstack-dev@lists.openstack.org
> Sent: Thursday, February 5, 2015 10:36:55 AM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
> should know about Galera
> 
> On 04/02/15 17:05, Sahid Orentino Ferdjaoui wrote:
> >> * Commit will fail if there is a replication conflict
> >>
> >> foo is a table with a single field, which is its primary key.
> >>
> >> A: start transaction;
> >> B: start transaction;
> >> A: insert into foo values(1);
> >> B: insert into foo values(1); <-- 'regular' DB would block here, and
> >>   report an error on A's commit
> >> A: commit; <-- success
> >> B: commit; <-- KABOOM
> >>
> >> Confusingly, Galera will report a 'deadlock' to node B, despite this not
> >> being a deadlock by any definition I'm familiar with.
> > 
> > Yes ! and if I can add more information and I hope I do not make
> > mistake I think it's a know issue which comes from MySQL, that is why
> > we have a decorator to do a retry and so handle this case here:
> > 
> >   
> > http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
> 
> Right, and that remains a significant source of confusion and
> obfuscation in the db api. Our db code is littered with races and
> potential actual deadlocks, but only some functions are decorated. Are
> they decorated because of real deadlocks, or because of Galera lock
> contention? The solutions to those 2 problems are very different! Also,
> hunting deadlocks is hard enough work. Adding the possibility that they
> might not even be there is just evil.
> 
> Incidentally, we're currently looking to replace this stuff with some
> new code in oslo.db, which is why I'm looking at it.
> 
> Matt
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
> 
> Phone: +442070094448 (UK)
> GPG ID:  D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [fedora] Re: upstream f21 devstack test

2015-01-25 Thread Attila Fazekas
I have tried the old 'vmlinuz-3.17.4-301.fc21.x86_64' kernel in my env,
 with this version volume attachment related tests are failing, but in the test 
case
so I do not see the secondary network failures.

In my env with '3.17.8-300.fc21.x86_64' everything passes with nnet,
I would say 3.17.4-301.fc21.x86_64 version of kernel is buggy.

On the gate vm the new kernel(3.17.8-300.fc21.x86_64) was installed before the 
boot,
but the boot manager config still picks the old kernel. I tried to switch to 
the new
kernel in `your vm`, but the machine failed to reboot, maybe I miss configured 
the extlinux.conf or we have
some environment specific issue.
I lost your onhold vm. :(

Looks like https://bugs.launchpad.net/nova/+bug/1353939 was always triggered,  
ie.
the vm failed to delete which caused wrong iptables rules left behind, which 
caused
several subsequent ssh test failure if the test used the same fixedip as the 
test_rescued_vm_detach_volume.

Tempest could be stricter and fail the test suite at tearDownClass when the vm 
moves to ERROR state at delete.


- Eredeti üzenet -----
> Feladó: "Attila Fazekas" 
> Címzett: "Ian Wienand" 
> Másolatot kap: "Alvaro Lopez Ortega\", \"Jeremy Stanley\", \"Sean Dague\" 
> , \"dean Troyer\"
> , \"OpenStack Development Mailing List" 
> Elküldött üzenetek: Csütörtök, 2015. Január 22. 18:16:01
> Tárgy: [fedora] Re: upstream f21 devstack test
> 
> 
> 
> - Mail original -
> > De: "Attila Fazekas" 
> > À: "Ian Wienand" 
> > Cc: "Alvaro Lopez Ortega" , "Jeremy Stanley"
> > , "Sean Dague" ,
> > "dean Troyer" , "OpenStack Development Mailing List (not
> > for usage questions)"
> > 
> > Envoyé: Lundi 19 Janvier 2015 18:02:17
> > Objet: Re: upstream f21 devstack test
> > 
> > Per request moving this thread to the openstack-dev list.
> > 
> > I was not able to reproduce the issue so far either on the
> > vm you pointed me or in any of my VMs.
> > 
> > Several things I observed on `your` machine:
> > 1. The installed kernel is newer then the actually used (No known related
> > issue)
> 
> strace on libvirt does not wants to terminate properly on Ctrl+C,
> probably this not the only miss behavior related to processes.
> 
> The kernel version and hyper-visor type might be relevant to the
> 'Exception during message handling: Failed to terminate process 32495 with
> SIGKILL: Device or resource busy'
> 
> According to the strace the signal was sent, and the process was killed,
> but it is zombie until the strace not killed.
> 
> 
> > 2. On the First tempest (run logs are collected [0]) lp#1353939 was
> > triggered, but not related
> I was wrong.
> This was related. An exception during instance delete can live behind
> iptables rules, so not the correct security group rules will be applied.
> 
> In the other jenkins jobs this situation is rare.
> 
> On `your` vm 'tox -eall test_rescued_vm_detach_volume' triggers the issue
> almost always, in other env I was not able tor reproduce it so far.
> 
> > 3. After tried to reproduce the use many-many times I hit lp#1411525, the
> > patch
> >which introduced is already reverted.
> > 4. Once I saw 'Returning 400 to user: No nw_info cache associated with
> > instance' what I haven't
> >seen with nova network for a long time.  (once in 100 run)
> > 5. I see many annoying iscsi related logging, It also does not related to
> > the
> > connection issue,
> >IMHO the tgtadm can be considered as DEPRECATED thing, and we should
> >switch to lioadm.
> > 
> > So far, No Log entry found in connection to connection issue
> >  which would worth to search on logstash.
> > 
> > The nova network log is not sufficient to figure out the actual netfilter
> > state at any moment.
> > According the log it should have update the chains with something, but who
> > knows..
> > 
> > With the ssh connection issues you can do very few things as post-mortem
> > analyses.
> > Tempest normally deletes the related resources, so less evidences
> > remaining.
> > If the issue is reproducible some cases enough to alter the test to do not
> > destroy evidences,
> > but very frequently some kind of real debugger required.
> > 
> > Several suspected thing:
> > * The vm was able to acquire address via dhcp -> successful boot, has L2
> > connectivity.
> > * No evidence found for a dead qemu, no special libvirt operation requested
> > before failure.
> &g

Re: [openstack-dev] upstream f21 devstack test

2015-01-19 Thread Attila Fazekas
Per request moving this thread to the openstack-dev list.

I was not able to reproduce the issue so far either on the
vm you pointed me or in any of my VMs.

Several things I observed on `your` machine:
1. The installed kernel is newer then the actually used (No known related issue)
2. On the First tempest (run logs are collected [0]) lp#1353939 was triggered, 
but not related
3. After tried to reproduce the use many-many times I hit lp#1411525, the patch
   which introduced is already reverted.
4. Once I saw 'Returning 400 to user: No nw_info cache associated with 
instance' what I haven't
   seen with nova network for a long time.  (once in 100 run)
5. I see many annoying iscsi related logging, It also does not related to the 
connection issue,
   IMHO the tgtadm can be considered as DEPRECATED thing, and we should switch 
to lioadm.

So far, No Log entry found in connection to connection issue 
 which would worth to search on logstash.

The nova network log is not sufficient to figure out the actual netfilter state 
at any moment.
According the log it should have update the chains with something, but who 
knows..

With the ssh connection issues you can do very few things as post-mortem 
analyses.
Tempest normally deletes the related resources, so less evidences remaining.
If the issue is reproducible some cases enough to alter the test to do not 
destroy evidences,
but very frequently some kind of real debugger required.

Several suspected thing:
* The vm was able to acquire address via dhcp -> successful boot, has L2 
connectivity.
* No evidence found for a dead qemu, no special libvirt operation requested 
before failure.
* nnet claims it added the floating ip to the br100
* L3 issue / security group rules ?..

The basic network debug was removed form tempest[1]. I would like to recommend 
to revert that change
in order to have an idea at least the interfaces and netfilter was or wasn't in 
a good shape [1].

I also created a vm with enabled firewalld (normally it is not in my devstack 
setups), the 3
mentioned test case working fine even after running these tests for hours.
However the '/var/log/firewalld' contains COMMAD_FAILURES as in `your` vm. 

I will try run more full tempest+nnet@F21 in my env to have more sample for 
success rate.

So far I reproduced 0 ssh failure,
so I will scan the logs[0] again more carefully on `your` machine,
maybe I missed something, maybe those tests interfered with something less 
obvious.

I'll check the other gate f21 logs (~100 job/week),
 does anything happened when the issue started and/or is the issue still 
exists. 


So, I have nothing useful at the moment, but I did not given up.

[0] 
http://logs.openstack.org/87/139287/14/check/check-tempest-dsvm-f21/5f3d210/console.html.gz
[1] https://review.openstack.org/#/c/140531/


PS.:
F21's HaProxy is more sensitive to services which stops listening,
and it will not be evenly balanced. 
For a working F21 neutron job better listener is required: 
https://review.openstack.org/#/c/146039/ .
 


- Original Message -
> From: "Ian Wienand" 
> To: "Attila Fazekas" 
> Cc: "Alvaro Lopez Ortega" , "Jeremy Stanley" 
> , "Sean Dague" ,
> "dean Troyer" 
> Sent: Friday, January 16, 2015 5:24:38 AM
> Subject: upstream f21 devstack test
> 
> Hi Attila,
> 
> I don't know if you've seen, but upstream f21 testing is happening for
> devstack jobs.  As an experimental job I was getting good runs, but in
> the last day and a bit, all runs have started failing.
> 
> The failing tests are varied; a small sample I pulled:
> 
> [1]
> tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
> [2]
> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,image,network]
> [3]
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance[compute,image,network]
> 
> The common thread is that they can't ssh to the cirros instance
> started up.
> 
> So far I can not replicate this locally.  I know there were some
> firewalld/neutron issues, but this is not a neutron job.
> 
> Unfortunately, I'm about to head out the door on PTO until 2015-01-27.
> I don't like the idea of this being broken while I don't have time to
> look at it, so I'm hoping you can help out.
> 
> There is a failing f21 machine on hold at
> 
>  jenk...@xx.yy.zz.qq
Sanitized.
> 
> I've attached a private key that should let you log in.  This
> particular run failed in [4]:
> 
>  
> tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
>  
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,image,network,volume]
> 
> Sorry I haven

Re: [openstack-dev] [QA][Tempest] Proposing Ghanshyam Mann for Tempest Core

2014-11-26 Thread Attila Fazekas
+1

- Original Message -
From: "Marc Koderer" 
To: "OpenStack Development Mailing List (not for usage questions)" 

Sent: Wednesday, November 26, 2014 7:58:06 AM
Subject: Re: [openstack-dev] [QA][Tempest] Proposing Ghanshyam Mann for Tempest 
Core

+1 

Am 22.11.2014 um 15:51 schrieb Andrea Frittoli < andrea.fritt...@gmail.com >: 





+1 
On 21 Nov 2014 18:25, "Ken1 Ohmichi" < ken1ohmi...@gmail.com > wrote: 


+1 :-) 

Sent from my iPod 

On 2014/11/22, at 7:56, Christopher Yeoh < cbky...@gmail.com > wrote: 

> +1 
> 
> Sent from my iPad 
> 
>> On 22 Nov 2014, at 4:56 am, Matthew Treinish < mtrein...@kortar.org > wrote: 
>> 
>> 
>> Hi Everyone, 
>> 
>> I'd like to propose we add Ghanshyam Mann (gmann) to the tempest core team. 
>> Over 
>> the past couple of cycles Ghanshyam has been actively engaged in the Tempest 
>> community. Ghanshyam has had one of the highest review counts on Tempest for 
>> the past cycle, and he has consistently been providing reviews that have 
>> been 
>> of consistently high quality that show insight into both the project 
>> internals 
>> and it's future direction. I feel that Ghanshyam will make an excellent 
>> addition 
>> to the core team. 
>> 
>> As per the usual, if the current Tempest core team members would please vote 
>> +1 
>> or -1(veto) to the nomination when you get a chance. We'll keep the polls 
>> open 
>> for 5 days or until everyone has voted. 
>> 
>> Thanks, 
>> 
>> Matt Treinish 
>> 
>> References: 
>> 
>> https://review.openstack.org/#/q/reviewer:%22Ghanshyam+Mann+%253Cghanshyam.mann%2540nectechnologies.in%253E%22,n,z
>>  
>> 
>> http://stackalytics.com/?user_id=ghanshyammann&metric=marks 
>> 
>> ___ 
>> OpenStack-dev mailing list 
>> OpenStack-dev@lists.openstack.org 
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
> 
> ___ 
> OpenStack-dev mailing list 
> OpenStack-dev@lists.openstack.org 
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 

___ 
OpenStack-dev mailing list 
OpenStack-dev@lists.openstack.org 
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
___ 
OpenStack-dev mailing list 
OpenStack-dev@lists.openstack.org 
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][cinder][qa] Volume attachment not visible on the guest

2014-08-18 Thread Attila Fazekas
Hi All,

I have a `little` trouble with the volume attachment stability.

The test_stamp_pattern test is skipped since long, you
can see what would happen if it would be enabled [1] now.

There is a workaround kind way for enabling that test [2].

I suspected the acpi hot plug event is not detected by the kernel 
at some phases of the boot, for example after the first pci scan,
 but before the pci hot plug initialized.

Is the above blind spot really exists ?

If yes, is something what needs to be handled by init system or
 kernel needs to ensure all device is discovered before calling init ?

Long time ago I had trouble with reproducing the above issue,
but now I was able to see a PCI rescan can solve the issue.
'echo "1" > /sys/bus/pci/rescan' (ssh to guest)

Recently we found `another type` of volume attachment issue,
when booting from volume. [3]

Here I would expect the PCI device is ready before,
the VM actually started, but according to the 
console log, the disk device is missing.

When I am booting from an iscsi volume, is the virtual device show up
guaranteed by nova/cinder/libvirt/qemu/whatever
 to be present at the first pci scan ?

Is there anything what can delay the device/disk appearance ?

Best Regards,
Attila

[1] https://review.openstack.org/#/c/52740/
[2] https://review.openstack.org/#/c/62886/
[3] https://bugs.launchpad.net/nova/+bug/1357677

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [QA] Proposed Changes to Tempest Core

2014-07-25 Thread Attila Fazekas
+1


- Original Message -
> From: "Matthew Treinish" 
> To: openstack-dev@lists.openstack.org
> Sent: Tuesday, July 22, 2014 12:34:28 AM
> Subject: [openstack-dev] [QA] Proposed Changes to Tempest Core
> 
> 
> Hi Everyone,
> 
> I would like to propose 2 changes to the Tempest core team:
> 
> First, I'd like to nominate Andrea Fritolli to the Tempest core team. Over
> the
> past cycle Andrea has been steadily become more actively engaged in the
> Tempest
> community. Besides his code contributions around refactoring Tempest's
> authentication and credentials code, he has been providing reviews that have
> been of consistently high quality that show insight into both the project
> internals and it's future direction. In addition he has been active in the
> qa-specs repo both providing reviews and spec proposals, which has been very
> helpful as we've been adjusting to using the new process. Keeping in mind
> that
> becoming a member of the core team is about earning the trust from the
> members
> of the current core team through communication and quality reviews, not
> simply a
> matter of review numbers, I feel that Andrea will make an excellent addition
> to
> the team.
> 
> As per the usual, if the current Tempest core team members would please vote
> +1
> or -1(veto) to the nomination when you get a chance. We'll keep the polls
> open
> for 5 days or until everyone has voted.
> 
> References:
> 
> https://review.openstack.org/#/q/reviewer:%22Andrea+Frittoli+%22,n,z
> 
> http://stackalytics.com/?user_id=andrea-frittoli&metric=marks&module=qa-group
> 
> 
> The second change that I'm proposing today is to remove Giulio Fidente from
> the
> core team. He asked to be removed from the core team a few weeks back because
> he
> is no longer able to dedicate the required time to Tempest reviews. So if
> there
> are no objections to this I will remove him from the core team in a few days.
> Sorry to see you leave the team Giulio...
> 
> 
> Thanks,
> 
> Matt Treinish
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] Proposals for Tempest core

2013-11-21 Thread Attila Fazekas
+1 for both!



- Original Message -
> From: "Sean Dague" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, November 15, 2013 2:38:27 PM
> Subject: [openstack-dev] [qa] Proposals for Tempest core
> 
> It's post summit time, so time to evaluate our current core group for
> Tempest. There are a few community members that I'd like to nominate for
> Tempest core, as I've found their review feedback over the last few
> months to be invaluable. Tempest core folks, please +1 or -1 as you feel
> appropriate:
> 
> Masayuki Igawa
> 
> His review history is here -
> https://review.openstack.org/#/q/reviewer:masayuki.igawa%2540gmail.com+project:openstack/tempest,n,z
> 
> Ken'ichi Ohmichi
> 
> His review history is here -
> https://review.openstack.org/#/q/reviewer:ken1ohmichi%2540gmail.com+project:openstack/tempest,n,z
> 
> They have both been actively engaged in the Tempest community, and have
> been actively contributing to both Tempest and OpenStack integrated
> projects, working hard to both enhance test coverage, and fix the issues
> found in the projects themselves. This has been hugely beneficial to
> OpenStack as a whole.
> 
> At the same time, it's also time, I think, to remove Jay Pipes from
> tempest-core. Jay's not had much time for reviews of late, and it's
> important that the core review team is a working title about actively
> reviewing code.
> 
> With this change Tempest core would end up no longer being majority
> north american, or even majority english as first language (that kind of
> excites me). Adjusting to both there will be another mailing list thread
> about changing our weekly meeting time to make it more friendly to our
> APAC contributors.
> 
>   -Sean
> 
> --
> Sean Dague
> http://dague.net
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] nominations for tempest-core

2013-09-12 Thread Attila Fazekas
+1 for both of them

- Original Message -
From: "Sean Dague" 
To: "OpenStack Development Mailing List" 
Sent: Wednesday, September 11, 2013 10:32:11 PM
Subject: [openstack-dev] [qa] nominations for tempest-core

We're in Feature Freeze for the Open Stack projects, which actually 
means we're starting the busy cycle for Tempest in people landing 
additional tests for verification of features that hadn't gone in until 
recently. As such, I think now is a good time to consider some new core 
members. There are two people I think have been doing an exceptional job 
that we should include in the core group

Mark Koderer has been spear heading the stress testing in Tempest, 
completing the new stress testing for the H3 milestone, and has gotten 
very active in reviews over the last three months.

You can see his contributions here: 
https://review.openstack.org/#/q/project:openstack/tempest+owner:m.koderer%2540telekom.de,n,z

And his code reviews here: his reviews here - 
https://review.openstack.org/#/q/project:openstack/tempest+reviewer:m.koderer%2540telekom.de,n,z


Giulio Fidente did a lot of great work bringing our volumes testing up 
to par early in the cycle, and has been very active in reviews since the 
Havana cycle opened up.

You can see his contributions here: 
https://review.openstack.org/#/q/project:openstack/tempest+owner:gfidente%2540redhat.com,n,z

And his code reviews here: his reviews here - 
https://review.openstack.org/#/q/project:openstack/tempest+reviewer:gfidente%2540redhat.com,n,z


Both have been active in blueprints and the openstack-qa meetings all 
summer long, and I think would make excellent additions to the Tempest 
core team.

Current QA core members, please vote +1 or -1 to these nominations when 
you get a chance. We'll keep the polls open for 5 days or until everyone 
has voiced their votes.

For reference here are the 90 day review stats for Tempest as of today:

Reviews for the last 90 days in tempest
** -- tempest-core team member
+--+---+
|   Reviewer   | Reviews (-2|-1|+1|+2) (+/- ratio) |
+--+---+
| afazekas **  | 275 (1|29|18|227) (89.1%) |
|  sdague **   |  198 (4|60|0|134) (67.7%) |
|   gfidente   |  130 (0|55|75|0) (57.7%)  |
|david-kranz **|  112 (1|24|0|87) (77.7%)  |
| treinish **  |  109 (5|32|0|72) (66.1%)  |
|  cyeoh-0 **  |   87 (0|19|4|64) (78.2%)  |
|   mkoderer   |   69 (0|20|49|0) (71.0%)  |
| jaypipes **  |   65 (0|22|0|43) (66.2%)  |
|igawa |   49 (0|10|39|0) (79.6%)  |
|   oomichi|   30 (0|9|21|0) (70.0%)   |
| jogo |   26 (0|12|14|0) (53.8%)  |
|   adalbas|   22 (0|4|18|0) (81.8%)   |
| ravikumar-venkatesan |   22 (0|2|20|0) (90.9%)   |
|   ivan-zhu   |   21 (0|10|11|0) (52.4%)  |
|   mriedem|13 (0|4|9|0) (69.2%)   |
|   andrea-frittoli|12 (0|4|8|0) (66.7%)   |
|   mkollaro   |10 (0|5|5|0) (50.0%)   |
|  zhikunliu   |10 (0|4|6|0) (60.0%)   |
|Anju5 |9 (0|0|9|0) (100.0%)   |
|   anteaya|7 (0|3|4|0) (57.1%)|
| Anju |7 (0|0|7|0) (100.0%)   |
|   steve-stevebaker   |6 (0|3|3|0) (50.0%)|
|   prekarat   |5 (0|3|2|0) (40.0%)|
|rahmu |5 (0|2|3|0) (60.0%)|
|   psedlak|4 (0|3|1|0) (25.0%)|
|minsel|4 (0|3|1|0) (25.0%)|
|zhiteng-huang |3 (0|2|1|0) (33.3%)|
| maru |3 (0|1|2|0) (66.7%)|
|   iwienand   |3 (0|1|2|0) (66.7%)|
|FujiokaYuuichi|3 (0|1|2|0) (66.7%)|
|dolph |3 (0|0|3|0) (100.0%)   |
| cthiel-suse  |3 (0|0|3|0) (100.0%)   |
|walter-boring | 2 (0|2|0|0) (0.0%)|
|bnemec| 2 (0|2|0|0) (0.0%)|
|   lifeless   |2 (0|1|1|0) (50.0%)|
|fabien-boucher|2 (0|1|1|0) (50.0%)|
| alex_gaynor  |2 (0|1|1|0) (50.0%)|
|alaski|2 (0|1|1|0) (50.0%)|
|   krtaylor   |2 (0|0|2|0) (100.0%)   |
|   cbehrens   |2 (0|0|2|0) (100.0%)   |
|   Sumanth|2 (0|0|2|0) (100.0%)   |
| ttx  | 1 (0|1|0|0) (0.0%)|
|   rvaknin| 1 (0|1|0|0) (0.0%)|
| rohitkarajgi | 1 (0|1|0|0) (0.0%)|
|   ndipanov   | 1 (0|1|0|0) (0.0%)|
|   michaeltchapman| 1 (0|1|0|0) (0.0%)|
|